That's a problem that any general purpose design has. It's something Dojo would have fixed, but it went too far in the other direction and only supported training. Rumor has it the new version will support inference too.
Part of this is a human problem. The company wants better utilisation, so hires resourcing experts tasked to allocate resources between projects and teams.
These experts set up quota systems, priority allocation, month-ahead plans, burst and idle quotas, etc., all with the goal of getting the resource better used.
However, it ends up having the reverse effect - teams now deliberately waste the resource to make their utilisation appear better, and run pointless jobs because "use it or lose it" quota systems discourage being thrifty.
These problems are compounded by there being hundreds of resource types - "I've got plenty of CPU and GPU TFlops for my project, but I've run out of disk spindle hours so can't run the training job".
The end result is that the company as a whole doesn't even know its real utilisation, and makes exceptionally poor use of its resources.
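The multi-resource bottleneck above can be sketched in a few lines: a job runs only if *every* resource dimension has quota left, so effective utilisation is capped by the scarcest resource. This is a minimal illustration with made-up resource names and numbers, not any real scheduler's API:

```python
# Hypothetical sketch: a job needs quota in EVERY resource dimension.
# Running out of any one (e.g. disk spindle hours) blocks the job,
# even with plenty of CPU/GPU headroom left.

def can_run(job_needs: dict, quota_left: dict) -> bool:
    """A job is schedulable only if each required resource fits its remaining quota."""
    return all(quota_left.get(res, 0) >= amt for res, amt in job_needs.items())

quota_left = {"cpu_hours": 5000, "gpu_tflop_hours": 8000, "disk_spindle_hours": 3}
training_job = {"cpu_hours": 200, "gpu_tflop_hours": 1000, "disk_spindle_hours": 50}

print(can_run(training_job, quota_left))  # False: blocked by disk spindle hours alone
```

With hundreds of such dimensions, the odds that some obscure one runs dry first go up fast, which is exactly the "plenty of TFlops, no spindle hours" failure mode.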
The article says this is a software issue, where GPUs are unable to be fully utilized due to scaling problems. I don't know how hardware at that scale works, but it could very well be that they still need all of their hardware to reach their current compute.
Grok is pretty bad. No wonder usage is low. I think they messed up when they removed the human annotation team and went in the direction of automation.
The bet could eventually pay off if they figure out how to train without human help while still producing useful models. Imagine is terrible too.
More competition is great for us users. I hope they recover. In the meantime, why not host OSS models like Google does?
My understanding is that inference (running existing models) is around a quarter of the average compute budget for AI companies, and training new models takes up about three-quarters.
As such, using only 11% of their GPUs suggests that they've elected not to do as much training as they are capable of.
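A rough back-of-the-envelope, assuming the ratios quoted above (the 1/4-3/4 split and the 11% figure are taken from the comments, not verified):

```python
# Back-of-the-envelope, assuming the quoted ratios are accurate.
inference_share = 0.25   # assumed fraction of a typical AI compute budget spent on inference
training_share = 0.75    # assumed fraction spent on training
utilization = 0.11       # reported fraction of GPUs actually in use

# If the fleet were sized for a typical 25/75 split, inference alone would
# occupy more of the fleet than the 11% currently in use - so even if every
# busy GPU were serving inference, training would be running far below capacity.
print(utilization < inference_share)  # True
```

This is only suggestive: fleets aren't sized exactly to the average split, and the 11% figure may count differently than the budget shares do.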
Why is everyone posting conjecture-based answers, like my 10-year-old kid, with absolutely no evidence to back them up?
The most childish reply in this thread is yours, max.
Aren't Xai's datacenters powered by [currently very expensive] diesel?
This is the exact information I am looking for.
Soon the market will be flooded with liquidations of everything from these.
If they had the demand, this problem would be fixed. Even by giving away free credits, xAI would not get the users; nobody wants to use Elon's LLM.
That's why he bought Cursor: to get its customers, so he has an audience to give free credits to.
Where does one go (virtually or physically) to participate as a buyer in these markets?