I've done the modeling on this a few times and I always get to a place where inference can run at 50%+ gross margins, depending mostly on GPU depreciation and how good the host is at optimizing utilization. The challenge for the margins is whether or not you consider model training costs as part of the calculation. If model training isn't capitalized + amortized, margins are great. If they are amortized and need to be considered... yikes
Why wouldn't you factor in training? It is not like you can train once and then have the model run for years. You need to constantly improve to keep up with the competition. The lifespan of a model is just a few months at this point.
I suspect we've already reached the point with models at the GPT-5 tier where the average person will no longer recognize improvements, and such a model can be slightly improved at long intervals and indeed run for years. Meanwhile research-grade models will still need to be trained at massive cost to improve performance on relatively short time scales.
Whenever someone has complained to me about issues they are having with ChatGPT on a particular question or type of question, the first thing I do is ask them what model they are using. So far, no one has ever known offhand what model they were using, nor were they even aware that there are multiple models!
If you understand there are multiple models from multiple providers, some of those models are better at certain things than others, and how you can get those models to complete your tasks, you are in the top 1% (probably less) of LLM users.
As long as models continue on their current rapid improvement trajectory, retraining from scratch will be necessary to keep up with the competition. As you said, that's such a huge amount of continual CapEx that it's somewhat meaningless to consider AI companies' financial viability strictly in terms of inference costs, especially because more capable models will likely be much more expensive to train.
But at some point, model improvement will saturate (perhaps it already has). At that point, model architecture could be frozen, and the only purpose of additional training would be to bake new knowledge into existing models. It's unclear if this would require retraining the model from scratch, or simply fine-tuning existing pre-trained weights on a new training corpus. If the former, AI companies are dead in the water, barring a breakthrough in dramatically reducing training costs. If the latter, assuming the cost of fine-tuning is a fraction of the cost of training from scratch, the low cost of inference does indeed make a bullish case for these companies.
In the same way that every other startup tries to sweep R&D costs under the rug and say “yeah but the marginal unit economics have 50% gross margins, we’ll be a great business soon”.
I agree that you could get to high margins, but I think the modeling holds only if you're an AI lab operating at scale with a setup tuned for your model(s). I think the most open study on this one is from the DeepSeek team: https://github.com/deepseek-ai/open-infra-index/blob/main/20...
For others, I think the picture is different. When we ran benchmarks on DeepSeek-R1 on 8x H200 SXM using vLLM, we got up to 12K total tok/s (concurrency 200, input:output ratio of 6:1). If your load spikes to 100-200K tok/s, you need a lot of GPUs for that, and then those GPUs sit idle most of the time.
I'll read the blog post in more detail, but I don't think the following assumptions hold outside of AI labs.
* 100% utilization (no spikes, balanced usage between day/night or weekdays)
* Input processing is free (~$0.001 per million tokens)
* DeepSeek fits into H100 cards in a way that network isn't the bottleneck
I wonder how much capex risk there is in this model, depreciating the GPUs over 5 years is fine if you can guarantee utilization. Losing market share might be a death sentence for some of these firms as utilization falls.
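To put rough numbers on that utilization concern, here is a minimal back-of-envelope sketch. It uses the 8x H200 / 12K tok/s throughput figure from the benchmark above; the ~$2.50/GPU-hour rental price is my assumption, not the article's.

```python
# Rough sketch: cost per million output tokens vs. utilization.
# Assumed: 8x H200 at ~$2.50/GPU-hour rental, ~12K aggregate tok/s at full load
# (the vLLM benchmark figure mentioned above).

GPU_HOURLY = 2.50           # assumed $/GPU-hour
NUM_GPUS = 8
PEAK_TOKS_PER_SEC = 12_000  # aggregate throughput at full concurrency

def cost_per_million_tokens(utilization: float) -> float:
    """Cluster cost spread over the tokens actually served."""
    hourly_cost = GPU_HOURLY * NUM_GPUS
    tokens_per_hour = PEAK_TOKS_PER_SEC * 3600 * utilization
    return hourly_cost / tokens_per_hour * 1_000_000

for u in (1.0, 0.5, 0.1):
    print(f"utilization {u:.0%}: ~${cost_per_million_tokens(u):.2f} per 1M tokens")
# ~$0.46 at 100%, ~$0.93 at 50%, ~$4.63 at 10%
```

The price and throughput are placeholders, but the shape is the point: cut utilization in half and the cost per token doubles.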
> whether or not you consider model training costs as part of the calculation
Whether they flow through COGS/COR or elsewhere on the income statement, they've gotta be recognized. In which case, either you have low gross margins or low operating profit (low net income??). Right?
That said, I just can't conceive of a way that training costs are not hitting gross margins. Be it IFRS/GAAP etc., training is 1) directly attributable to the production of the service sold, 2) is not SG&A, financing, or abnormal cost, and thus 3) only makes sense to match to revenue.
It's already questionable if anyone can make it profitable once you account for all the costs. Why do you think they try to squash the legal concerns so hard? If they move fast and stick their fingers in their ears, they can just steal whatever they want.
I have to disagree. The biggest costs are still energy consumption, water, and maintenance. Not to mention keeping up with rivals at an incredibly high tempo (hence offers in the billions, like Meta's recently). Then there's the cost of hardware, which tracks Nvidia's skyrocketing shares :)
No one should dare to talk about profit yet. Now is the time to grab market share, invest a lot, and work hard, hoping for a future profit. The equation is still a work in progress.
The global cost of inference at both OpenAI and Anthropic surely exceeds the training cost.
The reason is simple: the inference cost grows with requests, not with datasets. My math, simplified by AI, says: suppose training a GPT-like model costs C_T = $10,000,000 and each query costs C_I = $0.002. Break-even: N > C_T / C_I = $10,000,000 / $0.002 = 5,000,000,000 inferences.
So after 5 billion queries, inference costs surpass the training cost.
OpenAI claims it has 100 million users; multiply by queries per user and I'll let you judge.
No. But training an LLM is certainly very very expensive and a gamble every time you do it. I think of it a bit like a pharmaceutical company doing vaccine research…
> Most of what we're building out at this point is the inference [...] We're profitable on inference. If we didn't pay for training, we'd be a very profitable company.
ICYMI, Amodei said the same in much greater detail:
"If you consider each model to be a company, the model that was trained in 2023 was profitable. You paid $100 million, and then it made $200 million of revenue. There's some cost to inference with the model, but let's just assume, in this cartoonish cartoon example, that even if you add those two up, you're kind of in a good state. So, if every model was a company, the model, in this example, is actually profitable.
What's going on is that at the same time as you're reaping the benefits from one company, you're founding another company that's much more expensive and requires much more upfront R&D investment. And so the way that it's going to shake out is this will keep going up until the numbers go very large and the models can't get larger, and then it'll be a large, very profitable business, or, at some point, the models will stop getting better, right? The march to AGI will be halted for some reason, and then perhaps it'll be some overhang. So, there'll be a one-time, 'Oh man, we spent a lot of money and we didn't get anything for it.' And then the business returns to whatever scale it was at."
The "model as company" metaphor makes no sense. It should actually be models are products, like a shoe. Nike spends money developing a shoe, then building it, then they sell it, and ideally those R&D costs are made up in shoe sales. But you still have to run the whole company outside of that.
Also, in Nike's case, as they grow they get better at making more shoes for cheaper. LLM model providers tell us that every new model (shoe) costs multiples more than the last one to develop. If they make 2x revenue on training, like he's said, to be profitable they have to either double prices or double users every year, or stop making new models.
But new models to date have cost more than the previous ones to create, often by an order of magnitude, so the shoe metaphor falls apart.
A better metaphor would be oil and gas production, where existing oil and gas fields are either already finished (i.e. model is no longer SOTA -- no longer making a return on investment) or currently producing (SOTA inference -- making a return on investment). The key similarity with AI is new oil and gas fields are increasingly expensive to bring online because they are harder to make economical than the first ones we stumbled across bubbling up in the desert, and that's even with technological innovation. That is to say, the low hanging fruit is long gone.
> new models to date have cost more than the previous ones to create
This largely was the case in software in the '80s-'10s (when versions largely disappeared) and still is the case in hardware. iPhone 17 will certainly cost far more to develop than did iPhone 10 or 5. iPhone 5 cost far more than 3G, etc.
If you're going to use shoes as the metaphor, a model would be more like a shoe factory. A shoe would be a LLM answer, i.e. inference. In which case it totally makes sense to consider each factory as an autonomous economic unit, like a company.
>Also, in Nike's case, as they grow they get better at making more shoes for cheaper.
This is clearly the case for models as well. Training and serving inference for GPT4 level models is probably > 100x cheaper than they used to be. Nike has been making Jordan 1's for 40+ years! OpenAI would be incredibly profitable if they could live off the profit from improved inference efficiency on a GPT4 level model!
It's model as a company because people are using the VC mentality, and also explaining competition.
Model as a product is the reality, but each model competes with previous models and is only successful if it's both more cost effective, and also more effective in general at its tasks. By the time you get to model Z, you'll never use model A for any task as the model lineage cannibalizes sales of itself.
Analogies don't prove anything, but they're still useful for suggesting possibilities for thinking about a problem.
If you don't like "model as company," how about "model as making a movie?" Any given movie could be profitable or not. It's not necessarily the case that movie budgets always get bigger or that an increased budget is what you need to attract an audience.
Okay but noticeably he invents two numbers then pretends that a third number is irrelevant in order to claim that each model (which is not a company) is a profitable company.
You'd think maybe the CEO might be able to give a ball park on the profit made off that 2023 model.
ETA: "You paid $100 million... There's some cost to inference with the model, but let's just assume ... that even if you add those two up, you're kind of in a good state."
You see this right? He literally says that if you assume revenue exceeds costs then it's profitable. He doesn't actually say that it does though.
OpenAI and Anthropic have very different customer bases and usage profiles. I'd estimate a significantly higher percentage of Anthropic's tokens are paid for by the customer than OpenAI's. The ChatGPT free tier is orders of magnitude more popular than Claude's free tier, and Anthropic in all likelihood does a higher percentage of API business versus consumer business than OpenAI does.
In other words, it's possible this story is correct and true for Anthropic, but not true for OpenAI.
Good point, very possible that Altman is excluding free tier as a marketing cost even if it loses more than they make on paid customers. On the other hand they may be able to cut free tier costs a lot by having the model router send queries to gpt-5-mini where before they were going to 4o.
Free tier provides a lot of training material. Every time you correct ChatGPT on its mistakes you’re giving them knowledge that’s not in any book or website.
That's interesting, though you have to imagine the data set is very low quality on average and distilling high quality training pairs out of it is very costly.
Also Amodei has an assumption that a 100m model will make 200m of revenue but a 1B model will make 2B of revenue. Does that really hold up? There's no phenomenon that prevents them from only making 200m of revenue off a $1B model.
I take them to be saying the same thing — the difference is that Altman is referring to the training of the next model happening now, while Amodei is referring to the training months ago of the model you're currently earning money back on through inference.
Which is like saying, “If all we did is charge people money and didn’t have any COGS, we’d be a very profitable company.” That’s a truism of every business and therefore basically meaningless.
I can't imagine the hoops an accountant would have to go through to argue training cost is COGS. In the most obvious stick-figures-for-beginners interpretation, as in, "If I had to explain how a P&L statement works to an AI engineer", training is R&D cost and inference cost is COGS.
I wasn’t using COGS in a GAAP sense, but rather as a synonym for unspecified “costs.” My bad. I suppose you would classify training as development and ongoing datacenter and GPU costs as actual GAAP COGS. My point was, if all you focus on is revenue and ignore the costs of creating your business and keeping it running, it’s pretty easy for any business to be “profitable.”
It’s generally useful to consider unit economics separately from the whole company. If your unit economics are negative, things are very bleak. If they're positive, your chances go up by a lot - scaling the business amortizes fixed (non-unit) costs, such as admin and R&D, and slightly improves unit margins as well.
However this does not work as well if your fixed (non-unit) cost is growing exponentially. You can’t get out of this unless your user base grows exponentially or the customer value (and price) per user grows exponentially.
I think this is what Altman is saying - this is an unusual situation: unit economics are positive, but fixed costs are exploding faster than economies of scale can absorb.
You can say it’s splitting a hair, but insightful perspective often requires teasing things apart.
It’s splitting a hair, but a pretty important hair. Does anyone think that models won’t need continuous retraining? Does anyone think models won’t continue to try to scale? Personally, I think we’re reaching diminishing returns with scaling, which is probably good because we’ve basically run out of content to train on, and so perhaps that does stop or at least slow down drastically. But I don’t see a scenario where constant retraining isn’t the norm, even if the rough amount of content we’re using for it grows only slightly.
The Amodei quote in my other reply explains why this is wrong. The point is not to compare the training of the current model to inference on the current model. The thing that makes them lose so much money is that they are training the next model while making back their training cost on the current model. So it's not COGS at all.
Well, only if that one trained model continued to function as a going business. Their amortization window for the training cost is 2 months or so. They can't just keep that up and collect $.
They have to build the next model, or else people will go to someone else.
Even being generous, and saying it's a year, most capital expenditures depreciate over a period of 5-7 years. To state the obvious, training one model a year is not a saving grace
I don't understand why the absolute time period matters — all that matters is that you get enough time making money on inference to make up for the cost of training.
I think this is debatable as more models become good enough for more tasks. Maybe a smaller proportion of tasks will require SOTA models. On the other hand, the set of tasks people want to use LLMs for will expand along with the capabilities of SOTA models.
So is OpenAI capable of not making a new model at some point? They've been training the next model continuously as long as they've existed AFAIK.
Our software house spends a lot on R&D sure, but we're still incredibly profitable all the same. If OpenAI is in a position where they effectively have to stop iterating the product to be profitable, I wouldn't call that a very good place to be when you're on the verge of having several hundred billion in debt.
I think at that point there is strong financial pressure to figure out how to continuously evolve models instead of training new ones, for example by building models out of smaller modules that can be trained individually and swapped out. Jeff Dean and Noam Shazeer talked about that a bit in their interview with Dwarkesh: https://www.dwarkesh.com/p/jeff-dean-and-noam-shazeer
There’s still untapped value in deeper integrations. They might hit a jackpot of exponentially increasing value from network effects caused by tight integration with e.g. disjoint business processes.
We know that businesses with tight network effects can grow to about 2 trillion in valuation.
Like a very over-served market, I think. I see perhaps three survivors long term, or at most one gorilla, two chimps, and perhaps a few very small niche-focused monkeys.
This can be technically true without being actually true.
IE OpenAI invests in Cursor/Windsurf/Startups that give away credits to users and make heavy use of inference API. Money flows back to OpenAI then OpenAI sends it back to those companies via credits/investment $.
It's even more circular in this case because nvidia is also funding companies that generate significant inference.
It'll be quite difficult to figure out whether it's actually profitable until the new investment dollars start to dry up.
While this could be true, I don't think OpenAI is investing the hundreds of millions to billions of dollars that would be required to actually make it true.
OpenAI's fund is ~$250-300mm
Nvidia reportedly invested $1b last year - still way less than Open AI revenue
That is an OpenAI skeptic. His research, if correct, says not only that OpenAI is unprofitable but that it likely never will be. It can't be; its various finance ratios make early Uber, Amazon, etc. look downright fiscally frugal.
He is not a tech person, for whatever that means to you.
Amazon was very frugal. If you look at Amazon losses for the first 10 years, they were all basically under 5% of revenue and many years were break even or slightly net positive.
Uber burnt through a lot of money, and even now I'm not sure their lifetime profit is positive (it's possible that since their founding they've lost more money than they've made).
From the latest NYT Hard Fork podcast [1]. The hosts were invited to a dinner hosted by Sam, where Sam said "we're profitable if we remove training from the equation", they report he turned to Lightcap (COO) and asked "right?" and Lightcap gave an "eeekk we're close".
They aren't yet profitable even just on inference, and its possible Sam didn't know that until very recently.
Except these tech billionaires lie most of the time. This is still the "grow at any cost" phase, so I don't even genuinely believe he has a confident understanding of how or at what point anything will be profitable. This just strikes me as the best answer he has at the moment.
Will these companies ever stop training new models? What does it mean if we get there? Feels like they will have to constantly train and improve the models; not sure what that means either. What incremental improvements can these models show?
Another question is - will it ever become less costly to train?
The current way the models work is that they don't have memory; it's baked in during training (or has to be provided as context).
So to keep up with the times, the models have to be constantly retrained.
One thing though is that right now it's not just incremental training, the whole thing gets updated - multiple parameters and how the model is trained is different.
This might not be the case in the future where the training could become more efficient and switch to incremental updates where you don't have to re-feed all the training data but only the new things.
I am simplifying here for brevity, but I think the gist is still there.
Updating the internal knowledge is not the primary motivator here, as you can easily, and more reliably (less hallucination), get that information at inference stage (through web search tool).
They're training new models because the (software) technology keeps improving, (proprietary) data sets keep improving (through a lot of manual labelling but also synthetic data generation), and in general researchers have better understanding of what's important when it comes to LLMs.
I feel oddly skeptical about this article; I can't specifically argue the numbers, since I have no idea, but... there are some decent open source models; they're not state of the art, but if inference is this cheap then why aren't there multiple API providers offering models at dirt cheap prices?
The only cheap-ass providers I've seen only run tiny models. Where's my cheap deepseek-R1?
Surely if its this cheap, and we're talking massive margins according to this, I should be able to get a cheap / run my own 600B param model.
Am I missing something?
It seems that reality (ie. the absence of people actually doing things this cheap) is the biggest critic of this set of calculations.
> but if inference is this cheap then why aren't there multiple API providers offering models at dirt cheap prices
There are multiple API providers offering models at dirt cheap prices, enough so that there is at least one well-known API provider that is an aggregator of other API providers and offers lots of models at $0.
> The only cheap-ass providers I've seen only run tiny models. Where's my cheap deepseek-R1?
At 4-bit quant, R1 takes 300+ gigs just for weights. You can certainly run smaller models into which R1 has been distilled on a modest laptop, but I don't see how you can run R1 itself on anything that wouldn't be considered extreme for a laptop in at least one dimension.
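For rough scale (assuming R1's published ~671B total parameter count), the weight footprint works out like this:

```python
# Rough weight-memory estimate for DeepSeek-R1 (assumption: ~671B total params).
total_params = 671e9

for name, bytes_per_param in [("fp16", 2.0), ("fp8", 1.0), ("4-bit", 0.5)]:
    gb = total_params * bytes_per_param / 1e9
    print(f"{name}: ~{gb:,.0f} GB for weights alone")
# fp16: ~1,342 GB; fp8: ~671 GB; 4-bit: ~336 GB (KV cache and activations are extra)
```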
There are 7 providers on that page which have higher output token price than $3.08. There is even 1 which has higher input token price than that. So that "all" is not true either.
I also have no idea on the numbers. But I do know that these same companies are pouring many billions of dollars into training models, paying very expensive staff, and building out infrastructure. These costs would need to be factored in to come up with the actual profit margins.
> I should be able to get a cheap / run my own 600B param model.
if the margins on hosted inference are 80%, then you need > 20% utilization of whatever you build for yourself for this to be less costly to you (on margin).
i self-host open weight models (please: deepseek et al aren't open _source_) on whatever $300 GPU i bought a few years ago, but if it outputs 2 tokens/sec then i'm waiting 10 minutes for most results. if i want results in 10s instead of 10m, i'll be paying $30000 instead. if i'm prompting it 100 times during the day, then it's idle 99% of the time.
coordinating a group buy for that $30000 GPU and sharing that across 100 people probably makes more sense than either arrangement in the previous paragraph. for now, that's a big component of what model providers, uh, provide.
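A quick sketch of that break-even logic with made-up numbers; the hardware price, amortization window, batched throughput, and API price below are all assumptions, and electricity is ignored:

```python
# Sketch of the self-hosting break-even from the parent comment (all numbers assumed,
# and batched throughput is wildly hardware/model dependent).

HARDWARE_COST = 30_000        # assumed one-off GPU spend
AMORTIZE_YEARS = 5
BATCHED_TOKS_PER_SEC = 1_000  # assumed aggregate throughput when busy
API_PRICE_PER_MTOK = 3.00     # assumed blended API price

def self_host_cost_per_mtok(utilization: float) -> float:
    """Hardware cost amortized over the tokens you actually generate."""
    seconds = AMORTIZE_YEARS * 365 * 24 * 3600
    tokens = BATCHED_TOKS_PER_SEC * seconds * utilization
    return HARDWARE_COST / tokens * 1e6

for u in (1.0, 0.2, 0.01):
    print(f"utilization {u:.0%}: ~${self_host_cost_per_mtok(u):.2f}/Mtok")
# ~$0.19 at 100%, ~$0.95 at 20%, ~$19 at 1%

# utilization at which self-hosting matches the assumed API price
breakeven = self_host_cost_per_mtok(1.0) / API_PRICE_PER_MTOK
print(f"break-even utilization: ~{breakeven:.0%}")   # ~6% with these numbers
```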
Another giant problem with this article is we have no idea what optimizations they use on their end. There are some wildly complex optimizations these large AI companies use.
What I'm trying to say is that hosting your own model is in an entirely different league than the pros.
Even if errors in the article imply higher costs, I would argue it would swing right back to profit because of how advanced inference optimization has become.
If actual model intelligence is not a moat (looking likely this is true) the real sauce of profitable AI companies is advanced optimizations across the entire stack.
OpenAI is NEVER going to release their specialized kernels, routing algos, quantizations, or model compilation methods. These are all really hard and really specific.
There's zero basis for assuming any of that. The most likely situation is a power law curve where the vast majority of users don't use it much at all and the top 10% of users account for 90% of the usage.
It is very likely that you are in the top 10% of users.
True. The article also has zero basis for its estimates of average usage from each tier's user base.
I somewhat doubt my usage is so close to the edge of the curve since I don't even pay for any plan. It could be that I'm very frugal with money and heavy on consumption while most are more balanced, but 1M tokens per day in any case sounds slim for any user who pays for the service.
For sure an interesting calculation. Only one remark from someone with GPU metal experience:
> But compute becomes the bottleneck in certain scenarios. With long context sequences, attention computation scales quadratically with sequence length.
Even if the statement about quadratic scaling is right, the bottleneck we are talking about is north of that by a factor of ~1000. If 10k cores do only simple matrix operations, each needs to have new data (up to 64k) available every 500 cycles (let's say). Getting that amount of data (without _any_ collisions) means something like 100+ GByte/s per core. Even at 2+ TByte/s of HBM, the bottleneck is the memory transfer rate, by something like a factor of 500. With collisions, we're talking about an additional factor like 5000 (last time I ran some tests on a 4090).
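A simpler way to see the same memory-bandwidth wall is a roofline-style estimate. The sketch below assumes DeepSeek-style ~37B active params, fp8 weights, and H100-class HBM bandwidth, and it ignores KV-cache traffic and MoE routing effects, so treat it as an optimistic upper bound:

```python
# Roofline-style sketch: decode is usually memory-bound, not compute-bound.
# Each decode step must stream the active weights from HBM at least once.
# Assumed: ~37B active params (MoE), fp8 weights, ~3.35 TB/s HBM3 per GPU.

active_params = 37e9
bytes_per_param = 1.0        # fp8
hbm_bandwidth = 3.35e12      # bytes/s

bytes_per_token = active_params * bytes_per_param
single_stream_toks = hbm_bandwidth / bytes_per_token
print(f"~{single_stream_toks:.0f} tok/s per GPU if each token re-reads all weights")

# Batching N requests reuses the same weight read across N tokens per step,
# which is how providers push aggregate throughput far higher (ignoring
# KV-cache traffic and the fact that MoE tokens route to different experts).
for batch in (1, 32, 256):
    print(f"batch {batch}: ~{single_stream_toks * batch:,.0f} tok/s upper bound")
```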
If multiple cores try to get the same memory addresses, the MMU feeds only one core; the second one has to wait. Depending on the type of RAM, this will cost a lot of cycles.
GPU MMUs can handle multiple lines in parallel, but not 10k cores at the same time. The HBM is not able to transfer 3.5 TByte/s sequentially.
This is not my domain, but I assume the MMUs act like a switch and something like multicast is not available here. I've tried to implement something like that on an FPGA and it was extremely cost-intensive.
I believe it's that the bus can only serve one chip at a time, so it has to actually be faster since sometimes one chip's data will have to wait for the data of another chip to finish first.
This whole article is built off using DeepSeek R1, which is a huge premise that I don't think is correct. DeepSeek is much more efficient and I don't think it's a valid way to estimate what OpenAI and Anthropic's costs are.
That's what the buzz focused on, which is strange since we don't actually know what training cost them. Inference optimization, on the other hand, is a fact, and it's even more impactful since training costs benefit from economies of scale.
I don't think that's strange at all; it's a much more palatable narrative for the masses, who don't know what inference and training are and who think having conversations = training.
Because to make GPT-5 or Claude better than previous models, you need to do more reasoning which burns a lot more tokens. So, your per-token costs may drop, but you may also need a lot more tokens.
GPT-5 can be configured extensively. Is there any point at which any configuration of GPT-5 that offers ~DeepSeek level performance is more expensive than DeepSeek per token?
Uhhh, I'm pretty sure DeepSeek shook the industry because of a 14x reduction in training cost, not inference cost.
We also don't know the per-token cost for OpenAI and Anthropic models, but I would be highly surprised if it was significantly more expensive than open models anyone can use and run themselves. It's not like they're also not investing in inference research.
That makes the calculation nonsensical, because if you go there... you'd also have to include all the energy used in producing the content the other model providers used. So suddenly you're counting everyone's devices on which they wrote comments on social media, pretty much every server that ever served a request to OpenAI/Google/Anthropic's bots, etc.
Seriously, that claim was always completely disingenuous
I don't think it's that nonsensical to realize that in order to have AI, you need generations of artists, journalists, scientists, and librarians to produce materials to learn from.
And when you're using an actual AI model to "train" (copy), it's not even a shred of nonsense to realize the prior model is a core component of the training.
Isn't training cost a function of inference cost? From what I gathered, they reduced both.
I remember seeing lots of videos at the time explaining the details, but basically it came down to the kind of hardware-aware programming that used to be very common. (Although they took it to the next level by using undocumented behavior to their advantage.)
All reports by companies are alleged until verified by other, more trustworthy sources. I don't think it's especially notable that it's alleged because it's DeepSeek vs. the alleged numbers from other companies.
What are we meant to take away from the 8000 word Zitron post?
In any case, here is what Anthropic CEO Dario Amodei said about DeepSeek:
"DeepSeek produced a model close to the performance of US models 7-10 months older, for a good deal less cost (but not anywhere near the ratios people have suggested)"
"DeepSeek-V3 is not a unique breakthrough or something that fundamentally changes the economics of LLM’s; it’s an expected point on an ongoing cost reduction curve. What’s different this time is that the company that was first to demonstrate the expected cost reductions was Chinese."
We certainly don't have to take his word for it, but the claim is that DeepSeek's models are not much more efficient to train or inference than closed models of comparable quality. Furthermore, both Amodei and Sam Altman have recently claimed that inference is profitable:
Amodei: "If you consider each model to be a company, the model that was trained in 2023 was profitable. You paid $100 million, and then it made $200 million of revenue. There's some cost to inference with the model, but let's just assume, in this cartoonish cartoon example, that even if you add those two up, you're kind of in a good state. So, if every model was a company, the model, in this example, is actually profitable.
What's going on is that at the same time as you're reaping the benefits from one company, you're founding another company that's much more expensive and requires much more upfront R&D investment. And so the way that it's going to shake out is this will keep going up until the numbers go very large and the models can't get larger, and then it'll be a large, very profitable business, or, at some point, the models will stop getting better, right? The march to AGI will be halted for some reason, and then perhaps it'll be some overhang. So, there'll be a one-time, 'Oh man, we spent a lot of money and we didn't get anything for it.' And then the business returns to whatever scale it was at."
In terms of sources, I would trust Zitron a lot more than Altman or Amodei. To be charitable, those CEOs are known for their hyperbole and for saying whatever is convenient in the moment, but they certainly aren't that careful about being precise or leaving out inconvenient details. Which is what a CEO should do, more or less, but, I wouldn't trust their word on most things.
I agree we should not take CEOs at their word, we have to think about whether what they're saying is more likely to be true than false given other things we know. But to trust Zitron on anything is ridiculous. He is not a source at all: he knows very little, does zero new reporting, and frequently contradicts himself in his frenzy to believe the bubble is about to pop any time now. A simple example: claiming both that "AI is very little of big tech revenue" and "Big tech has no other way to show growth other than AI hype". Both are very nearly direct quotes.
The "efficiency" mentioned in the blog post you linked is the price difference between DeepSeek and o1; it doesn't mean that GPT-5 or other SOTA models are less efficient.
His possible incentives and the fact OpenAI isn't a public company simply make it hard for us to gauge which of these statements is closer to the truth.
Criminal prosecution?
This scheme has been perfected; what exactly would you prosecute? Can you say with certainty that he means it's profitable overall? What if he means it's profitable right now, today, but not yesterday or over the last week? Or what if he meant that for the mean user it's profitable? There's so much room for interpretation; that's why there is no risk for them.
Yes, API pricing is usage based, but ChatGPT Pro pricing is a flat rate for a time period.
The question is then whether SaaS companies paying for GPT API pricing are profitable if they charge their users a flat rate for a time period. If their users trigger inference too much, they would also lose money.
This can be true if you assume that there exists a high number of $20 subscribers who don't use the product that much, but $200 subscribers squeeze every last bit and then some more. The balance could be still positive, but if you look at the power users alone, they might cost more than they pay.
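A rough sketch of that flat-rate math, with a made-up blended serving cost per million tokens:

```python
# Sketch: flat-rate subscription vs. usage, per the parent comment (numbers assumed).
SUB_PRICE = 20.0            # $/month subscription
SERVE_COST_PER_MTOK = 1.50  # assumed blended serving cost per 1M tokens

def monthly_margin(tokens_per_day: float) -> float:
    serving_cost = tokens_per_day * 30 / 1e6 * SERVE_COST_PER_MTOK
    return SUB_PRICE - serving_cost

for daily in (20_000, 200_000, 2_000_000):
    print(f"{daily:>9,} tok/day: margin ${monthly_margin(daily):+.2f}/month")
# light users are very profitable; a power user at ~2M tok/day costs ~$90/month to serve
```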
They might even have decided “hey, these power users are willing to try and tell us what LLMs are useful for, and are even willing to pay us for the opportunity!”
I'm not entirely sure the analogy is fair - Amazon for example was 'ridiculed' for being hugely unprofitable for the first decade, but had underlying profitability if you removed capex.
As a counterpoint, if OpenAI were actually profitable at this early stage that could be a bad financial decision - it might mean that they aren't investing enough in what is an incredibly fierce and capital-intensive market.
Also admitting it would make this business impossible if they had to respect copyright law, so the laws shall be adjusted so that it can be a business.
That is my interpretation, that it's a marketing attempt. A form of "The value of our product is so good that it's losing us money. It's practically the Costco hotdog combo!".
There's the usual issue of a CEO "talking their book" but there's also the fact that Sam has a rich, documented history of lying. That was the central issue of his firing. "Empire of AI" has a detailed account of this. He would outright tell board member A that "board member B said X", based on his knowledge of the social dynamics of the board he assumed that A and B would never talk. But they eventually figured it out, it unraveled, and they confronted him in a group. Specifically, when they confronted him about telling Ilya Sutskever that Tasha McCauley said Helen Toner should step off the board, McCauley said "I never said that" and Altman was at a loss for words for a minute before finally mumbling "Well, I thought you could have said that. I don't know."
Doesn't he have an incentive to make it look like that, though? The way he phrased it, that they are losing money because people use it so much, makes it seem like Pro subscribers are some super power-users. As long as inference has a nonnegative, nonzero cost, then this case will lose money, so Sam isn't admitting that the business model is flawed or anything
> The most likely situation is a power law curve where the vast majority of users don't use it much at all and the top 10% of users account for 90% of the usage.
That'll be the Pro users. My wife uses her regular sub very lightly, most people will be like her...
His strategy is to sell OpenAI stock like it was Bitcoin in 2020, and if for some reason the market decides that maybe a company that loses large amounts of cash isn't actually a good investment... he'll be fine, he's had plenty of time to turn some of his stock into money :)
This seems very very far off. From the latest reports, anthropic has a gross margin of 60%. It came out in their latest fundraising story. From that one The Information report, it estimated OpenAI's GM to be 50% including free users. These are gross margins so any amortization or model training cost would likely come after this.
Then, today almost every lab uses methods like speculative decoding and caching which reduce the cost and speed up things significantly.
The input numbers are far off. The assumption is 37B of active parameters. Sonnet 4 is supposedly a 100B-200B param model. Opus is about 2T params. Neither of them (even if we assume MoE) would have exactly that number of active params. Then there is a cost to hosting and activating params at inference time (the article kind of assumes it would be the same constant 37B params).
Are you saying that you think Sonnet 4 has 100B-200B _active_ params? And that Opus has 2T active? What data are you basing these outlandish assumptions on?
Oh, nothing official. There are people who estimate the sizes based on tok/s, cost, benchmarks etc. The one most go on is https://lifearchitect.substack.com/p/the-memo-special-editio.... This guy estimated Claude 3 Opus to be a 2T param model (given the pricing + speed). Opus 4 is 1.2T params according to him (but then I don't understand why the price remained the same). Sonnet is estimated by various people to be around 100B-200B params.
Gross margins also don't tell the whole story, we don't know how much Azure and Amazon charge for the infrastructure and we have reasons to believe they are selling it at a massive discount (Microsoft definitely does that, as follows from their agreement with OpenAI). They get the model, OpenAI gets discounted infra.
A discounted Azure H100 will still be more than $2 per hour. Same goes for AWS. Trainium chips are new and not as effective (not saying they are bad) but still cost in the same range.
For inference, gross margins are exactly: (what companies charge per 1M tokens to the user) - (direct cost to produce that 1M tokens which is GPU costs).
Exactly. All of the claims that OpenAI is losing money on every request are wrong. OpenAI hasn’t even unlocked all of their possible revenue opportunities from the free tier such as ads (like Google search), affiliate links, and other services.
There are also a lot of comments in this thread from people who want LLM companies to fail for different reasons, so they’re projecting that wish onto imagined unit economics.
I’m having flashbacks to all of the conversations about Uber and claims that it was going to collapse as soon as the investment money ran out. Then Uber gradually transitioned to profitability and the critics moved to using the same shtick on AI companies.
If they're profitable, why on earth are they seeking crazy amounts of investment month after month? It seems like they'll raise 10 billion one month, and then immediately turn around and raise another 10 billion a month or two after that. If it's for training, it seems like a waste of money since GPT-5 doesn't seem like it's that much of an improvement.
No, the argument is that Uber was going to lose money hand over fist until all of the alternatives were starved to death, then raise prices infinitely.
Taxis sucked. Any disruptor who was willing to just... Tell people what the cost would be ahead of time without scamming them, and show up when they said they would, was going to win.
Uber (and Lyft) didn't starve the alternatives: they were already severely malnourished. Also, they found a loophole to get around the medallion system in several cities, which taxi owners used in an incredibly anticompetitive fashion to prevent new competition.
Just because Uber used a shitty business practice to deliver the killing blow doesn't mean their competition were undeserving of the loss, or that the traditional taxis weren't without a lot of shady practices.
Spoiler alert: in most of the world taxis are still there and at best Uber is just another app you can use to call them.
And lifetime profits for Uber are still at best break even which means that unless you timed the market perfectly, Uber probably lost you money as a shareholder.
Uber is just distorted in valuation by its presence in big US metro areas (which basically have no realistic transportation alternative).
So inference is cheap but training is expensive and getting more expensive. It seems like if they can't get training expenses down, cheap inference won't matter.
As someone who has been taking the largest part of Google's and Facebook's ad wallet share away, let me tell you something: advertising is now a very, very locked-in market, and it will take over a decade to shift even a significant minority of it into OpenAI's hands. This is not likely the first or even second monetization strategy, imo.
There are two companies gaining significant wallet share: Amazon and TikTok. Of those only one is taking a significant early share of both Google and Facebook.
Yeah Dario has said similar things in interviews. The way he explained it, if you look at each specific model (such as Sonnet 3.5) as its own separate company, then each one of them is profitable in the end. They all eventually recoup the expense of training, thanks to good profit margins on usage once they are deployed.
Yeah I've seen the same sentiment from a few others as well. Inference likely is profitable. Training is incredibly expensive and will sometimes not yield positive results.
It's wild and, while they're all guilty, Gemini is a particularly egregious offender. What really surprises me is that they don't even consider it a bug if you can predictably get it to generate copyrighted content. These types of exploits are out of scope of their bug bounty program and they suggest the end user file a ticket whenever they encounter such issues (i.e. they're just saying YOLO until there's case law).
Since DeepSeek R1 is open weight, wouldn't it be better to check the napkin math by measuring how many realistic full LLM inferences can be done on a single H100 in a given time period, and calculating the token cost from that?
Without having in depth knowledge of the industry, the margin difference between input and output tokens is very odd to me between your napkin math and the R1 prices. That's very important as any reasoning model explodes reasoning tokens, which means you'll encounter a lot more output tokens for fewer input tokens, and that's going to heavily cut into the high margin ("essentially free") input token cost profit.
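To make that concrete, here is a toy margin calculation; the prices and serving costs below are assumptions chosen only to illustrate how the input:output ratio moves the blended margin:

```python
# Sketch of the parent's point: if input tokens are nearly free to serve but output
# tokens are not, the blended margin depends heavily on the input:output ratio.
# All prices and costs below are assumptions for illustration.

PRICE_IN, PRICE_OUT = 0.50, 2.00   # assumed $/Mtok charged to the user
COST_IN, COST_OUT = 0.001, 1.00    # assumed $/Mtok to serve

def blended_margin(input_toks: float, output_toks: float) -> float:
    revenue = input_toks * PRICE_IN + output_toks * PRICE_OUT
    cost = input_toks * COST_IN + output_toks * COST_OUT
    return (revenue - cost) / revenue

print(f"6:1 input-heavy chat:  {blended_margin(6, 1):.0%} margin")
print(f"1:6 reasoning-heavy:   {blended_margin(1, 6):.0%} margin")
# with these numbers: ~80% margin for input-heavy traffic, ~52% once reasoning
# tokens dominate the output side
```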
I am so glad someone else called this out, I was reading the napkin math portions and struggling to see how the numbers really worked out and I think you hit the nail on the head. The author is assuming 'essentially free' input token cost and extrapolating in a business model that doesn't seem to connect directly to any claimed 'usefulness'. I think the bias on this is stated in the beginning of the article clearly as the author assumes 'given how useful the current models are...'. That is not a very scientific starting point and I think it leads to reasoning errors within the business model he posits here.
There were some oddities with the numbers themselves as well but I think it was all within rounding, though it would have been nice for the author to spell it out when he rounded some important numbers (~s don't tell me a whole lot).
TL;DR I totally agree, there are some napkin math issues going on here that make this pretty hard to see as a very useful stress test of cost.
This kind of presumes you're just cranking out inference non-stop 24/7 to get the estimated price, right? Or am I misreading this?
In reality, presumably they have to support fast inference even during peak usage times, but then the hardware is still sitting around off of peak times. I guess they can power them off, but that's a significant difference from paying $2/hr for an all-in IaaS provider.
I'm also not sure we should expect their costs to just be "in-line with, or cheaper than" what various hourly H100 providers charge. Those providers presumably don't have to run entire datacenters filled to the gills with these specialized GPUs. It may be a lot more expensive to do that than to run a handful of them spread among the same datacenter with your other workloads.
Yes. But these are on demand prices, so you could just turn them off when loads are less.
But there is no way that OpenAI should be more expensive than this. The main cost is the capex of the H100s, and if you are buying 100k at a time you should be getting a significant discount off list price.
> In reality, presumably they have to support fast inference even during peak usage times, but then the hardware is still sitting around off of peak times. I guess they can power them off, but that's a significant difference from paying $2/hr for an all-in IaaS provider.
They can repurpose those nodes for training when they aren't being used for inference. Or if they're using public cloud nodes, just turn them off.
These articles (of which there are many) all make the same basic accounting mistakes. You have to include all the costs associated with the model, not just inference compute.
This article is like saying an apartment complex isn’t “losing money” because the monthly rents cover operating costs but ignoring the cost of the building. Most real estate developments go bust because the developers can’t pay the mortgage payment, not because they’re negative on operating costs.
If the cash flow was truly healthy these companies wouldn’t need to raise money. If you have healthy positive cash flow you have much better mechanisms available to fund capital investment other than selling shares at increasingly inflated valuations. Eg issue a bond against that healthy cash flow.
Fact remains when all costs are considered these companies are losing money and so long as the lifespan of a model is limited it’s going to stay ugly. Using that apartment building analogy it’s like having to knock down and rebuild the building every 6 months to stay relevant, but saying all is well because the rents cover the cost of garbage collection and the water bill. That’s simply not a viable business model.
Update Edit: A lot of commentary below re the R&D and training costs and whether it’s fair to exclude them from inference costs or “unit economics.” I’d simply say inference is just selling compute, and that should be high margin, which the article concludes it is. The issue behind the growing concerns about a giant AI bubble is whether that margin is sufficient to cover the costs of everything else. I’d also say that excluding the cost of the model from “unit economics” calculations doesn’t make business/math/economic sense, since the model is literally the thing being sold. It’s not some bit of fungible equipment or long-term capital expense when models become obsolete after a few months. Take away the model and you’re just selling compute, so it’s really not a great metric to use to say these companies are OK.
> Fact remains when all costs are considered these companies are losing money
You would need to figure out what exactly they are losing money on. Making money on inference is like operating profit - revenue less marginal costs. So the article is trying to answer if this operating profit is positive or negative. Not whether they are profitable as a whole.
If things like cost of maintaining data centres or electricity or bandwidth push them into the red, then yes, they are losing money on inference.
If the things that make them lose money is new R&D then that's different. You could split them up into a profitable inference company and a loss making startup. Except the startup isn't purely financed by VC etc, but also by a profitable inference company.
Yes that's right. The inference costs in isolation are interesting because that speaks to the unit economics of this business: R&D / model training aside, can the service itself be scaled to operate at a profit? Because that's the only hope of all the R&D eventually paying dividends.
One thing that makes me suspect inference costs are coming down is how chatty the models have become lately, often appending encouragement to a checklist like "You can check off each item as you complete them!" Maybe I'm wrong, but I feel if inference was killing them, the responses would become more terse rather than more verbose.
For the top few providers, the training is getting amortized over absurd amount of inference. E.g. Google recently mentioned that they processed 980T tokens over all surfaces in June 2025.
The leaked OpenAI financial projections for 2024 showed about equal amount of money spent on training and inference.
Amortizing the training per-query really doesn't meaningfully change the unit economics.
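As a rough illustration of why (the training cost below is an assumption; the token volume is the Google figure cited above):

```python
# Rough sketch of the amortization point above. Training cost is assumed;
# the 980T tokens/month figure is the one cited for Google in June 2025.

training_cost = 1e9          # assume a $1B training run
tokens_per_month = 980e12

for months_served in (1, 12):
    per_mtok = training_cost / (tokens_per_month * months_served) * 1e6
    print(f"amortized over {months_served} month(s): ~${per_mtok:.3f} per 1M tokens")
# ~$1.02/Mtok if amortized over a single month, ~$0.085/Mtok over a year
# of serving at that volume — small next to typical API prices
```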
> Fact remains when all costs are considered these companies are losing money and so long as the lifespan of a model is limited it’s going to stay ugly. Using that apartment building analogy it’s like having to knock down and rebuild the building every 6 months to stay relevant. That’s simply not a viable business model.
To the extent they're losing money, it's because they're giving free service with no monetization to a billion users. But since the unit costs are so low, monetizing those free users with ads will be very lucrative the moment they decide to do so.
Assuming users accept those ads. Like, would they make it clear with a "sponsored section", or would they just try to worm it into the output? I could see a lot of potential ways that users reject the ad service, especially if it's seen to compromise the utility or correctness of the output.
(Author here). Yes, I am aware of that and did mention it. However, what I wanted to push back on in this article was the claim that Claude Code is completely unsustainable and therefore a flash in the pan and devs aren't at risk (I know you are not saying this).
The models as is are still hugely useful, even if no further training was done.
The marginal cost is not the salient factor when the model has to be frequently retrained at great cost. Even if the marginal cost was driven to zero, would they profit?
But they don't have to be retained frequently at great cost. Right now they are retrained frequently because everyone is frequently coming out with new models and nobody wants to fall behind. But if investment for AI were to dry up everyone would stop throwing so much money at R&D, and if everyone else isn't investing in new models you don't have to either. The models are powerful as they are, most of the knowledge in them isn't going to rapidly obsolete, and where that is a concern you can paper over it with RAG or MCP servers. If everyone runs out of money for R&D at the same time we could easily cut back to a situation where we get an updated version of the same model every 3 years instead of a bigger/better model twice a year.
And whether companies can survive in that scenario depends almost entirely on their unit economics of inference, ignoring current R&D costs
Like we've seen with Karpathy & Murati starting their own labs, it's to be expected that over the next 5 years, hundreds of engineers & researchers at the bleeding edge will quit and start competing products. They'll reliably raise $1b to $5b in weeks, too. And it's logical: for an investor, a startup founded by a Tier 1 researcher will more reliably 10-100x your capital, vs. Anthropic & OpenAI that are already at >$250b valuations.
This talent diffusion guarantees that OpenAI and Anthropic will have to keep sinking in ever more money to stay at the bleeding edge, or upstarts like DeepSeek and incumbents like Meta will simply outspend you/hire away all the Tier 1 talent to upstage you.
The only companies that'll reliably print money off AI are TSMC and NVIDIA because they'll get paid either way. They're selling shovels and even if the gold rush ends up being a bust, they'll still do very well.
True. But at some point the fact that there are many many players in the market will start to diminish the valuation of each of those players, don’t you think? I wonder what that point would be.
> But if investment for AI were to dry up everyone would stop throwing so much money at R&D, and if everyone else isn't investing in new models you don't have to either
IF.
If you do stagnate for years someone will eventually decide to invest and beat you. Intel has proven so.
I upvoted it because it aligns most closely with my own perspective. I have a strong dislike for AI and everything associated with it, so my judgment is shaped by that bias. If a post sounds realistic or complex, I have no interest in examining its nuance. I am not concerned with practical reality and prefer to accept it without thinking, so I support ideas that match my personal viewpoint.
I don’t understand why people like you have to call this stuff out? Like most of HN thinks the way I do and that’s why the post was upvoted. Why be a contrarian? There’s really no point.
> claude code was completely unsustainable and therefore a flash in the pan and devs aren't at risk
How can you possibly say this if you know anything about the evolution of costs in the past year?
Inference costs are going down constantly, and as models get better they make fewer mistakes, which means fewer cycles = less inference to actually subsidize.
This is without even looking at potential fundamental improvements in LLMs and AI in general. And with all the trillions in funding going into this sector, you can't possibly think we're anywhere near the technological peak.
Speaking as a founder managing multiple companies: Claude Code's value is in the thousands per month /per person/ (with the proper training). This isn't a flash in the pan, this isn't even a "prediction" - the game HAS changed and anyone telling you it hasn't is trying to cover their head with highly volatile sand.
I think the point isn't to argue AI companies are money printers or even that they're fairly valued, it's that at least the unit economics work out. Contrast this to something like moviepass, where they were actually losing money on each subscriber. Sure, a company that requires huge capital investments that might never be paid back isn't great either, but at least it's better than moviepass.
Unit economics needs to include the cost of the thing being sold, not just the direct cost of selling it.
Unit economics is mostly a manufacturing concept, and the only reason it looks OK here is that the cost of building the thing isn't really factored into the cost of the thing.
Someone might say I don’t understand “unit economics” but I’d simply argue applying a unit economics argument saying it’s good without including the cost of model training is abusing the concept of unit economics in a way that’s not realistic from a business/economics sense.
The model is what’s being sold. You can’t just sell “inference” as a thing with no model. Thats just selling compute, which should be high margin. The article is simply affirming that by saying yes when you’re just selling compute in micro-chunks that’s a decent margin business which is a nice analysis but not surprising.
The cost of “manufacturing” an AI response is the inference cost, which this article covers.
> That would be like saying the unit economics of selling software is good because the only cost is some bandwidth and credit card processing fees. You need to include the cost of making the software
Unit economics is about the incremental value and costs of each additional customer.
You do not amortize the cost of software into the unit economics calculations. You only include the incremental costs of additional customers.
> just like you need to include the cost of making the models.
The cost of making the models is important overall, but it’s not included in the unit economics or when calculating the cost of inference.
That isn't what unit economics is. The purpose of unit economics is to answer: "How much money do I make (or lose) if I add one more customer or transaction?". Since adding an additional user/transaction doesn't increase the cost of training the models you would not include the cost of training the models in a unit economics analysis. The entire point of unit economics is that it excludes such "fixed costs".
The thing about large fixed costs is that you can just solve them with growth. If they were losing money on inference alone no amount of growth would help. It's not clear to me there's enough growth that everybody makes it out of this AI boom alive, but at least some companies are going to be able to grow their way to profitability at some point, presumably.
There is no marginal cost for training, just like there's no marginal cost for software. This is why you don't generally use unit economics for analyzing software company breakeven.
The only reason unit economics aren't generally used for software companies is the profit margin is typically 80%+. The cost of posting a Tweet on Twitter/X is close to $0.
Compare the cost of tweeting to the cost of submitting a question to ChatGPT. The fact that ChatGPT rate limits (and now sells additional credits to keep using it after you hit the limit) indicates there are serious unit economic considerations.
We can't think of OpenAI/Anthropic as software businesses. At least from a financial perspective, it's more similar to a company selling compute (e.g. AWS) than a company selling software (e.g. Twitter/X).
2. “Open source” is great but then it’s just a commodity. It would be very hard to build a sustainable business purely on the back of commoditized models. Adding a feature to an actual product that does something else though? Sure.
There is plenty of money to be made from hosting open source software. AWS for instance makes tons of money from Linux, MySQL, Postgres, Redis, hosting AI models like DeepSeek (Bedrock) etc.
Your comment may apply to the original commenter “missing” the point of TFA and to the person replying “missing” the point of that comment. And to my comment “missing” the point of yours - which may have also “missed” the point.
I’ve clearly “missed” the point you were trying to make, because there’s nothing complicated: The article is about unit economics and marginal costs of inferences and this comment thread is trying to criticize the article based on a misunderstanding of what unit economics means.
I was not trying to make any point. I’m not even sure if the comment I replied to was suggesting that it was you or the other commenter who was missing some point or another.
Worth noting that the post only claims they should be profitable for the inference of their paying customers on a guesstimated typical workload. Free users and users with atypical usage patterns will obviously skew the whole picture. So the argument in the post is at least compatible with them still losing money on inference overall.
Excluding training, two of their biggest costs will be payroll and inference for all the free users.
It’s therefore interesting that they claimed it was close: doesn't this support the theory that inference from paid users is a (big) money maker, if it’s close to covering all the free usage plus their payroll costs?
“I think that tends to end poorly because as demand for your service grows, you lose more and more money. Sam Altman actually addressed this at dinner. He was asked basically, are you guys losing money every time someone uses ChatGPT?
And it was funny. At first, he answered, no, we would be profitable if not for training new models. Essentially, if you take away all the stuff, all the money we're spending on building new models and just look at the cost of serving the existing models, we are sort of profitable on that basis.
And then he looked at Brad Lightcap, who is the COO, and he sort of said, right? And Brad kind of like squirmed in his seat a little bit and was like, well, we're pretty close.
We're pretty close. We're pretty close.
So to me, that suggests that there is still some, maybe small negative unit economics on the usage of ChatGPT. Now, I don't know whether that's true for other AI companies, but I think at some point, you do have to fix that because as we've seen for companies like Uber, like MoviePass, like all these other sort of classic examples of companies that were artificially subsidizing the cost of the thing that they were providing to consumers, that is not a recipe for long-term success.”
From Hard Fork: Is This an A.I. Bubble? + Meta’s Missing Morals + TikTok Shock Slop, Aug 22, 2025
GPT-5 was, I suppose, their attempt to make a product that delivers metrics as good as their earlier products did.
Uber doesn't really compare, as they had existing competition from taxi companies that they first had to/have to destroy. And cars or fuel didn't get 10x cheaper over the time of Uber's existence, but I'm sure that they still can optimize a lot for efficiency.
I'm more worried about OpenAI's ability to build a good moat. Right now it seems that each success is replicated by the competing companies quickly. Each month there is a new leader in the benchmarks. Maybe the moat will be the data in the end, i.e. there are barriers nowadays to crawling many websites that have lots of text. Meanwhile those sites might make agreements with the established AI players, and maybe some of those agreements will be exclusive. Not just for training but also for updating wrt world news.
It’s funny you mention apartments, because that is exactly the comparison I thought of, but with the opposite conclusion. If you buy an apartment with debt, but get positive cash flow from rent, you wouldn’t call that unprofitable or a bad investment. It takes X years to recoup the initial debt, and as long as X is achievable that’s a good deal.
Hoping for something net profitable including fixed costs from day 1 is a nice fantasy, but that’s not how any business works or even how consumers think about debt. Restaurants get SBA financing. Homeowners are “net losing money” for 30 years if you include their debt, but they rightly understand that you need to pay a large fixed cost to get positive cash flow.
R&D is conceptually very similar. Customer acquisition also behaves that way.
Running with your analogy, having positive cash flow and buying a property to hold for the long term makes sense. That’s the classic mortgage scenario. But it takes time for that math to work out. Buying a new property every 6 months breaks that model. That’s like folks who keep buying a new car and rolling “negative equity” into a new deal. It’s insanity financially, but folks still do it.
I think the nuance here is what people consider the “cost” of “inference.” Purely on compute costs and not accounting for the cost of the model (which is where the article focuses) it’s not bad.
Their assumption is that training is a fixed cost: you'll spend the same amount on training for 5 users as you will with 500 million users.
Spending hundreds of millions of dollars on training when you are two guys in a garage is quite significant, but the same amount is absolutely trivial if you are planet-scale.
The big question is: how will training cost develop? Best-case scenario is a one-and-done run. But we're now seeing an arms race between the various AI providers: worst-case scenario, can the market survive an exponential increase in training costs for sublinear improvements?
I think this is missing the point that the very interesting article makes.
You're arguing that maybe the big companies won't recoup their investment in the models, or profitably train new ones.
But that's a separate question. Whether a model - which now exists! - can profitably be run is very good to know. The fact that people happily pay more than the inference costs means what we have now is sustainable. Maybe Anthropic or OpenAI will go out of business or something, but the weights have been calculated already, so someone will be able to offer that service going forward.
It hasn't even proven that; it assumes a ridiculous daily usage and also ignores free riders. Running a model is likely not profitable for any provider right now. Even a public company (e.g. Alphabet) isn't obliged to report honest figures, since numbers on the sheets can be moved left and right. We won't know for another year or two, when the companies we have today start falling and their founders start talking.
Self hosting LLMs isn’t completely out of the realm of feasibility. Hardware cost may be 2-3x a hardcore gaming rig, but it would be neat to see open source, self hosted, coding helpers. When Linux hit the scene it put UNIX(ish) power in the hands of anyone, with no license fee required. Surely somewhere someone is doing the same with LLM assisted coding.
Costs will go up to levels where people will no longer find this stuff as useful/interesting. It’s all fun and games until the subsidies end.
See the recent reactions to AWS pricing on Kiro where folks had a big WTF reaction on pricing after, it appears, AWS tried to charge realistic pricing based on what this stuff actually costs.
Isn’t AWS always quite expensive? Look at their margins and the amount of cash it throws off, versus the consumer/retail business which runs a ton more revenue but no profit.
If you’re applying the same pricing structure to Kiro as to all AWS products then, yeah, it’s not particularly hobbyist accessible?
The article is answering a specific question, and has excluded this on purpose. If you have a sunk training cost you still want to know if you can at least operate profitably.
"This article is like saying an apartment complex isn’t “losing money” because the monthly rents cover operating costs but ignoring the cost of the building. Most real estate developments go bust because the developers can’t pay the mortgage payment, not because they’re negative on operating costs."
> if you have healthy positive cash flow you have much better mechanisms available to fund capital investment other than selling shares. Eg issue a bond against that healthy cash flow.
Is that actually true in 2025? Presumably you have to make coupon payments on a bond(?), but shares are free. Companies like Meta have shown you can issue shares that don't come with voting rights and people will buy them, and meme stocks like GME have demonstrated the effectiveness of churning out as many shares as the market will bear.
Agree it’s not the fashionable thing. There’s a line from The Big Short of “This is Wall Street Dr Bury, if you offer us free money we’re going to take it.”
These companies are behaving the same way. Folks are willing to throw endless money into the present pit so on the one hand I can’t blame them for taking it.
Reality is, though, that when the hype wears off it’s only throwing more gasoline on the fire and building a bigger pool of investors that will become increasingly desperate to salvage returns. History says time and time again that story doesn’t end well, and that’s why the voices mumbling “bubble” under their breath are getting louder every day.
Ok, one issue I have with this analysis is the breakdown between input and output tokens. I'm the kind of person who spends most of my chats asking questions, so I might only use 20ish input tokens per prompt, while Gemini has to put out several hundred, which would seem to affect the economics quite a bit.
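A rough back-of-the-envelope for that, using the $3 / $15 per-million list prices mentioned elsewhere in the thread (illustrative, not actual provider costs):

    # Hypothetical prices; real provider costs and prices differ.
    PRICE_IN = 3.00 / 1_000_000    # $ per input token
    PRICE_OUT = 15.00 / 1_000_000  # $ per output token

    def prompt_cost(input_tokens: int, output_tokens: int) -> float:
        return input_tokens * PRICE_IN + output_tokens * PRICE_OUT

    # Short question, long answer: 20 input tokens, ~400 output tokens.
    print(prompt_cost(20, 400))      # ~$0.006, of which ~99% is output
    # The reverse, "heavy reader" pattern: 50k tokens of context, 200 tokens of answer.
    print(prompt_cost(50_000, 200))  # input dominates the bill, but input is far cheaper per token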
Yeah, I've noticed Chatgpt5 is very chatty. I can ask a 1 sentence question and get back 3-4 paragraphs, most of which I ignore, depending upon the task.
Same. It acts like its output tokens are free. My input:output ratio is like 1 to 10 at least. Not counting "Thought" and its internal generation for agentic tasks.
It may hurt them financially but they are fighting for market share and I'd argue short answers will drive users away. I prefer the long ones much more as they often include things I haven't directly asked about but are still helpful.
Everyone claiming AI companies are a financial ticking time bomb are using the same logic people used back in the 2000s when they claimed Amazon “never made a profit” and thus was a bad investment.
Basically, it's the same math as modern automated manufacturing: a super expensive and complex build-out, then a money printer once running and optimized.
I know there is lots of bearish sentiments here. Lots of people correctly point out that this is not the same math as FAANG products - then they make the jump that it must be bad.
But - my guess is these companies end up with margins better than Tesla (modern manufacturer), but less than 80%-90% of "pure" software. Somewhere in the middle, which is still pretty good.
Also - once the Nvidia monopoly gets broken, the initial build out becomes a lot cheaper as well.
And if you ever stop/step off the treadmill and jack up prices to reach profitability, a new upstart without your sunk costs will immediately create a 99% solution and start competing with you. Or more like hundreds of competitors. Like we've seen with Karpathy & Murati, any engineer with pedigree working on the frontline models can easily raise billions to compete with them.
Expect the trend to pick up as the pool of engineers who can create usable LLMs from scratch increases through knowledge/talent diffusion.
The LLM scene is an insane economic bloodbath right now. The tech aside, the financial moves here are historical. It's the ultimate wet dream for consumers - many competitors, face-ripping cap-ex, any missteps being quickly punished, and a total inability to hold back anything from the market. Companies are spending hundreds of billions to put the best tech in your hands as fast and as cheaply as possible.
If OpenAI didn't come along with ChatGPT, we would probably just now be getting Google Bard 1.0 with an ability level of GPT-3.5 and censorship so heavy it would make it useless for anything beyond "Tell me who the first president was".
Only introducing this, "NVIDIA AI Released Jet-Nemotron: 53x Faster Hybrid-Architecture Language Model Series that Translates to a 98% Cost Reduction for Inference at Scale", into the conversation because it just dropped (so it is timely). While it seems unlikely either OpenAI or Anthropic uses this or a technique like it (yet, or whether they even can), these types of breakthroughs may introduce dramatic savings for both closed and open source inference at scale moving forward: https://www.marktechpost.com/2025/08/26/nvidia-ai-released-j...
As the author seems to admit, an outsider is going to lack so much information (costs, loss leaders, etc), one has to assume any modeling is so inaccurate that it's not worth anything.
So the question remains unanswered, at least for us. For those putting money in, you can be absolutely certain they have a model with sufficient data to answer the question. Since money did go in, even if it's venture, the answer is probably "yes in the immediate, but no over time."
And that's assuming a more likely 1 byte per parameter.
So the article is only off by a factor of at least 1,000. I didn't check any of the rest of the math, but that probably has some impact on their conclusions...
Edit: Oh, assuming this is an estimate based on the model weights moving from HBM to SRAM, that's not how transformers are applied to input tokens. You only have to move the weights for every token during generation, not during "prefill". (And actually during generation you can use speculative decoding to do better than this roofline anyway.)
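A sketch of that roofline argument, under assumed numbers (a dense ~70B model at 2 bytes/parameter and ~3 TB/s of HBM bandwidth; not the article's exact figures):

    # Assumed figures for illustration only (not the article's exact numbers).
    WEIGHT_BYTES = 70e9 * 2          # ~140 GB of weights: dense 70B model at 2 bytes/param
    HBM_BANDWIDTH = 3e12             # ~3 TB/s aggregate HBM bandwidth

    # Decode: each generated token needs roughly one full read of the weights,
    # so a naive single-stream roofline is bandwidth / weight size.
    decode_tok_per_s = HBM_BANDWIDTH / WEIGHT_BYTES       # ~21 tok/s per stream
    # Batching reuses the same weight read across concurrent requests,
    # multiplying throughput until compute becomes the limit.
    batched_tok_per_s = 64 * decode_tok_per_s             # ~1.4k tok/s at batch 64

    # Prefill: the whole prompt goes through in one forward pass, so the weights
    # are read once for N prompt tokens, not N times - which is why input tokens
    # are so much cheaper to serve than output tokens.
    prompt_tokens_per_weight_read = 4096                  # e.g. a 4k-token prompt, one pass

    print(round(decode_tok_per_s), round(batched_tok_per_s), prompt_tokens_per_weight_read)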
There's also an estimation of how much a KV cache grows with each subsequent token. That would be roughly ~MBs/token. I think that would be the bottleneck
Your calculations make no sense. Why are you loading the model for each token independently? You can process all the input tokens at the same time as long as they fit in memory.
You are doing the calculation as if they were output tokens in a single batch; it would not make sense even in the decode phase.
This. ChatGPT also agrees with you: "74 GB weight read is per pass, not per token." I was checking the math in this blog post with GPT to understand it better and it seems legit for the given assumptions.
The estimation for output tokens is too low, since one reasoning-enabled response can burn through thousands of output tokens. It's also low for input tokens, since in actual use a lot of context (memory, agents.md, rules, etc.) gets included nowadays.
When you are operating at scale you are likely to use a small model during the auto regressive phase to generate sequential tokens and only involve the large model once you've generated several tokens. Whenever the two predict the same output you effectively generate more than one token at a time. The idea is the models will agree often enough to significantly reduce output token costs. Does anyone know how effective that is in practice?
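For readers unfamiliar with it, that technique is speculative decoding. A toy greedy sketch of the draft/verify loop, with stand-in draft_model/target_model callables (real systems verify the whole draft in a single batched target pass and also keep a "bonus" token when everything matches):

    from typing import Callable, List

    def speculative_decode(prompt: List[int],
                           draft_model: Callable[[List[int]], int],
                           target_model: Callable[[List[int]], int],
                           n_tokens: int, k: int = 4) -> List[int]:
        """Sketch: the cheap draft model proposes k tokens, the expensive target model
        checks them; the matching prefix is accepted for roughly one target pass."""
        out = list(prompt)
        while len(out) - len(prompt) < n_tokens:
            # 1. Draft model proposes k tokens autoregressively (cheap).
            draft, ctx = [], list(out)
            for _ in range(k):
                t = draft_model(ctx)
                draft.append(t)
                ctx.append(t)
            # 2. Target model verifies each position (in practice: one batched pass).
            for i in range(k):
                expected = target_model(out + draft[:i])
                if expected != draft[i]:
                    out.append(expected)   # take the target's token, discard the rest
                    break
                out.append(draft[i])       # accepted draft token, "free" target decode
        return out[:len(prompt) + n_tokens]

How well it works depends on how often the two models agree, which varies by model pair and workload.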
A full KV-cache is quite big compared to the weights of the model (depending on the context size), that should be a factor too (and basically you need to maintain a separate KV cache for each request, I think...).
Also, the tokens/s is not uniform across the request; it gets slower with each subsequent generated token.
On the other side, speculative decoding is an insane booster that can give a semi-prefill rate for decoding, but the memory pressure is still a factor.
I would be happy to be corrected regarding both factors.
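For a rough sense of the KV-cache growth mentioned above, a per-token estimate under assumed Llama-70B-ish dimensions (80 layers, 8 KV heads, head dim 128, FP16); the real number depends on the architecture and on MQA/GQA/MLA-style tricks:

    # Assumed architecture, for illustration only.
    n_layers, n_kv_heads, head_dim, bytes_per_elem = 80, 8, 128, 2

    # K and V, per layer, per token.
    kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    print(kv_bytes_per_token / 1024)               # ~320 KiB per token
    print(128_000 * kv_bytes_per_token / 1024**3)  # ~39 GiB for a full 128k context, per request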
An interesting exercise would be what prompts would create the most costs for LLMs but outwards little to no costs for the issuers. Are all prompts equal and the only factor being the lengths of the input and output prompts? Or is there processing of the prompts that could be exceedingly expensive for the LLM?
Not wishing to do a shallow dismissal here, but I always assumed AI must be profitable on inference otherwise no one would pursue it as a business given how expensive the training is.
It seems sort of like wondering if a fiber ISP is profitable per GB bandwidth. Of course it is; the expensive part is getting the fiber to all the homes. So the operations must be profitable or there is simply no business model possible.
AI right now seems more like a religious movement than a business one. It doesn't matter how much it costs (to the true believers), its about getting to AGI first.
This is a great article, but it doesn't appear to model H100 downtime in the $2/hr costs. It assumes that OpenAI and Anthropic can match demand for inference to their supply of H100s perfectly, 24/7, in all regions. Maybe you could argue that the idle H100s are being used for model training - but that's different to the article's argument that inference is economically sustainable in isolation.
Consider some of the scaling properties of frontier cloud LLMs:
1) routing: traffic can be routed to smaller, specialized, or quantized models
2) GPU throughput vs latency: both parameters can be tuned and adjusted based on demand. What seems like lots of deep "thinking" might just be trickling the inference over less GPU resources for longer.
Good breakdown of the costs involved. Even if they're running at a loss, OpenAI and Anthropic receive considerable value from the free training data users are providing through their conversations. Looking at it another way, these companies are paying for the training data to make their models better for future profitability.
I don't believe the asymmetry between prefill and decode is that large. If it were, it would make no sense for most of the providers to have separate pricing for prefill with cache hits vs. without.
(But yes, they claim 80% margins on the compute in that article.)
> When established players emphasize massive costs and technical complexity, it discourages competition and investment in alternatives
But it's not the established players emphasizing the costs! They're typically saying that inference is profitable. Instead the false claims about high costs and unprofitability are part of the anti-AI crowd's standard talking points.
Yes. I was really surprised at this myself (author here). If you have some better numbers I'm all ears. Even on my lowly 9070XT I get 20x the tok/s input vs output, and I'm not doing batching or anything locally.
I think the cache hit vs miss stuff makes sense at >100k tokens where you start getting compute bound.
I linked to the writeup by Deepseek with their actual numbers from production, and you want "better numbers" than that?!
> Each H800 node delivers an average throughput of ~73.7k tokens/s input (including cache hits) during prefilling or ~14.8k tokens/s output during decoding.
That's a 5x difference, not 1000x. It also lines up with their pricing, as one would expect.
(The decode throughputs they give are roughly equal to yours, but you're claiming a prefill performance 200x times higher than they can achieve.)
A good rule of thumb is that a prefill token is about 1/6th the compute cost of decode token, and that you can get about 15k prefill tokens a second on Llama3 8B on a single H100. Bigger models will require more compute per token, and quantization like FP8 or FP4 will require less.
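Translating that rule of thumb into dollars, assuming ~$2/hr for a rented H100 and that a decode token effectively costs ~6x a prefill token (illustrative; batching and quantization shift these numbers a lot):

    # Illustrative: Llama3-8B-class model on one H100 at ~$2/hr rental.
    GPU_HOURLY = 2.00
    PREFILL_TOK_PER_S = 15_000                 # the rule-of-thumb prefill rate above
    DECODE_TOK_PER_S = PREFILL_TOK_PER_S / 6   # if a decode token costs ~6x the compute

    def cost_per_million(tok_per_s: float) -> float:
        seconds = 1_000_000 / tok_per_s
        return GPU_HOURLY * seconds / 3600

    print(cost_per_million(PREFILL_TOK_PER_S))  # ~$0.04 per million input tokens
    print(cost_per_million(DECODE_TOK_PER_S))   # ~$0.22 per million output tokens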
Maybe because you aren’t doing batching? It sounds like you’re assuming that would benefit prefill more than decode, but I believe it’s the other way around.
Model context limits are not “artificial” as claimed.
The largest context window a model can offer at a given quality level depends on the context size the model was pretrained with as well as specific fine tuning techniques.
It’s not simply a matter of considering increased costs.
This kinda tracks with the latest estimate of power usage of LLM inference published by Google: https://news.ycombinator.com/item?id=44972808. If inference isn't as power-hungry as people thought, they must be able to make good money from those subscriptions.
"Heavy readers - applications that consume massive amounts of context but generate minimal output - operate in an almost free tier for compute costs."
Not saying there's not interesting analysis here, but this is assuming that they don't have to pay for access to the massive amounts of context. Sources like stackoverflow and reddit that used to be free, are not going to be available to keep the model up to date.
If this analysis is meant to say "they're not going to turn the lights out because of the costs of running", that may be so, but if they cannot afford to keep training new models every so often they will become less relevant over time, and I don't know if they will get an ocean of VC money to do it all again (at higher cost than last time, because the sources want their cut now).
I thought the thing that made DeepSeek interesting (besides competition from China) was that its inference costs were something like 1/10th. So unless that gap has been bridged (has it?) I don't think a calculation based on DeepSeek can apply to OpenAI or Anthropic.
Idk what is going on but I'm using it all day for free, no limits in sight yet... It's just for small things, but for sure I would have had to pay 6 months ago. I actually would if they prompted tbh. Although I still find that whole "You can't use the webUI with your API credits" annoying. Why not? Why make me run OpenWebUI or LibreChat?
I guess my use is absolutely nothing compare to someone with a couple of agents running continuously.
Not during prefill, i.e. the very first token generated in a new conversation.
During this forward pass, all tokens in the context are processed at the same time and the attention KVs are cached; you still generate a single token, but you need to compute attention from all tokens to all tokens.
From that point on, every subsequent token is processed sequentially in an autoregressive way, but because we have the KV cache this becomes O(N) (one query token attending to all tokens) rather than O(N^2).
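A minimal numpy sketch of that difference (single head, no batching), just to show where the KV cache removes the O(N^2) work during decode; illustrative, not how production engines implement it:

    import numpy as np

    d = 64  # head dimension

    def prefill(x):
        """Process the whole prompt at once: attention is all-tokens-to-all-tokens, O(N^2)."""
        q, k, v = x @ Wq, x @ Wk, x @ Wv           # (N, d) each
        scores = (q @ k.T) / np.sqrt(d)            # (N, N)  <- the quadratic part
        mask = np.tril(np.ones(scores.shape, dtype=bool))
        scores = np.where(mask, scores, -np.inf)
        probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        return probs @ v, (k, v)                   # output + KV cache

    def decode_step(x_new, kv_cache):
        """Generate one token: only the new query attends to the cached keys/values, O(N)."""
        k_cache, v_cache = kv_cache
        q = x_new @ Wq                             # (1, d)
        k = np.vstack([k_cache, x_new @ Wk])
        v = np.vstack([v_cache, x_new @ Wv])
        scores = (q @ k.T) / np.sqrt(d)            # (1, N+1)  <- linear in context length
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        return probs @ v, (k, v)

    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
    out, cache = prefill(rng.standard_normal((16, d)))          # 16-token prompt, one pass
    out2, cache = decode_step(rng.standard_normal((1, d)), cache)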
The API prices of $3/$15 are not right for a lot of models. See OpenRouter, e.g. gpt-oss-120b (https://openrouter.ai/openai/gpt-oss-120b): it's more like $0.01/$0.3 (and that model actually needs an H200/B200 to get good throughput).
Some agent startups are already feeling the squeeze — The Information reported Cursor’s gross margins hit –16% due to token costs. So even if inference is profitable for OAI/Anthropic, downstream token-hungry apps may not see the same unit economics, and that is why token-intensive agent startups like Cursor and Perplexity are taking open-source models like Qwen or other OSS-120B and post-training them to bring down inference costs.
I wouldn't be surprised if profit/query is negative for all major AI companies right now, but guess what?
They have a service which understands a users question/needs 100x better than a traditional Google search does.
Once they tap into that for PPC/paid ads, their profit/query should jump into the green. In fact, there's a decent chance a lot of these models will go 100% free once that PPC pipeline is implemented and shown to be profitable.
If they start showing ads based on your prompts, and your history of "chats", it will erode the already shaky trust that users have in the bots. "Hallucinations" are one thing, but now you'll be asking yourself all the time: is that the best answer the llm can give me, or has it been trained to respond in ways favourable to its advertisers?
This is the exact same issues Facebook/YouTube/etc had with ads. In the end, ads won.
Google used to segregate ads very clearly in the beginning. Now they look almost the same as results. I've switched to DDG since then, but have the majority of users? Nope. Even if they're not using ad blockers, most people seem to not mind the ads.
With LLMs, the ads will be even harder to tell apart from non-ads.
Sometimes a statement is just too obvious to need extensive sourcing, and this is one of those times.
Gemini doesn’t always find very much better results, but it usually does. It beggars belief to claim that it doesn’t also understand the query much better than Rankbrain et al.
Those same H100's are probably also going to be responsible for the R&D of new model versions. Running a model is definitely cheaper than training them.
With the heat turning up on AI companies to explain how they will land on a viable business model, some of this is starting to look like WeWork's "Community Adjusted EBITDA" argument of "hey, if you ignore where we're losing money, we're not losing money!" that they made right before imploding.
I think most folks understand that pure inference in a vacuum is likely cash flow positive, but that’s not why folks are asking increasingly tough questions on the financial health of these enterprises.
Don’t disagree it’s what investors want. Point is just that we’re approaching a point from an economics standpoint where the credibility of the “it’s ok because we’re investing in R&D” argument is rapidly wearing thin.
WeWork’s investors didn’t want them to focus on business fundamentals either and kept pumping money elsewhere. That didn’t turn out so well.
The only reason they wouldn’t be losing money on inference is if more costly (more computationally intensive) inference wouldn’t be able to give them an extra edge, which seems unlikely to me.
So, if this is true, OpenAI needs much better conversion rates, because they have ~15 million paying users compared to 800 million weekly active users:
I'm not so sure. Inserting ads into chatbot output is like inserting ads into email. People are more reluctant to tolerate that than web or YouTube ads (which are hated already).
If they insert stealth ads, then after the third sponsored bad restaurant suggestion people will stop using that feature, too.
Mmm, let's see. I think ads in LLMs probably have the most intent (and therefore the most value) of any ads. They are like search PPC ads on steroids, as you have even more context on what the user is actually looking for.
Hell they could even just add affiliate tracking to links (and not change any of the ranking based on it) and probably make enough money to cover a lot of the inference for free users.
Another comment mentioned the cost associated with the model. Setting that aside, wouldn't we also need to include all of the systems around the inference? I can imagine significant infrastructure and engineering needs around all of these various services, along with the work needed to keep these systems up and running.
Or are these costs just insignificant compared to inference?
All incremental costs should be included. If adding each 100,000 new customers requires 1 extra engineer you would include that. We don’t know those exact numbers though and the ratio is probably much higher than my example numbers. Inference costs likely dominate.
That's an argument for why openai and anthropic shouldn't be profitable, but this point is about how also they don't have customers using the models to generate a profit either. Things like cursor, for example. ETA: also note the recent MIT study that found that 95% of LLM pilots at for-profit companies were not producing returns.
This article is about the model providers' costs, not API users'. Cursor etc have to pay the marked-up inference costs, so it's not surprising they can't make a profit.
A factory can make cheap goods and not reach profitability for some time due to the large capital outlay in spinning up a factory and tooling. It is likely there are large capital costs associated with model training that are recouped over the lifetime of the model.
You are making a joke, but reasonably speaking there are a ton of software companies that kept reinvesting when they should have taken out profit, especially when they were peaking.
Input inference, i.e. reading, is cheap; output, i.e. doing the generating, is not. For something called generative AI, that sounds pretty fucking not profitable.
The cheap use case from this article is not a trillion-dollar industry, and it's absolutely not the use case hyped by AI companies as the future, the one that is coming for your job.
It's possible they factor in training purely as an "R&D" cost and then can tax that development at a lower rate.
I'll read the blog post in more detail, but I don't think the following assumptions hold outside of AI labs.
* 100% utilization (no spikes, balanced usage between day/night or weekdays)
* Input processing is free (~$0.001 per million tokens)
* DeepSeek fits into H100 cards in a way that network isn't the bottleneck
I wonder how much capex risk there is in this model, depreciating the GPUs over 5 years is fine if you can guarantee utilization. Losing market share might be a death sentence for some of these firms as utilization falls.
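For a rough sense of that sensitivity, the depreciation math under assumed numbers (a $30k H100, 5-year straight-line depreciation, ignoring power and hosting):

    # Illustrative capex math, not anyone's actual numbers.
    GPU_PRICE = 30_000
    YEARS = 5
    HOURS = YEARS * 365 * 24

    def depreciation_per_busy_hour(utilization: float) -> float:
        # Depreciation has to be recovered over the hours the GPU actually serves traffic.
        return GPU_PRICE / (HOURS * utilization)

    for u in (1.0, 0.5, 0.25):
        print(u, round(depreciation_per_busy_hour(u), 2))
    # 100% utilization -> ~$0.68/hr of depreciation; 25% utilization -> ~$2.74/hr,
    # i.e. losing load can push you past the ~$2/hr rental figure on depreciation alone.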
> whether or not you consider model training costs as part of the calculation
Whether they flow through COGS/COR or elsewhere on the income statement, they've gotta be recognized. In which case, either you have low gross margins or low operating profit (low net income??). Right?
That said, I just can't conceive of a way that training costs are not hitting gross margins. Be it IFRS/GAAP etc., training is 1) directly attributable to the production of the service sold, 2) is not SG&A, financing, or abnormal cost, and thus 3) only makes sense to match to revenue.
can you share the model?
Does that include legal fights and potential payouts to artists and writers whose work was used without permission?
Can anyone explain why it's not allowed to compensate the creators of the data?
Of course not. Those usually wouldn't be considered "margin".
Another similar example is R&D and development by engineers aren't considered in margin either.
It's already questionable whether anyone can make it profitable once you account for all the costs. Why do you think they try to squash the legal concerns so hard? If they move fast and stick their fingers in their ears, they can just steal whatever they want.
I have to disagree. The biggest costs are still energy consumption, water, and maintenance. Not to mention keeping up with rivals at an incredibly high tempo (hence offering billions, like Meta recently). Then there's the cost of hardware, which is reflected in Nvidia's skyrocketing shares :) No one should dare to talk about profit yet. Now is the time to grab the market, invest a lot, and work hard, hoping for a future profit. The equation is still a work in progress.
> The biggest cost is still energy consumption, water and maintenance.
Are you saying that the operating costs for inference exceed the costs of training?
The global cost of inference at both OpenAI and Anthropic exceeds the training cost, for sure. The reason is simple: inference cost grows with requests, not with dataset size. My math, simplified by AI, says: suppose training a GPT-like model costs C_T = $10,000,000 and each query costs C_I = $0.002.
Break-even: N > C_T / C_I = 10,000,000 / 0.002 = 5,000,000,000 inferences.
So after 5 billion queries, inference costs surpass the training cost.
OpenAI claims it has 100 million users x their queries each = I'll let you judge.
No. But training an LLM is certainly very very expensive and a gamble every time you do it. I think of it a bit like a pharmaceutical company doing vaccine research…
Is that not baked into the h100 rental costs?
It is.
https://www.axios.com/2025/08/15/sam-altman-gpt5-launch-chat... quotes Sam Altman saying:
> Most of what we're building out at this point is the inference [...] We're profitable on inference. If we didn't pay for training, we'd be a very profitable company.
ICYMI, Amodei said the same in much greater detail:
"If you consider each model to be a company, the model that was trained in 2023 was profitable. You paid $100 million, and then it made $200 million of revenue. There's some cost to inference with the model, but let's just assume, in this cartoonish cartoon example, that even if you add those two up, you're kind of in a good state. So, if every model was a company, the model, in this example, is actually profitable.
What's going on is that at the same time as you're reaping the benefits from one company, you're founding another company that's much more expensive and requires much more upfront R&D investment. And so the way that it's going to shake out is this will keep going up until the numbers go very large and the models can't get larger, and then it'll be a large, very profitable business, or, at some point, the models will stop getting better, right? The march to AGI will be halted for some reason, and then perhaps it'll be some overhang. So, there'll be a one-time, 'Oh man, we spent a lot of money and we didn't get anything for it.' And then the business returns to whatever scale it was at."
https://cheekypint.substack.com/p/a-cheeky-pint-with-anthrop...
The "model as company" metaphor makes no sense. It should actually be models are products, like a shoe. Nike spends money developing a shoe, then building it, then they sell it, and ideally those R&D costs are made up in shoe sales. But you still have to run the whole company outside of that.
Also, in Nike's case, as they grow they get better at making more shoes for cheaper. LLM model providers tell us that every new model (shoe) costs multiples more than the last one to develop. If they make 2x revenue on training, like he's said, to be profitable they have to either double prices or double users every year, or stop making new models.
But new models to date have cost more than the previous ones to create, often by an order of magnitude, so the shoe metaphor falls apart.
A better metaphor would be oil and gas production, where existing oil and gas fields are either already finished (i.e. model is no longer SOTA -- no longer making a return on investment) or currently producing (SOTA inference -- making a return on investment). The key similarity with AI is new oil and gas fields are increasingly expensive to bring online because they are harder to make economical than the first ones we stumbled across bubbling up in the desert, and that's even with technological innovation. That is to say, the low hanging fruit is long gone.
> new models to date have cost more than the previous ones to create
This largely was the case in software in the '80s-'10s (when versions largely disappeared) and still is the case in hardware. iPhone 17 will certainly cost far more to develop than did iPhone 10 or 5. iPhone 5 cost far more than 3G, etc.
exactly: it’s like making shoes if you’re really bad at making shoes :)
If you're going to use shoes as the metaphor, a model would be more like a shoe factory. A shoe would be a LLM answer, i.e. inference. In which case it totally makes sense to consider each factory as an autonomous economic unit, like a company.
>Also, in Nike's case, as they grow they get better at making more shoes for cheaper.
This is clearly the case for models as well. Training and serving inference for GPT4 level models is probably > 100x cheaper than they used to be. Nike has been making Jordan 1's for 40+ years! OpenAI would be incredibly profitable if they could live off the profit from improved inference efficiency on a GPT4 level model!
>>This is clearly the case ... probably
>>OpenAI would be incredibly profitable if they could live off the profit from improved inference efficiency on a GPT4 level model!
If gpt4 was basically free money at this point it's real weird that their first instinct was to cut it off after gpt5
I think the idea here is that gpt-5-mini is the cheap gpt-4 quality model they want to serve and make money on.
It's model as a company because people are using the VC mentality, and also explaining competition.
Model as a product is the reality, but each model competes with previous models and is only successful if it's both more cost effective, and also more effective in general at its tasks. By the time you get to model Z, you'll never use model A for any task as the model lineage cannibalizes sales of itself.
I believe a better analogy is CPU development on the next process node.
Each node is much more expensive to design for, but when you finally have it you basically print money.
And of course you always have to develop the next, more powerful and power-efficient CPU to stay competitive.
Analogies don't prove anything, but they're still useful for suggesting possibilities for thinking about a problem.
If you don't like "model as company," how about "model as making a movie?" Any given movie could be profitable or not. It's not necessarily the case that movie budgets always get bigger or that an increased budget is what you need to attract an audience.
Okay but noticeably he invents two numbers then pretends that a third number is irrelevant in order to claim that each model (which is not a company) is a profitable company.
You'd think maybe the CEO might be able to give a ball park on the profit made off that 2023 model.
ETA: "You paid $100 million... There's some cost to inference with the model, but let's just assume ... that even if you add those two up, you're kind of in a good state."
You see this right? He literally says that if you assume revenue exceeds costs then it's profitable. He doesn't actually say that it does though.
OpenAI and Anthropic have very different customer bases and usage profiles. I'd estimate a significantly higher percentage of Anthropic's tokens are paid by the customer than OpenAI's. The ChatGPT free tier is magnitudes more popular than Claude's free tier, and Anthropic in all likelihood does a higher percentage of API business versus consumer business than OpenAI does.
In other words, it's possible this story is correct and true for Anthropic, but not true for OpenAI.
Good point, very possible that Altman is excluding free tier as a marketing cost even if it loses more than they make on paid customers. On the other hand they may be able to cut free tier costs a lot by having the model router send queries to gpt-5-mini where before they were going to 4o.
Free tier provides a lot of training material. Every time you correct ChatGPT on its mistakes you’re giving them knowledge that’s not in any book or website.
That's a moat, albeit one that is slow to build.
That's interesting, though you have to imagine the data set is very low quality on average and distilling high quality training pairs out of it is very costly.
Also Amodei has an assumption that a 100m model will make 200m of revenue but a 1B model will make 2B of revenue. Does that really hold up? There's no phenomenon that prevents them from only making 200m of revenue off a $1B model.
> So, there'll be a one-time, 'Oh man, we spent a lot of money and we didn't get anything for it.'
GPT-4.5 has entered the chat..
>> If we didn't pay for training, we'd be a very profitable company.
> ICYMI, Amodei said the same
No. He says that even paying for training, a model is profitable. It makes more revenue than it costs, all things considered. A much stronger claim.
I take them to be saying the same thing — the difference is that Altman is referring to the training of the next model happening now, while Amodei is referring to the training months ago of the model you're currently earning money back on through inference.
This sounds like fabs.
I don't see why the declining marginal returns can't be continuous.
Fantastic perspective.
Basically each new company puts competitive pressure on the previous company, and together they compress margins.
They are racing themselves to the bottom. I imagine they know this and bet on AGI primacy.
> I imagine they know this and bet on AGI primacy.
Just like Uber and Tesla are betting on self driving cars. I think it's been 10 years now ("any minute now").
Notably, Uber switched horses and now runs Waymos with no human drivers.
Copy laundering as a service is only profitable when you discount future settlements:
https://www.reuters.com/legal/government/anthropics-surprise...
Which is like saying, “If all we did is charge people money and didn’t have any COGS, we’d be a very profitable company.” That’s a truism of every business and therefore basically meaningless.
I can't imagine the hoops an accountant would have to go through to argue training cost is COGS. In the most obvious stick-figures-for-beginners interpretation, as in, "If I had to explain how a P&L statement works to an AI engineer", training is R&D cost and inference cost is COGS.
I wasn’t using COGS in a GAAP sense, but rather as a synonym for unspecified “costs.” My bad. I suppose you would classify training as development and ongoing datacenter and GPU costs as actual GAAP COGS. My point was, if all you focus on is revenue and ignore the costs of creating your business and keeping it running, it’s pretty easy for any business to be “profitable.”
It’s generally useful to consider the unit economics separately from the whole company. If your unit economics are negative, things are very bleak. If they’re positive, your chances go up by a lot: scaling the business amortizes fixed (non-unit) costs, such as admin and R&D, and slightly improves unit margins as well.
However, this does not work as well if your fixed (non-unit) cost is growing exponentially. You can’t get out of this unless your user base grows exponentially or the customer value (and price) per user grows exponentially.
I think this is what Altman is saying; it's an unusual situation: unit economics are positive, but fixed costs are exploding faster than economies of scale can absorb them.
You can say it’s splitting hairs, but insightful perspective often requires teasing things apart.
It’s splitting a hair, but a pretty important hair. Does anyone think that models won’t need continuous retraining? Does anyone think models won’t continue to try to scale? Personally, I think we’re reaching diminishing returns with scaling, which is probably good because we’ve basically run out of content to train on, and so perhaps that does stop or at least slow down drastically. But I don’t see a scenario where constant retraining isn’t the norm, even if the rough amount of content we’re using for it grows only slightly.
there's not a bright line there, though.
The Amodei quote in my other reply explains why this is wrong. The point is not to compare the training of the current model to inference on the current model. The thing that makes them lose so much money is that they are training the next model while making back their training cost on the current model. So it's not COGS at all.
Well, only if that one trained model continued to function as a going business. Their amortization window for the training cost is 2 months or so. They can't just keep that up and collect $.
They have to build the next model, or else people will go to someone else.
Why two months? It was almost a year between Claude 3.5 and 4. (Not sure how much it costs to go from 3.5 to 3.7.)
Even being generous, and saying it's a year, most capital expenditures depreciate over a period of 5-7 years. To state the obvious, training one model a year is not a saving grace
I don't understand why the absolute time period matters — all that matters is that you get enough time making money on inference to make up for the cost of training.
Don't they need to accelerate that, though? Having a 1 year old model isn't really great, it's just tolerable.
I think this is debatable as more models become good enough for more tasks. Maybe a smaller proportion of tasks will require SOTA models. On the other hand, the set of tasks people want to use LLMs for will expand along with the capabilities of SOTA models.
So,if they stopped training they’d be profitable? Only in some incremental sense, ignoring all sunk costs.
So is OpenAI capable of not making a new model at some point? They've been training the next model continuously as long as they've existed AFAIK.
Our software house spends a lot on R&D sure, but we're still incredibly profitable all the same. If OpenAI is in a position where they effectively have to stop iterating the product to be profitable, I wouldn't call that a very good place to be when you're on the verge of having several hundred billion in debt.
I think at that point there is strong financial pressure to figure out how to continuously evolve models instead of training new ones, for example by building models out of smaller modules that can be trained individually and swapped out. Jeff Dean and Noam Shazeer talked about that a bit in their interview with Dwarkesh: https://www.dwarkesh.com/p/jeff-dean-and-noam-shazeer
There’s still untapped value in deeper integrations. They might hit a jackpot of exponentially increasing value from network effects caused by tight integration with e.g. disjoint business processes.
We know that businesses with tight network effects can grow to about 2 trillion in valuation.
How would that look with at least 3 US companies, probably 2 Chinese ones and at least 1 European company developing state of the art LLMs?
Like a very over-served market, I think. I see perhaps three survivors long term, or at most one gorilla, two chimps, and perhaps a few very small niche-focused monkeys.
This can be technically true without being actually true.
IE OpenAI invests in Cursor/Windsurf/Startups that give away credits to users and make heavy use of inference API. Money flows back to OpenAI then OpenAI sends it back to those companies via credits/investment $.
It's even more circular in this case because nvidia is also funding companies that generate significant inference.
It'll be quite difficult to figure out whether it's actually profitable until the new investment dollars start to dry up.
It's even more circular, because Microsoft and Amazon also fund ChatGPT and Anthropic with Azure and AWS credits.
While this could be true, I don't think OpenAI is investing the $hundreds of millions-to-billions that would be required otherwise make it actually true.
OpenAI's fund is ~$250-300mm. Nvidia reportedly invested $1b last year - still way less than OpenAI's revenue.
There's a journalist, Ed Zitron:
https://www.wheresyoured.at/
He is an OpenAI skeptic. His research, if correct, says not only is OpenAI unprofitable but that it likely never will be (can't be): its various finance ratios make early Uber, Amazon, etc. look downright fiscally frugal.
He is not a tech person, for whatever that means to you.
Zitron is not a serious analyst.
https://bsky.app/profile/davidcrespo.bsky.social/post/3lxale...
https://bsky.app/profile/davidcrespo.bsky.social/post/3lo22k...
https://bsky.app/profile/davidcrespo.bsky.social/post/3lwhhz...
https://bsky.app/profile/davidcrespo.bsky.social/post/3lv2dx...
Amazon was very frugal. If you look at Amazon losses for the first 10 years, they were all basically under 5% of revenue and many years were break even or slightly net positive.
Uber burnt through a lot of money and even now I'm not sure their lifetime revenue is positive (it's possible that since their foundation they've lost more money than they've made).
Exactly Zitron's point.
From the latest NYT Hard Fork podcast [1]. The hosts were invited to a dinner hosted by Sam, where Sam said "we're profitable if we remove training from the equation", they report he turned to Lightcap (COO) and asked "right?" and Lightcap gave an "eeekk we're close".
They aren't yet profitable even just on inference, and it's possible Sam didn't know that until very recently.
[1] https://www.nytimes.com/2025/08/22/podcasts/is-this-an-ai-bu...
“We’re not profitable even if we discount training costs.”
and
“Inference revenue significantly exceeds inference costs.”
are not incompatible statements.
So maybe only the first part of Sam’s comment was correct.
Except these tech billionaires lie most of the time. This is still the "grow at any cost" phase, so I don't genuinely believe he even has a confident understanding of how or at what point anything will be profitable. This just strikes me as the best answer he has at the moment.
That might be the case, but inference times have only gone up since GPT-3 (GPT-5 is regularly 20+ seconds for me).
And by GPT-5 do you mean through their API? Directly through Azure OpenAI services? Or are you talking about ChatGPT set to using GPT-5?
All of these alternatives mean different things when you say it takes 20+ seconds for a full response.
Sure, apologies. I mean ChatGPT UI
Will these companies ever stop training new models? What does it mean if we get there? It feels like they will have to constantly train and improve the models; not sure what that means either. What incremental improvements can these models show?
Another question is: will it ever become less costly to train?
Would like to see opinions from someone in the know.
The current way the models work is that they don't have memory; knowledge is baked in during training (or has to be provided as context).
So to keep up with the times, the models have to be constantly retrained.
One thing though is that right now it's not just incremental training; the whole thing gets updated: the parameters and how the model is trained are different each time.
This might not be the case in the future, where training could become more efficient and switch to incremental updates, so you don't have to re-feed all the training data, only the new things.
I am simplifying here for brevity, but I think the gist is still there.
Updating the internal knowledge is not the primary motivator here, as you can easily, and more reliably (with less hallucination), get that information at inference time (through a web search tool).
They're training new models because the (software) technology keeps improving, (proprietary) data sets keep improving (through a lot of manual labelling but also synthetic data generation), and in general researchers have better understanding of what's important when it comes to LLMs.
Huh.
I feel oddly skeptical about this article; I can't specifically argue the numbers, since I have no idea, but... there are some decent open source models; they're not state of the art, but if inference is this cheap then why aren't there multiple API providers offering models at dirt cheap prices?
The only cheap-ass providers I've seen only run tiny models. Where's my cheap deepseek-R1?
Surely if its this cheap, and we're talking massive margins according to this, I should be able to get a cheap / run my own 600B param model.
Am I missing something?
It seems that reality (ie. the absence of people actually doing things this cheap) is the biggest critic of this set of calculations.
> but if inference is this cheap then why aren't there multiple API providers offering models at dirt cheap prices
There are multiple API providers offering models at dirt cheap prices, enough so that there is at least one well-known API provider that is an aggregator of other API providers and offers lots of models at $0.
> The only cheap-ass providers I've seen only run tiny models. Where's my cheap deepseek-R1?
https://openrouter.ai/deepseek/deepseek-r1-0528:free
How is this possible? I imagine someone is finding some value in the prompts themselves, but this can't possibly be paying for itself.
Inference is just that cheap plus they hope that you'll start using the ones they charge for as you become more used to using AI in your workflow.
you can also run deepseek for free on a modestly sized laptop
At 4-bit quant, R1 takes 300+ gigs just for weights. You can certainly run smaller models into which R1 has been distilled on a modest laptop, but I don't see how you can run R1 itself on anything that wouldn't be considered extreme for a laptop in at least one dimension.
You're probably thinking of what ollama labels "deepseek" which is not in fact deepseek, but other models with some deepseek distilled into them.
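The arithmetic behind the 300+ GB figure, assuming ~671B total parameters for R1 (it's an MoE, so only a fraction are active per token, but all the weights still have to live in memory):

    params = 671e9                      # DeepSeek-R1 total parameter count (approx.)
    for name, bytes_per_param in [("FP8", 1.0), ("4-bit", 0.5)]:
        gb = params * bytes_per_param / 1e9
        print(name, round(gb), "GB of weights, before KV cache and activations")
    # 4-bit: ~335 GB -> far beyond any laptop; the "deepseek" people run locally
    # is usually one of the much smaller distilled models.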
> why aren't there multiple API providers offering models at dirt cheap prices?
There are. Basically every provider's R1 prices are cheaper than estimated by this article.
https://artificialanalysis.ai/models/deepseek-r1/providers
The cheapest provider in your link charges 460x more for input tokens than the article estimates.
> The cheapest provider in your link charges 460x more for input tokens than the article estimates.
The article estinates $0.003 per million input tokens, the cheapest on the list is $0.46 per million. The ratio is 120×, not 460×.
OTOH, all of the providers are far below the estimated $3.08 cost per million output tokens
There are 7 providers on that page which have higher output token price than $3.08. There is even 1 which has higher input token price than that. So that "all" is not true either.
I also have no idea on the numbers. But I do know that these same companies are pouring many billions of dollars into training models, paying very expensive staff, and building out infrastructure. These costs would need to be factored in to come up with the actual profit margins.
> I should be able to get a cheap / run my own 600B param model.
if the margins on hosted inference are 80%, then you need > 20% utilization of whatever you build for yourself for this to be less costly to you (on margin).
i self-host open weight models (please: deepseek et al aren't open _source_) on whatever $300 GPU i bought a few years ago, but if it outputs 2 tokens/sec then i'm waiting 10 minutes for most results. if i want results in 10s instead of 10m, i'll be paying $30000 instead. if i'm prompting it 100 times during the day, then it's idle 99% of the time.
coordinating a group buy for that $30000 GPU and sharing that across 100 people probably makes more sense than either arrangement in the previous paragraph. for now, that's a big component of what model providers, uh, provide.
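A minimal sketch of that break-even logic, with the provider margin, prompt rate, and latency all as assumed round numbers rather than measured figures:

    # Break-even utilization for self-hosting vs. paying a provider.
    # All inputs are illustrative assumptions.
    provider_margin = 0.80                 # assume the host prices at 5x its own cost
    breakeven_utilization = 1 - provider_margin
    print(f"need > {breakeven_utilization:.0%} utilization to beat the hosted price")

    # The group-buy case: 100 people sharing one fast, expensive GPU.
    prompts_per_day = 100
    seconds_per_prompt = 10
    busy_one_user = prompts_per_day * seconds_per_prompt / 86_400
    print(f"one user keeps it busy {busy_one_user:.1%} of the day")
    print(f"100 users keep it busy {min(1.0, 100 * busy_one_user):.0%} of the day")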
Another giant problem with this article is that we have no idea what optimizations they use on their end. These large AI companies use some wildly complex optimizations.
What I'm trying to say is that hosting your own model is in an entirely different league from what the pros do.
Even if errors in the article imply a higher cost, I would argue it would swing back to profit simply because of how advanced inference optimization has become.
If actual model intelligence is not a moat (which is looking likely), the real sauce of profitable AI companies is advanced optimization across the entire stack.
OpenAI is NEVER going to release its specialized kernels, routing algos, quantizations, or model compilation methods. These are all really hard and really specific.
There are, I screenshotted DeepInfra in the article, but there are a lot more https://openrouter.ai/deepseek/deepseek-r1-0528
is that a quantized model or the full r1?
Imo the article is totally off the mark since it assumes users on average do not go over the 1M tokens per day.
Afaik OpenAI doesn't enforce a daily quota even on the $20 plans unless the platform is under pressure.
Since I often consume 20M tokens per day, one can assume many would use far more than the 1M tokens assumed in the article's calculations.
Meanwhile, I don’t use ChatGPT at all on a median day. I use it in occasional bursts when researching something.
There's zero basis for assuming any of that. The most likely situation is a power law curve where the vast majority of users don't use it much at all and the top 10% of users account for 90% of the usage.
It is very likely that you are in the top 10% of users.
True. The article also has zero basis for its estimate of the average usage across each tier's user base.
I somewhat doubt my usage is so close to the edge of the curve, since I don't even pay for any plan. It could be that I'm very frugal with money and heavy on consumption while most are more balanced, but 1M tokens per day in any case sounds slim for any user who pays for the service.
https://lambda.chat
Deepseek R1 for free.
* distilled R1 for free
I would not be surprised if the operating costs are modest
But these companies also have very expensive R&D and large upfront costs.
https://openrouter.ai/deepseek/deepseek-chat-v3.1
They are dirt cheap. Same model architecture for comparison: $0.30/M input, $1.00/M output. Or even $0.20-$0.80 from another provider.
For sure an interesting calculation. Only one remark from someone with GPU metal experience:
> But compute becomes the bottleneck in certain scenarios. With long context sequences, attention computation scales quadratically with sequence length.
Even if the statement about quadratic scaling is right, the bottleneck we are talking about is somewhere north of that by a factor of 1000. If 10k cores do only simple matrix operations, each needs to have new data (up to 64 KB) available every 500 cycles (let's say). Getting that amount of data (without _any_ collision) means something like 100+ GByte/s per core. Even with 2+ TByte/s of HBM, the bottleneck is the memory transfer rate, by something like a factor of 500. With collisions, we're talking about an additional factor of around 5000 (last time I did some tests with a 4090).
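For illustration, the same back-of-envelope arithmetic as a sketch (all numbers are the rough assumptions above, plus an assumed ~1.5 GHz clock):

    # Back-of-envelope memory-bandwidth check using the rough numbers above:
    # 10k cores, up to 64 KB of fresh data every ~500 cycles, ~1.5 GHz clock.
    cores = 10_000
    bytes_per_refill = 64 * 1024
    cycles_per_refill = 500
    clock_hz = 1.5e9                       # assumed clock rate

    refills_per_second = clock_hz / cycles_per_refill
    per_core_bw = bytes_per_refill * refills_per_second   # ~200 GB/s per core
    aggregate_bw = per_core_bw * cores                     # ~2 PB/s total demand

    hbm_bw = 3.35e12                       # ~3.35 TB/s HBM, e.g. an H100 SXM
    print(f"per-core demand:  {per_core_bw / 1e9:.0f} GB/s")
    print(f"aggregate demand: {aggregate_bw / 1e12:.0f} TB/s")
    print(f"shortfall vs HBM: {aggregate_bw / hbm_bw:.0f}x")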
What do you mean by collision?
If multiple cores try to get the same memory address, the MMU feeds only one core; the second one has to wait. Depending on the type of RAM, this can cost a lot of cycles.
GPU MMUs can handle multiple lines in parallel, but not 10k cores at the same time. The HBM is not able to transfer 3.5 TByte/s sequentially.
Why is that? It seems like multiple cores requesting the same address would be easier for the MMU to fetch for, not harder.
It’s not that the fetching is the problem, but serving the data to many cores at the same time from a single source.
I'm not familiar with GPU architecture, is there not a shared L2/L3 data cache from which this data would be shared?
This is not my domain, but I assume the MMUs act like a switch and something like multicast is not available here. I've tried to implement something like that on an FPGA and it was extremely cost-intensive.
I believe it's that the bus can only serve one chip at a time, so it has to actually be faster since sometimes one chip's data will have to wait for the data of another chip to finish first.
This whole article is built off using DeepSeek R1, which is a huge premise that I don't think is correct. DeepSeek is much more efficient and I don't think it's a valid way to estimate what OpenAI and Anthropic's costs are.
https://www.wheresyoured.at/deep-impact/
Basically, DeepSeek is _very_ efficient at inference, and that was the whole reason why it shook the industry when it was released.
DeepSeek's inference efficiency comes from two things: MoE and MLA attention. OpenAI was rumored to use MoE around the GPT-4 era, i.e. a loooong time ago.
Given Gemini efficiency with long context I would bet their attention is very efficient too.
GPT OSS uses fp4, which DeepSeek doesn’t use yet btw.
So no, big labs aren’t behind DeepSeek in efficiency. Not by much at least.
The reason it shook the market at least was because of the claim that its training cost was 5 million.
That's what the buzz focused on, which is strange, since we don't actually know what it cost them. Inference optimization, meanwhile, is a fact and is even more impactful, since training costs benefit from economies of scale.
I don't think that's strange at all; it's a much more palatable narrative for the masses who don't know what inference and training are and who think having conversations = training.
Also the fact that it cost 10% of what other models cost. Pretty much still does.
Why would you think that deepseek is more efficient than gpt-5/Claude 4 though? There's been enough time to integrate the lessons from deepseek.
Because to make GPT-5 or Claude better than previous models, you need to do more reasoning which burns a lot more tokens. So, your per-token costs may drop, but you may also need a lot more tokens.
GPT-5 can be configured extensively. Is there any point at which any configuration of GPT-5 that offers ~DeepSeek level performance is more expensive than DeepSeek per token?
Uhhh, I'm pretty sure DeepSeek shook the industry because of a 14x reduction in training cost, not inference cost.
We also don't know the per-token cost for OpenAI and Anthropic models, but I would be highly surprised if it was significantly more expensive than open models anyone can use and run themselves. It's not like they're also not investing in inference research.
DeepSeek was trained with distillation. Any accurate estimate of training costs should include the training costs of the model that it was distilling.
That makes the calculation nonsensical, because if you go there... you'd also have to include all the energy used to produce the content the other model providers trained on. So now suddenly you're counting everyone's devices on which they wrote comments on social media, pretty much every server that ever served a request to OpenAI's/Google's/Anthropic's bots, and so on.
Seriously, that claim was always completely disingenuous
I don't think it's that nonsensical to realize that in order to have AI, you need generations of artists, journalists, scientists, and librarians to produce materials to learn from.
And when you're using an actual AI model to "train" (copy) from, it's not nonsense at all to count the prior model as a core component of the training.
Isn't training cost a function of inference cost? From what I gathered, they reduced both.
I remember seeing lots of videos at the time explaining the details, but basically it came down to the kind of hardware-aware programming that used to be very common. (Although they took it to the next level by using undocumented behavior to their advantage.)
They're typically somewhat related, but the difference between training and inference can vary greatly, so I guess the answer is no.
they did reduce both though and mostly due to reduced precision
Because of the alleged reduction in training costs.
All reports by companies are alleged until verified by other, more trustworthy sources. I don't think it's especially notable that it's alleged because it's DeepSeek vs. the alleged numbers from other companies.
What are we meant to take away from the 8000 word Zitron post?
In any case, here is what Anthropic CEO Dario Amodei said about DeepSeek:
"DeepSeek produced a model close to the performance of US models 7-10 months older, for a good deal less cost (but not anywhere near the ratios people have suggested)"
"DeepSeek-V3 is not a unique breakthrough or something that fundamentally changes the economics of LLM’s; it’s an expected point on an ongoing cost reduction curve. What’s different this time is that the company that was first to demonstrate the expected cost reductions was Chinese."
https://www.darioamodei.com/post/on-deepseek-and-export-cont...
We certainly don't have to take his word for it, but the claim is that DeepSeek's models are not much more efficient to train or inference than closed models of comparable quality. Furthermore, both Amodei and Sam Altman have recently claimed that inference is profitable:
Amodei: "If you consider each model to be a company, the model that was trained in 2023 was profitable. You paid $100 million, and then it made $200 million of revenue. There's some cost to inference with the model, but let's just assume, in this cartoonish cartoon example, that even if you add those two up, you're kind of in a good state. So, if every model was a company, the model, in this example, is actually profitable.
What's going on is that at the same time as you're reaping the benefits from one company, you're founding another company that's much more expensive and requires much more upfront R&D investment. And so the way that it's going to shake out is this will keep going up until the numbers go very large and the models can't get larger, and then it'll be a large, very profitable business, or, at some point, the models will stop getting better, right? The march to AGI will be halted for some reason, and then perhaps it'll be some overhang. So, there'll be a one-time, 'Oh man, we spent a lot of money and we didn't get anything for it.' And then the business returns to whatever scale it was at."
https://cheekypint.substack.com/p/a-cheeky-pint-with-anthrop...
Altman: "If we didn’t pay for training, we’d be a very profitable company."
https://www.theverge.com/command-line-newsletter/759897/sam-...
In terms of sources, I would trust Zitron a lot more than Altman or Amodei. To be charitable, those CEOs are known for their hyperbole and for saying whatever is convenient in the moment, but they certainly aren't that careful about being precise or leaving out inconvenient details. Which is what a CEO should do, more or less, but, I wouldn't trust their word on most things.
I agree we should not take CEOs at their word, we have to think about whether what they're saying is more likely to be true than false given other things we know. But to trust Zitron on anything is ridiculous. He is not a source at all: he knows very little, does zero new reporting, and frequently contradicts himself in his frenzy to believe the bubble is about to pop any time now. A simple example: claiming both that "AI is very little of big tech revenue" and "Big tech has no other way to show growth other than AI hype". Both are very nearly direct quotes.
Grok 3.5: 400M training run.
DeepSeek R1: 5M training run.
Released around the same time, marginal performance difference.
I suspect that says more about Grok than anything else.
The "efficiency" meantioned in blog post you have linked is the price difference between Deepseek and o1, it doesn't mean that GPT-5 or other SOTA models are less efficient.
What a wrong take. It's not even MoE that was great in DeepSeek; it's the shared expert + GRPO.
These numbers are off.
> $20/month ChatGPT Pro user: Heavy daily usage but token-limited
ChatGPT Pro is $200/month, and Sam Altman already admitted in January 2025 that OpenAI is losing money on Pro subscriptions:
"insane thing: we are currently losing money on openai pro subscriptions!
people use it much more than we expected."
- Sam Altman, January 6, 2025
https://xcancel.com/sama/status/1876104315296968813
That doesn't seem compatible with what he stated more recently:
> We're profitable on inference. If we didn't pay for training, we'd be a very profitable company.
Source: https://www.axios.com/2025/08/15/sam-altman-gpt5-launch-chat...
His possible incentives and the fact OpenAI isn't a public company simply make it hard for us to gauge which of these statements is closer to the truth.
Does anybody really think in this current time that what a CEO says has anything to do with reality and not just with hyping up ala elon recipe
Specifically, a connected CEO in post-law America.
This sort of thing used to be called fraud, but there's zero chance of criminal prosecution.
Criminal prosecution? This scheme has been perfected; what exactly would you prosecute? Can you say with certainty that he means it's profitable overall? What if he means it's profitable right now, today, but not yesterday or over the last week? Or what if he meant that it's profitable if you take the mean user? There's so much room for interpretation; that's why there is no risk for them.
> That doesn't seem compatible with what he stated more recently:
Profitable on inference doesn't mean they aren't losing money on pro plans. What's not compatible?
The API requests are likely making more money.
Yes, API pricing is usage based, but ChatGPT Pro pricing is a flat rate for a time period.
The question is then whether SaaS companies paying for GPT API pricing are profitable if they charge their users a flat rate for a time period. If their users trigger inference too much, they would also lose money.
This can be true if you assume that there exists a high number of $20 subscribers who don't use the product that much, but $200 subscribers squeeze every last bit and then some more. The balance could be still positive, but if you look at the power users alone, they might cost more than they pay.
They might even have decided: "hey, these power users are willing to experiment and tell us what LLMs are useful for, and are even willing to pay us for the opportunity!"
> If we didn't pay for training
it is comical that something like this was even uttered in the conversation. It really shows how disconnected the tech sector is from the real world.
Imagine Intel CEO saying "If we didn't have to pay for fabs, we'd be a very profitable company." Even in passing. He'd be ridiculed.
I'm not entirely sure the analogy is fair - Amazon for example was 'ridiculed' for being hugely unprofitable for the first decade, but had underlying profitability if you removed capex.
As a counterpoint, if OpenAI were actually profitable at this early stage that could be a bad financial decision - it might mean that they aren't investing enough in what is an incredibly fierce and capital-intensive market.
It also amounts to admitting that this business would be impossible if they had to respect copyright law, so the laws shall be adjusted so that it can be a business.
I just straight up don't trust him
Saying that is the equivalent of him saying "our product is really valuable! use it!"
That is my interpretation, that it's a marketing attempt. A form of "The value of our product is so good that it's losing us money. It's practically the Costco hotdog combo!".
There's the usual issue of a CEO "talking their book" but there's also the fact that Sam has a rich, documented history of lying. That was the central issue of his firing. "Empire of AI" has a detailed account of this. He would outright tell board member A that "board member B said X", based on his knowledge of the social dynamics of the board he assumed that A and B would never talk. But they eventually figured it out, it unraveled, and they confronted him in a group. Specifically, when they confronted him about telling Ilya Sutskever that Tasha McCauley said Helen Toner should step off the board, McCauley said "I never said that" and Altman was at a loss for words for a minute before finally mumbling "Well, I thought you could have said that. I don't know."
Doesn't he have an incentive to make it look like that, though? The way he phrased it, that they are losing money because people use it so much, makes it seem like Pro subscribers are some super power-users. As long as inference has a nonnegative, nonzero cost, then this case will lose money, so Sam isn't admitting that the business model is flawed or anything
https://news.ycombinator.com/item?id=45053741
> The most likely situation is a power law curve where the vast majority of users don't use it much at all and the top 10% of users account for 90% of the usage.
That'll be the Pro users. My wife uses her regular sub very lightly, most people will be like her...
That's interesting but it doesn't mean they're losing money on the $20/month users. The Pro plan selects for heavy-usage enthusiasts.
Anyone paying attention should have zero trust in what Sam Altman says.
What do you think his strategy is? He has to make money at some point.
I don’t buy the logic that he will “scam” his investors and run away at some point.
He makes money by convincing people to buy OpenAI stock.
If OpenAI goes down tomorrow, he will be just fine. His incentive is to sell the stock, not actually build and run a profitable business.
Look at Adam Neumann as an example of how to lose billions of investor dollars and still walk out of the ensuing crash with over a billion.
https://en.wikipedia.org/wiki/Adam_Neumann
His strategy is to sell OpenAI stock like it was Bitcoin in 2020, and if for some reason the market decides that maybe a company that loses large amounts of cash isn't actually a good investment... he'll be fine, he's had plenty of time to turn some of his stock into money :)
Why not build a profitable business like Zucc, Bill Gates, Jensen, Sergey, etc.? These people are way richer and much more powerful.
Altman doesn't have any stock. He's playing a game at a level people caught up on "capitalism bad" can't even conceptualize.
Trusting the man about costs would be even more misplaced than trusting an oil company's CEO about the environment.
Losing money on o1-pro makes sense, and it's also why they axed that entire class of models.
Every o1-pro and o1-preview inference was a normal inference multiplied by however many replica paths they ran.
Apologies, should be Plus. I'll update the article later.
This seems very, very far off. From the latest reports, Anthropic has a gross margin of 60%; it came out in their latest fundraising story. That one report from The Information estimated OpenAI's GM to be 50% including free users. These are gross margins, so any amortization or model training cost would likely come after this.
Then, today almost every lab uses methods like speculative decoding and caching which reduce the cost and speed up things significantly.
The input numbers are far off. The assumption is 37B active parameters. Sonnet 4 is supposedly a 100B-200B param model. Opus is about 2T params. Both of them (even if we assume MoE) won't have exactly these numbers of active params. Then there is a cost to hosting and activating params at inference time (the article kind of assumes it would be the same constant 37B params).
Are you saying that you think Sonnet 4 has 100B-200B _active_ params? And that Opus has 2T active? What data are you basing these outlandish assumptions on?
Oh, nothing official. There are people who estimate the sizes based on tok/s, cost, benchmarks, etc. The one most people go by is https://lifearchitect.substack.com/p/the-memo-special-editio.... He estimated Claude 3 Opus to be a 2T param model (given the pricing + speed). Opus 4 is 1.2T params according to him (but then I don't understand why the price remained the same). Sonnet is estimated by various people to be around 100B-200B params.
[1]: https://docs.google.com/spreadsheets/d/1kc262HZSMAWI6FVsh0zJ...
If you're using the API cost of the model to estimate its size, then you can't use this size estimate to estimate the inference cost.
Not everyone uses MoE architectures. It's not outlandish at all...
Gross margins also don't tell the whole story, we don't know how much Azure and Amazon charge for the infrastructure and we have reasons to believe they are selling it at a massive discount (Microsoft definitely does that, as follows from their agreement with OpenAI). They get the model, OpenAI gets discounted infra.
A discounted Azure H100 will still be more than $2 per hour. Same goes for AWS. Trainium chips are new and not as effective (not saying they are bad) but still cost in the same range.
For inference, gross margins are exactly: (what companies charge per 1M tokens to the user) - (direct cost to produce that 1M tokens which is GPU costs).
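As a rough sketch of that formula (the GPU price, throughput, and user-facing price below are illustrative assumptions, not anyone's reported figures):

    # Toy gross-margin calculation for hosted inference at full utilization.
    # All inputs are illustrative assumptions.
    gpu_cost_per_hour = 2.00        # assumed all-in $/hr per H100
    gpus_per_replica = 8            # assumed GPUs needed to serve the model
    tokens_per_second = 12_000      # assumed aggregate throughput of the replica

    tokens_per_hour = tokens_per_second * 3600
    cost_per_million = (gpu_cost_per_hour * gpus_per_replica) / (tokens_per_hour / 1e6)

    price_per_million = 3.00        # assumed price charged per 1M tokens
    margin = 1 - cost_per_million / price_per_million
    print(f"cost:   ${cost_per_million:.2f} per 1M tokens")
    print(f"margin: {margin:.0%}")

Halve the utilization and the cost per million tokens doubles, which is why the assumed GPU price and utilization matter so much.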
I am implying that what OpenAI pays for GPU/hour is much less than $2, because of the discount. That's an assumption. It could be $1, $0.5, no?
It could still be burning money for Microsoft/Amazon
From https://www.theverge.com/command-line-newsletter/759897/sam-..., Sam Altman said:
> “If we didn’t pay for training, we’d be a very profitable company.”
Exactly. All of the claims that OpenAI is losing money on every request are wrong. OpenAI hasn’t even unlocked all of their possible revenue opportunities from the free tier such as ads (like Google search), affiliate links, and other services.
There are also a lot of comments in this thread from people who want LLM companies to fail for various reasons, so they're projecting that wish onto imagined unit economics.
I’m having flashbacks to all of the conversations about Uber and claims that it was going to collapse as soon as the investment money ran out. Then Uber gradually transitioned to profitability and the critics moved to using the same shtick on AI companies.
If they're profitable, why on earth are they seeking crazy amounts of investment month after month? It seems like they'll raise 10 billion one month, and then immediately turn around and raise another 10 billion a month or two after that. If it's for training, it seems like a waste of money since GPT-5 doesn't seem like it's that much of an improvement.
No, the argument is that Uber was going to lose money hand over fist until all of the alternatives were starved to death, then raise prices infinitely.
Taxis sucked. Any disruptor who was willing to just... Tell people what the cost would be ahead of time without scamming them, and show up when they said they would, was going to win.
Uber (and Lyft) didn't starve the alternatives: they were already severely malnourished. Also, they found a loophole to get around the medallion system in several cities, which taxi owners used in an incredibly anticompetitive fashion to prevent new competition.
Just because Uber used a shitty business practice to deliver the killing blow doesn't mean their competition were undeserving of the loss, or that the traditional taxis weren't without a lot of shady practices.
Spoiler alert: in most of the world taxis are still there and at best Uber is just another app you can use to call them.
And lifetime profits for Uber are still at best break even which means that unless you timed the market perfectly, Uber probably lost you money as a shareholder.
Uber is just distorted in valuation by its presence in big US metro areas (which basically have no realistic transportation alternative).
So inference is cheap but training is expensive and getting more expensive. It seems like if they can't get training expenses down, cheap inference won't matter.
No. Training itself isn't that expensive compared to inference. The real expense is salary for talent.
Because Sam Altman said so?
Sam Altman also said this:
https://xcancel.com/sama/status/1876104315296968813
There is a six month gap between those statements. Inference costs have been plummeting, plans have had tweaked quotas, and usage patterns can change.
As someone who has been taking the largest part of Google's and Facebook's ad wallet share away, let me tell you something.
Advertising is now a very, very locked-in market, and it will take over a decade to shift even a significant minority of it into OpenAI's hands. This is not likely the first or even second monetization strategy imo.
But I’m happy to be wrong.
> As someone who has been taking the largest part of Google's and Facebook's ad wallet share away
Can you elaborate? You’ve sparked my curiosity.
There are two companies gaining significant wallet share: Amazon and TikTok. Of those only one is taking a significant early share of both Google and Facebook.
OK, but you are a person, not a company. "You" are not taking the share away.
"I'm digging a trench"
"No you're not, WE are digging a trench!"
Yes fine, but "I am as well".
Sheesh. Also I, personally, do and lead the work of taking the wallet share. So I will stick with "I" and would accept any of my team saying the same.
Well, at least your attitude has made it obvious who you work for now. ;)
If we ignore the fact that if training was free, everyone would do it and OpenAI wouldn't be profitable.
He also said he got scared when trying out GPT 5, thinking “What have we done?”.
He’s in the habit of lying, so it would be remiss to take his word for it.
Yeah Dario has said similar things in interviews. The way he explained it, if you look at each specific model (such as Sonnet 3.5) as its own separate company, then each one of them is profitable in the end. They all eventually recoup the expense of training, thanks to good profit margins on usage once they are deployed.
that's true of any company. if they didn't pay for building the product, they would be very profitable.
Yeah I've seen the same sentiment from a few others as well. Inference likely is profitable. Training is incredibly expensive and will sometimes not yield positive results.
Or if they had to pay copyright costs. So much pirated data being repackaged and sold.
It’s not being repackaged. That question has already been settled by at least two courts.
It's wild and, while they're all guilty, Gemini is a particularly egregious offender. What really surprises me is that they don't even consider it a bug if you can predictably get it to generate copyrighted content. These types of exploits are out of scope of their bug bounty program and they suggest the end user file a ticket whenever they encounter such issues (i.e. they're just saying YOLO until there's case law).
Since DeepSeek R1 is open weight, wouldn't it be better to validate the napkin math to validate how many realistic LLM full inferences can be done on a single H100 in a time period, and calculate the token cost of that?
Without having in-depth knowledge of the industry, the margin difference between input and output tokens looks very odd to me when I compare your napkin math to the R1 prices. That matters because any reasoning model explodes reasoning tokens, which means you'll produce a lot more output tokens for fewer input tokens, and that's going to heavily cut into the high-margin ("essentially free") input-token profit.
Unless I'm reading the article wrong.
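A quick toy calculation of that concern (the per-token prices are illustrative placeholders, not the article's exact figures):

    # How blended cost per request shifts with the input:output ratio.
    # Prices are illustrative placeholders.
    input_cost_per_m = 0.003    # assumed near-free input tokens, $/1M
    output_cost_per_m = 3.00    # assumed output tokens, $/1M

    def blended_cost(input_tokens, output_tokens):
        return (input_tokens * input_cost_per_m + output_tokens * output_cost_per_m) / 1e6

    print(blended_cost(6_000, 1_000))    # 6:1 chat-style request: ~$0.003
    print(blended_cost(6_000, 20_000))   # reasoning-heavy request: ~$0.060

Once a reasoning model burns tens of thousands of output tokens, the "free" input side stops mattering.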
I am so glad someone else called this out. I was reading the napkin-math portions and struggling to see how the numbers really worked out, and I think you hit the nail on the head. The author assumes an 'essentially free' input-token cost and extrapolates into a business model that doesn't seem to connect directly to any claimed 'usefulness'. I think the bias is stated clearly at the beginning of the article, where the author assumes 'given how useful the current models are...'. That is not a very scientific starting point, and I think it leads to reasoning errors in the business model he posits here.
There were some oddities with the numbers themselves as well but I think it was all within rounding, though it would have been nice for the author to spell it out when he rounded some important numbers (~s don't tell me a whole lot).
TL;DR I totally agree, there are some napkin math issues going on here that make this pretty hard to see as a very useful stress test of cost.
This kind of presumes you're just cranking out inference non-stop 24/7 to get the estimated price, right? Or am I misreading this?
In reality, presumably they have to support fast inference even during peak usage times, but then the hardware is still sitting around off of peak times. I guess they can power them off, but that's a significant difference from paying $2/hr for an all-in IaaS provider.
I'm also not sure we should expect their costs to just be "in-line with, or cheaper than" what various hourly H100 providers charge. Those providers presumably don't have to run entire datacenters filled to the gills with these specialized GPUs. It may be a lot more expensive to do that than to run a handful of them spread among the same datacenter with your other workloads.
Yes. But these are on-demand prices, so you could just turn them off when load is lower.
But there is no way that OpenAI should be more expensive than this. The main cost is the capex of the H100s, and if you are buying 100k at a time you should be getting a significant discount off list price.
Of course it is impossible for us to know the true cost, but idle instances should not be accounted for at full price:
1. Idle instances don't turn electricity to heat so that reduces their operating cost.
2. Idle instances can be borrowed for training which means flexible training amortizes peak inference capacity.
That's why they have the batch tier: https://platform.openai.com/docs/guides/batch
> In reality, presumably they have to support fast inference even during peak usage times, but then the hardware is still sitting around off of peak times. I guess they can power them off, but that's a significant difference from paying $2/hr for an all-in IaaS provider.
They can repurpose those nodes for training when they aren't being used for inference. Or if they're using public cloud nodes, just turn them off.
These articles (of which there are many) all make the same basic accounting mistakes. You have to include all the costs associated with the model, not just inference compute.
This article is like saying an apartment complex isn’t “losing money” because the monthly rents cover operating costs but ignoring the cost of the building. Most real estate developments go bust because the developers can’t pay the mortgage payment, not because they’re negative on operating costs.
If the cash flow was truly healthy these companies wouldn’t need to raise money. If you have healthy positive cash flow you have much better mechanisms available to fund capital investment other than selling shares at increasingly inflated valuations. Eg issue a bond against that healthy cash flow.
Fact remains when all costs are considered these companies are losing money and so long as the lifespan of a model is limited it’s going to stay ugly. Using that apartment building analogy it’s like having to knock down and rebuild the building every 6 months to stay relevant, but saying all is well because the rents cover the cost of garbage collection and the water bill. That’s simply not a viable business model.
Update Edit: A lot of commentary below re the R&D and training costs and whether it's fair to exclude them from inference costs or “unit economics.” I'd simply say inference is just selling compute and that should be high margin, which the article concludes it is. The issue behind the growing concerns about a giant AI bubble is whether that margin is sufficient to cover the costs of everything else. I'd also say that excluding the cost of the model from “unit economics” calculations doesn't make business/math/economic sense, since it's literally the thing being sold. It's not some bit of fungible equipment or long-term capital expense when models become obsolete after a few months. Take away the model and you're just selling compute, so it's really not a great metric to use to say these companies are OK.
> Fact remains when all costs are considered these companies are losing money
You would need to figure out what exactly they are losing money on. Making money on inference is like operating profit - revenue less marginal costs. So the article is trying to answer if this operating profit is positive or negative. Not whether they are profitable as a whole.
If things like cost of maintaining data centres or electricity or bandwidth push them into the red, then yes, they are losing money on inference.
If the things that make them lose money is new R&D then that's different. You could split them up into a profitable inference company and a loss making startup. Except the startup isn't purely financed by VC etc, but also by a profitable inference company.
Yes that's right. The inference costs in isolation are interesting because that speaks to the unit economics of this business: R&D / model training aside, can the service itself be scaled to operate at a profit? Because that's the only hope of all the R&D eventually paying dividends.
One thing that makes me suspect inference costs are coming down is how chatty the models have become lately, often appending encouragement to a checklist like "You can check off each item as you complete them!" Maybe I'm wrong, but I feel if inference was killing them, the responses would become more terse rather than more verbose.
For the top few providers, the training is getting amortized over absurd amount of inference. E.g. Google recently mentioned that they processed 980T tokens over all surfaces in June 2025.
The leaked OpenAI financial projections for 2024 showed about equal amount of money spent on training and inference.
Amortizing the training per-query really doesn't meaningfully change the unit economics.
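As a rough illustration of that amortization, using the ~980T tokens/month figure above and a purely hypothetical training budget:

    # Toy amortization of a training run over inference volume.
    # Training budget and model lifetime are hypothetical assumptions;
    # the token volume is the ~980T tokens/month figure mentioned above.
    training_cost = 1e9                # assume a $1B training run
    lifetime_months = 12               # assume the model serves traffic for a year
    tokens_per_month = 980e12

    per_million = training_cost / (lifetime_months * tokens_per_month / 1e6)
    print(f"amortized training cost: ${per_million:.3f} per 1M tokens")

Under those assumptions the amortized training cost comes out to roughly eight cents per million tokens served: small next to typical output-token prices, though not nothing next to the estimated input-token costs.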
> Fact remains when all costs are considered these companies are losing money and so long as the lifespan of a model is limited it’s going to stay ugly. Using that apartment building analogy it’s like having to knock down and rebuild the building every 6 months to stay relevant. That’s simply not a viable business model.
To the extent they're losing money, it's because they're giving free service with no monetizaton to a billion users. But since the unit costs are so low, monetizing those free users with ads will be very lucrative the moment they decide to do so.
Assuming users accept those ads. Like, would they make it clear with a "sponsored section", or would they just try to worm it into the output? I could see a lot of potential ways that users reject the ad service, especially if it's seen to compromise the utility or correctness of the output.
(Author here). Yes, I am aware of that and did mention it. However, what I wanted to push back on in this article was the claim that Claude Code was completely unsustainable and therefore a flash in the pan and devs aren't at risk (I know you are not saying this).
The models as is are still hugely useful, even if no further training was done.
> The models as is are still hugely useful, even if no further training was done.
Exactly. The parent comment has an incorrect understanding of what unit economics means.
The cost of training is not a factor in the marginal cost of each inference or each new customer.
It’s unfortunate this comment thread is the highest upvoted right now when it’s based on a basic misunderstanding of unit economics.
The marginal cost is not the salient factor when the model has to be frequently retrained at great cost. Even if the marginal cost was driven to zero, would they profit?
But they don't have to be retrained frequently at great cost. Right now they are retrained frequently because everyone is frequently coming out with new models and nobody wants to fall behind. But if investment for AI were to dry up, everyone would stop throwing so much money at R&D, and if everyone else isn't investing in new models you don't have to either. The models are powerful as they are, most of the knowledge in them isn't going to become rapidly obsolete, and where that is a concern you can paper over it with RAG or MCP servers. If everyone runs out of money for R&D at the same time, we could easily cut back to a situation where we get an updated version of the same model every 3 years instead of a bigger/better model twice a year.
And whether companies can survive in that scenario depends almost entirely on their unit economics of inference, ignoring current R&D costs
Like we've seen with Karpathy & Murati starting their own labs, it's to be expected that over the next 5 years, hundreds of engineers & researchers at the bleeding edge will quit and start competing products. They'll reliably raise $1b to $5b in weeks, too. And it's logical: for an investor, a startup founded by a Tier 1 researcher will more reliably 10-100x your capital, vs. Anthropic & OpenAI that are already at >$250b.
This talent diffusion guarantees that OpenAI and Anthropic will have to keep sinking in ever more money to stay at the bleeding edge, or upstarts like DeepSeek and incumbents like Meta will simply outspend you/hire away all the Tier 1 talent to upstage you.
The only companies that'll reliably print money off AI are TSMC and NVIDIA because they'll get paid either way. They're selling shovels and even if the gold rush ends up being a bust, they'll still do very well.
True. But at some point the fact that there are many many players in the market will start to diminish the valuation of each of those players, don’t you think? I wonder what that point would be.
> But if investment for AI were to dry up everyone would stop throwing so much money at R&D, and if everyone else isn't investing in new models you don't have to either
IF.
If you do stagnate for years someone will eventually decide to invest and beat you. Intel has proven so.
Yeah so? How does that change anything?
Unit economics are the salient factor of inference costs, which this article is about.
I upvoted it because it aligns most closely with my own perspective. I have a strong dislike for AI and everything associated with it, so my judgment is shaped by that bias. If a post sounds realistic or complex, I have no interest in examining its nuance. I am not concerned with practical reality and prefer to accept it without thinking, so I support ideas that match my personal viewpoint.
I don’t understand why people like you have to call this stuff out? Like most of HN thinks the way I do and that’s why the post was upvoted. Why be a contrarian? There’s really no point.
Is this written by a sarcastic AI?
> claude code was completely unsustainable and therefore a flash in the pan and devs aren't at risk
How can you possibly say this if you know anything about the evolution of costs in the past year?
Inference costs are going down constantly, and as models get better they make fewer mistakes, which means fewer cycles = less inference to actually subsidize.
This is without even looking at potential fundamental improvements in LLMs and AI in general. And with all the trillions in funding going into this sector, you can't possibly think we're anywhere near the technological peak.
Speaking as a founder managing multiple companies: Claude Code's value is in the thousands per month /per person/ (with the proper training). This isn't a flash in the pan, this isn't even a "prediction" - the game HAS changed and anyone telling you it hasn't is trying to cover their head with highly volatile sand.
I totally agree with you! I have heard others saying this though. But I don't think it's true.
Got it — I got confused by your wording in the post but it’s clear now.
I think the point isn't to argue AI companies are money printers or even that they're fairly valued, it's that at least the unit economics work out. Contrast this to something like moviepass, where they were actually losing money on each subscriber. Sure, a company that requires huge capital investments that might never be paid back isn't great either, but at least it's better than moviepass.
Unit economics needs to include the cost of the thing being sold, not just the direct cost of selling it.
Unit economics is mostly a manufacturing concept and the only reason it looks OK here is because of not really factoring in the cost of building the thing into the cost of the thing.
Someone might say I don’t understand “unit economics” but I’d simply argue applying a unit economics argument saying it’s good without including the cost of model training is abusing the concept of unit economics in a way that’s not realistic from a business/economics sense.
The model is what’s being sold. You can’t just sell “inference” as a thing with no model. Thats just selling compute, which should be high margin. The article is simply affirming that by saying yes when you’re just selling compute in micro-chunks that’s a decent margin business which is a nice analysis but not surprising.
The cost of “manufacturing” an AI response is the inference cost, which this article covers.
> That would be like saying the unit economics of selling software is good because the only cost is some bandwidth and credit card processing fees. You need to include the cost of making the software
Unit economics is about the incremental value and costs of each additional customer.
You do not amortize the cost of software into the unit economics calculations. You only include the incremental costs of additional customers.
> just like you need to include the cost of making the models.
The cost of making the models is important overall, but it’s not included in the unit economics or when calculating the cost of inference.
That isn't what unit economics is. The purpose of unit economics is to answer: "How much money do I make (or lose) if I add one more customer or transaction?". Since adding an additional user/transaction doesn't increase the cost of training the models you would not include the cost of training the models in a unit economics analysis. The entire point of unit economics is that it excludes such "fixed costs".
The thing about large fixed costs is that you can just solve them with growth. If they were losing money on inference alone no amount of growth would help. It's not clear to me there's enough growth that everybody makes it out of this AI boom alive, but at least some companies are going to be able to grow their way to profitability at some point, presumably.
There is no marginal cost for training, just like there's no marginal cost for software. This is why you don't generally use unit economics for analyzing software company breakeven.
The only reason unit economics aren't generally used for software companies is that the profit margin is typically 80%+. The cost of posting a Tweet on Twitter/X is close to $0.
Compare the cost of tweeting to the cost of submitting a question to ChatGPT. The fact that ChatGPT rate limits (and now sells additional credits to keep using it after you hit the limit) indicates there are serious unit economic considerations.
We can't think of OpenAI/Anthropic as software businesses. At least from a financial perspective, it's more similar to a company selling compute (e.g. AWS) than a company selling software (e.g. Twitter/X).
You can amortise the training cost across billions of inference requests though. It's the marginal cost for inference that's most interesting here.
But what about running Deepseek R1 or (insert other open weights model here)? There is no training cost for that.
1. Someone is still paying for that cost.
2. “Open source” is great but then it’s just a commodity. It would be very hard to build a sustainable business purely on the back of commoditized models. Adding a feature to an actual product that does something else though? Sure.
There is plenty of money to be made from hosting open source software. AWS for instance makes tons of money from Linux, MySQL, Postgres, Redis, hosting AI models like DeepSeek (Bedrock) etc.
> You have to include all the costs associated with the model, not just inference.
The title of the article directly says “on inference”. It’s not a mistake to exclude training costs. This is about incremental costs of inference.
Hacker News commenters just can't help but critique things even when they're missing the point
The parent commenter’s responses are all based on a wrong understanding of what unit economics means.
You don’t include fixed costs in the unit economics. Unit economics is about incremental costs.
I know I'm agreeing with you. I'm saying, don't bother with him lol
Your comment may apply to the original commenter “missing” the point of TFA and to the person replying “missing” the point of that comment. And to my comment “missing” the point of yours - which may have also “missed” the point.
I’ve clearly “missed” the point you were trying to make, because there’s nothing complicated: The article is about unit economics and marginal costs of inferences and this comment thread is trying to criticize the article based on a misunderstanding of what unit economics means.
I was not trying to make any point. I’m not even sure if the comment I replied to was suggesting that it was you or the other commenter who was missing some point or another.
It's fun to work backwards, but I was listening to a podcast where the journalists were talking about a dinner that Sam Altman had.
This question came up and Sam said they were profitable if you exclude training and the COO corrected him
So at least for OpenAI, the answer is “no”
They did say it was close
And that’s if you exclude training costs which is kind of absurd because it’s not like you can stop training
Worth noting that the post only claims they should be profitable for the inference of their paying customers on a guesstimated typical workload. Free users and users with atypical usage patterns will obviously skew the whole picture. So the argument in the post is at least compatible with them still losing money on inference overall.
Excluding training, two of their biggest costs will be payroll and inference for all the free users.
It's therefore interesting that they claimed it was close: this supports the theory that inference from paid users is a (big) money maker, if it's close to covering all the free usage and their payroll costs.
There’s no mention of that in this article about it:
https://archive.is/wZslL
They quote him as saying inference is profitable and end it at that.
Are you saying that the COO corrected him at the dinner, or on the podcast? Which podcast was it?
From a journalist at the dinner:
“I think that tends to end poorly because as demand for your service grows, you lose more and more money. Sam Altman actually addressed this at dinner. He was asked basically, are you guys losing money every time someone uses ChatGPT?
And it was funny. At first, he answered, no, we would be profitable if not for training new models. Essentially, if you take away all the stuff, all the money we're spending on building new models and just look at the cost of serving the existing models, we are sort of profitable on that basis.
And then he looked at Brad Lightcap, who is the COO, and he sort of said, right? And Brad kind of like squirmed in his seat a little bit and was like, well, we're pretty close.
We're pretty close. We're pretty close.
So to me, that suggests that there is still some, maybe small negative unit economics on the usage of ChatGPT. Now, I don't know whether that's true for other AI companies, but I think at some point, you do have to fix that because as we've seen for companies like Uber, like MoviePass, like all these other sort of classic examples of companies that were artificially subsidizing the cost of the thing that they were providing to consumers, that is not a recipe for long-term success.”
From Hard Fork: Is This an A.I. Bubble? + Meta’s Missing Morals + TikTok Shock Slop, Aug 22, 2025
GPT-5 was, I suppose, their attempt to make a product whose metrics are as good as their earlier products'.
Uber doesn't really compare, as they had existing competition from taxi companies that they first had to/have to destroy. And cars or fuel didn't get 10x cheaper over the time of Uber's existence, but I'm sure that they still can optimize a lot for efficiency.
I'm more worried about OpenAI's capability to build a good moat. Right now it seems that each success is replicated by the competing companies quickly. Each month there is a new leader in the benchmarks. Maybe the moat will be the data in the end, i.e. there are barriers nowadays to crawling many websites that have lots of text. Meanwhile those sites might make agreements with the established AI players, and maybe some of those agreements will be exclusive, not just for training but also for updates on world news.
Thanks!
It’s funny you mention apartments, because that is exactly the comparison i thought of, but with the opposite conclusion. If you buy an apartment with debt, but get positive cash flow from rent, you wouldn’t call that unprofitable or a bad investment. It takes X years to recoup the initial debt, and as long as X is achievable that’s a good deal.
Hoping for something net profitable including fixed costs from day 1 is a nice fantasy, but that’s not how any business works or even how consumers think about debt. Restaurants get SBA financing. Homeowners are “net losing money” for 30 years if you include their debt, but they rightly understand that you need to pay a large fixed cost to get positive cash flow.
R&D is conceptually very similar. Customer acquisition also behaves that way
Running with your analogy having positive cash flow and buying a property to hold for the long term makes sense. Thats the classic mortgage scenario. But it takes time for that math to work out. Buying a new property every 6 months breaks that model. That’s like folks that keep buying a new car and rolling “negative equity” into a new deal. It’s insanity financially but folks still do it.
I don't think it's an accounting error when the article title says "Are OpenAI and Anthropic Really Losing Money on Inference?"
And it's a relevant question because people constantly say these companies are losing money on inference.
I think the nuance here is what people consider the “cost” of “inference.” Purely on compute costs and not accounting for the cost of the model (which is where the article focuses) it’s not bad.
I found Dario’s explanation pretty compelling:
https://x.com/FinHubIQ/status/1960540489876410404
the short of it: if you do the accounting on a per-model basis, it looks much better
That was worth a watch, thank you!
Their assumption is that training is a fixed cost: you'll spend the same amount on training for 5 users as you will with 500 million users.
Spending hundreds of millions of dollars on training when you are two guys in a garage is quite significant, but the same amount is absolutely trivial if you are planet-scale.
The big question is: how will training cost develop? Best-case scenario is a one-and-done run. But we're now seeing an arms race between the various AI providers: worst-case scenario, can the market survive an exponential increase in training costs for sublinear improvements?
They just won’t train it. They have the choice.
Why do you think they will mindlessly train extremely complicated models if the numbers don’t make sense?
My observation is that Opus is chronically capacity constrained while being dramatically more expensive than any of the others.
To me that more or less settles both "which one is best" and "is it subsidized".
Can't be sure, but anything else defies economic gravity.
Or Opus is a great model so demand is high and the provider isn't scaling the platform. I agree something defies gravity.
Also that's not accounting for free riders.
I have probably consumed trillions of free tokens from OpenAI infra since GPT-3 and never spent a penny.
And now I'm doing the equivalent on Gemini since flash is free of charge and a better model than most free of charge models.
I think this is missing the point that the very interesting article makes.
You're arguing that maybe the big companies won't recoup their investment in the models, or profitably train new ones.
But that's a separate question. Whether a model - which now exists! - can profitably be run is very good to know. The fact that people happily pay more than the inference costs means what we have now is sustainable. Maybe Anthropic or OpenAI will go out of business or something, but the weights have been calculated already, so someone will be able to offer that service going forward.
It hasn't even proven that; it assumes a ridiculous daily usage and also ignores free riders. Running a model is likely not profitable for any provider right now. Even a public company (e.g. Alphabet) isn't obliged to publish honest figures, since numbers on the sheets can be moved left and right. We won't know for another year or two, when the companies we have today start falling and their founders start talking.
What will be the knock-on effect on us consumers?
Self hosting LLMs isn’t completely out of the realm of feasibility. Hardware cost may be 2-3x a hardcore gaming rig but it would be neat to see open source, self hosted, coding helpers. When Linux hit the scenes it put UNIX(ish) power in the hands of anyone with no license fee required. Surely somewhere someone is doing the same with LLM assisted coding.
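For example, a minimal local setup might look like the sketch below, assuming vLLM is installed and the chosen open-weight model (the Qwen coder model is just an example) fits in local VRAM:

    # Minimal local-inference sketch with vLLM. The model choice and sampling
    # settings are illustrative assumptions, not recommendations.
    from vllm import LLM, SamplingParams

    llm = LLM(model="Qwen/Qwen2.5-Coder-7B-Instruct")   # example open-weight coding model
    params = SamplingParams(temperature=0.2, max_tokens=256)

    prompts = ["Write a Python function that checks whether a string is a palindrome."]
    for output in llm.generate(prompts, params):
        print(output.outputs[0].text)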
The only reason to have a local model right now is for privacy and hobby.
The economics are awful and local model performance is pretty lackluster by comparison. Never mind much slower and narrower context length.
$6,000 is 2.5 years of a $200/mo subscription. And in 2.5 years that $6k setup will likely be equivalent to a $1k setup of the time.
We don't even need to compare it to the most expensive subscriptions.
The $20 subscription is far more capable than anything i could build locally for under $10k.
Costs will go up to levels where people will no longer find this stuff as useful/interesting. It’s all fun and games until the subsides end.
See the recent reactions to AWS pricing on Kiro, where folks had a big WTF reaction after, it appears, AWS tried to charge realistic pricing based on what this stuff actually costs.
Isn’t AWS always quite expensive? Look at their margins and the amount of cash it throws off, versus the consumer/retail business which runs a ton more revenue but no profit.
If you’re applying the same pricing structure to Kiro as to all AWS products then, yeah, it’s not particularly hobbyist accessible?
The article is answering a specific question, and has excluded this on purpose. If you have a sunk training cost you still want to know if you can at least operate profitably.
API prices are going up and rate limits are getting more aggressive (see what's going on with cursor and claude code)
"This article is like saying an apartment complex isn’t “losing money” because the monthly rents cover operating costs but ignoring the cost of the building. Most real estate developments go bust because the developers can’t pay the mortgage payment, not because they’re negative on operating costs."
Exactly the analogy I was going to make. :)
> If the cash flow was truly healthy these companies wouldn’t need to raise money.
If this were true, the stock market would have no reason to exist.
> if you have healthy positive cash flow you have much better mechanisms available to fund capital investment other than selling shares. Eg issue a bond against that healthy cash flow.
Is that actually true in 2025? Presumably you have to make coupon payments on a bond(?), but shares are free. Companies like Meta have shown you can issue shares that don't come with voting rights and people will buy them, and meme stocks like GME have demonstrated the effectiveness of churning out as many shares as the market will bear.
Agree it’s not the fashionable thing. There’s a line from The Big Short of “This is Wall Street Dr Bury, if you offer us free money we’re going to take it.”
These companies are behaving the same way. Folks are willing to throw endless money into the present pit so on the one hand I can’t blame them for taking it.
Reality is, though, that when the hype wears off it's only throwing more gasoline on the fire and building a bigger pool of investors that will become increasingly desperate to salvage returns. History says time and time again that story doesn't end well, and that's why the voices mumbling "bubble" under their breath are getting louder every day.
The model is like a house. It can be upgraded. And it can be sold.
Think of the model as an investment.
> Think of the model as an investment.
Exactly, or a factory.
OK, one issue I have with this analysis is the breakdown between input and output tokens. I'm the kind of person who spends most of my chats asking questions, so I might only use 20-ish input tokens per prompt, while Gemini has to put out several hundred, which would seem to affect the economics quite a bit.
Yeah, I've noticed ChatGPT 5 is very chatty. I can ask a one-sentence question and get back 3-4 paragraphs, most of which I ignore, depending upon the task.
Same. It acts like its output tokens are free. My input:output ratio is at least 1:10, not counting "thought" tokens and its internal generation for agentic tasks.
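To put rough numbers on how much that ratio matters, here is a minimal sketch; the $3/$15 per million tokens pricing is just the illustrative figure that comes up later in this thread, and the token counts are made up:

    # Sketch: per-request cost under assumed $3 / $15 per million token pricing.
    PRICE_IN = 3 / 1_000_000    # $ per input token (assumed)
    PRICE_OUT = 15 / 1_000_000  # $ per output token (assumed)

    def request_cost(input_tokens, output_tokens):
        """Cost of a single request under the assumed pricing."""
        return input_tokens * PRICE_IN + output_tokens * PRICE_OUT

    # Short question, chatty 10x-longer answer: cost is dominated by output.
    print(request_cost(50, 500))       # ~$0.0077
    # Agentic/coding request stuffing 50k tokens of context, short answer:
    # cost is dominated by input instead.
    print(request_cost(50_000, 500))   # ~$0.16

So whether the chat use case or the agentic use case dominates a provider's traffic changes which side of the pricing actually pays the bills.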
It also doesn't take into account that a lot of the new models are reasoning models, which spit out a lot of output tokens.
It may hurt them financially but they are fighting for market share and I'd argue short answers will drive users away. I prefer the long ones much more as they often include things I haven't directly asked about but are still helpful.
Everyone claiming AI companies are a financial ticking time bomb are using the same logic people used back in the 2000s when they claimed Amazon “never made a profit” and thus was a bad investment.
Basically, the same math as modern automated manufacturing. Super expensive and complex build-out, then a money printer once running and optimized.
I know there is lots of bearish sentiments here. Lots of people correctly point out that this is not the same math as FAANG products - then they make the jump that it must be bad.
But - my guess is these companies end up with margins better than Tesla (modern manufacturer), but less than 80%-90% of "pure" software. Somewhere in the middle, which is still pretty good.
Also - once the Nvidia monopoly gets broken, the initial build out becomes a lot cheaper as well.
The difference is the money printer right now only prints for ~6 months before it needs to be replaced with an even more expensive printer.
And if you ever stop/step off the treadmill and jack up prices to reach profitability, a new upstart without your sunk costs will immediately create a 99% solution and start competing with you. Or more like hundreds of competitors. Like we've seen with Karpathy & Murati, any engineer with pedigree working on the frontline models can easily raise billions to compete with them.
Expect the trend to pick up as the pool of engineers who can create usable LLMs from scratch increases through knowledge/talent diffusion.
The LLM scene is an insane economic bloodbath right now. The tech aside, the financial moves here are historical. It's the ultimate wet dream for consumers - many competitors, face-ripping cap-ex, any missteps being quickly punished, and a total inability to hold back anything from the market. Companies are spending hundreds of billions to put the best tech in your hands as fast and as cheaply as possible.
If OpenAI didn't come along with ChatGPT, we would probably just now be getting Google Bard 1.0 with an ability level of GPT-3.5 and censorship so heavy it would make it useless for anything beyond "Tell me who the first president was".
The difference is you can train on outputs, DeepSeek style. There are no gates in this field; profit margins will go to 0.
Only introducing this into the conversation because it just dropped (so it is timely): "NVIDIA AI Released Jet-Nemotron: 53x Faster Hybrid-Architecture Language Model Series that Translates to a 98% Cost Reduction for Inference at Scale." While it seems unlikely either OpenAI or Anthropic uses this or a technique like it (yet, or even can), these types of breakthroughs may introduce dramatic savings for both closed and open source inference at scale going forward: https://www.marktechpost.com/2025/08/26/nvidia-ai-released-j...
As the author seems to admit, an outsider is going to lack so much information (costs, loss leaders, etc), one has to assume any modeling is so inaccurate that it's not worth anything.
So the question remains unanswered, at least for us. For those putting money in, you can be absolutely certain they have a model with sufficient data to answer the question. Since money did go in, even if it's venture, the answer is probably "yes in the immediate, but no over time."
The math on the input tokens is definitely wrong. It claims each instance (8 GPUs) can handle 1.44 million tokens/sec of input. Let's check that out.
1.44e6 tokens/sec * 37e9 bytes/token / 3.3e12 bytes/sec/GPU = ~16,000 GPUs
And that's assuming a more likely 1 byte per parameter.
So the article is only off by a factor of at least 1,000. I didn't check any of the rest of the math, but that probably has some impact on their conclusions...
37 billion bytes per token?
Edit: Oh, assuming this is an estimate based on the model weights moving from HBM to SRAM, that's not how transformers are applied to input tokens. You only have to move the weights for every token during generation, not during "prefill". (And actually, during generation you can use speculative decoding to do better than this roofline anyway.)
There's also the question of how much the KV cache grows with each subsequent token. That would be roughly ~MBs/token. I think that would be the bottleneck.
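For a sense of scale, a back-of-the-envelope sketch; the layer/head/precision numbers are made up to be plausible for a large dense model, not any particular one:

    # Rough per-token KV-cache growth for a hypothetical dense transformer.
    num_layers   = 80    # assumed
    num_kv_heads = 8     # grouped-query attention; plain MHA would be ~8x this
    head_dim     = 128
    bytes_per_el = 2     # fp16/bf16

    # Each layer stores one K and one V vector per token.
    kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_el
    print(kv_bytes_per_token / 1e6, "MB/token")              # ~0.33 MB with GQA
    print(100_000 * kv_bytes_per_token / 1e9, "GB at 100k")   # ~33 GB per sequence

Multiply that per-sequence figure by the number of concurrent long-context requests and it's easy to see why serving can become memory-bound well before it becomes compute-bound.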
> (And actually during generation you can use speculative decoding to do better than this roofline anyways).
And more importantly batching; taking the example from the blog post, it would be 32 tokens per forward pass in the decoding phase.
> 37e9 bytes/token
This doesn't quite sound right...isn't a token just a few characters?
Your calculations make no sense. Why are you loading the model for each token independently? You can process all the input tokens at the same time as long as they can fit in memory.
You are doing the calculation as if they were output tokens in a single batch; it would not make sense even in the decode phase.
This. ChatGPT also agrees with you: "74 GB weight read is per pass, not per token." I was checking the math in this blog post with GPT to understand it better and it seems legit for the given assumptions.
Then the right calculation is to use FLOPs not bandwidth like they did.
Well, he probably asked some AI to do the math for him.
The estimate for output tokens is too low, since one reasoning-enabled response can burn through thousands of output tokens. It's also low for input tokens, since in actual use there's a lot of context (memory, agents.md, rules, etc.) included nowadays.
When you are operating at scale, you are likely to use a small model during the auto-regressive phase to generate sequential tokens and only involve the large model once you've generated several tokens. Whenever the two predict the same output, you effectively generate more than one token at a time. The idea is the models will agree often enough to significantly reduce output token costs. Does anyone know how effective that is in practice?
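For anyone unfamiliar, this is speculative decoding. A minimal structural sketch of the draft-and-verify loop (draft_model, target_model and their methods are hypothetical stand-ins, and real implementations use a probabilistic accept/reject rule rather than exact match):

    def speculative_decode(tokens, draft_model, target_model, k=4, max_tokens=256):
        # tokens: the prompt so far, as a list of token ids
        while len(tokens) < max_tokens:
            # 1. Cheap draft model proposes k tokens autoregressively.
            draft = draft_model.generate(tokens, n=k)
            # 2. Big model scores the prompt plus all k draft tokens in ONE
            #    forward pass, returning k+1 next-token choices of its own.
            verified = target_model.next_tokens(tokens, draft)
            # 3. Keep the longest agreeing prefix, plus one "free" token the
            #    big model produced at the first disagreement.
            n_accepted = 0
            for d, v in zip(draft, verified):
                if d != v:
                    break
                n_accepted += 1
            tokens = tokens + verified[: n_accepted + 1]
        return tokens

Effectiveness then comes down to how often the draft agrees with the target; my understanding is that reported wall-clock speedups are often in the 2-3x range, but it varies a lot by model pair and workload.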
A full KV cache is quite big compared to the weights of the model (depending on the context size), so that should be a factor too (and basically you need to maintain a separate KV cache for each request, I think...). Also, the tokens/s is not uniform across the request; it gets slower with each subsequent generated token.
On the other side, there's the insane boost of speculative decoding, which would give a semi-prefill rate for decoding, but memory pressure is still a factor.
I would be happy to be corrected regarding both factors.
An interesting exercise would be working out which prompts create the most cost for the LLM provider while costing the issuer little to nothing. Are all prompts equal, with the only factor being the lengths of the input and output? Or is there processing of some prompts that could be exceedingly expensive for the LLM?
Yes, the API business is cross-subsidizing the consumer business.
Back in March, I did the same analysis with greater sensitivities, and arrived at similar gross margins: >70%.
https://johnnyclee.com/i/are-frontier-labs-making-80percent-...
Not wishing to do a shallow dismissal here, but I always assumed AI must be profitable on inference otherwise no one would pursue it as a business given how expensive the training is.
It seems sort of like wondering if a fiber ISP is profitable per GB bandwidth. Of course it is; the expensive part is getting the fiber to all the homes. So the operations must be profitable or there is simply no business model possible.
AI right now seems more like a religious movement than a business one. It doesn't matter how much it costs (to the true believers), its about getting to AGI first.
This is a great article, but it doesn't appear to model H100 downtime in the $2/hr costs. It assumes that OpenAI and Anthropic can match demand for inference to their supply of H100s perfectly, 24/7, in all regions. Maybe you could argue that the idle H100s are being used for model training - but that's different to the article's argument that inference is economically sustainable in isolation.
Not really, that is why they sell Batch API at considerably lower costs than the normal API.
There are also probably all kinds of enterprise deals beyond the PAYG batch APIs where they are okay with high latency (> hours).
Consider some of the scaling properties of frontier cloud LLMs:
1) routing: traffic can be routed to smaller, specialized, or quantized models
2) GPU throughput vs. latency: both parameters can be tuned and adjusted based on demand. What seems like lots of deep "thinking" might just be the inference trickling over fewer GPU resources for longer.
3) caching
Good breakdown of the costs involved. Even if they're running at a loss, OpenAI and Anthropic receive considerable value from the free training data users are providing through their conversations. Looking at it another way, these companies are paying for the training data to make their models better for future profitability.
I don't believe the asymmetry between prefill and decode is that large. If it were, it would make no sense for most of the providers to have separate pricing for prefill with cache hits vs. without.
Given the analysis is based on R1, Deepseek's actual in-production numbers seem highly relevant: https://github.com/deepseek-ai/open-infra-index/blob/main/20...
(But yes, they claim 80% margins on the compute in that article.)
> When established players emphasize massive costs and technical complexity, it discourages competition and investment in alternatives
But it's not the established players emphasizing the costs! They're typically saying that inference is profitable. Instead the false claims about high costs and unprofitability are part of the anti-AI crowd's standard talking points.
Yes. I was really surprised at this myself (author here). If you have some better numbers I'm all ears. Even on my lowly 9070XT I get 20x the tok/s input vs output, and I'm not doing batching or anything locally.
I think the cache hit vs miss stuff makes sense at >100k tokens where you start getting compute bound.
I linked to the writeup by Deepseek with their actual numbers from production, and you want "better numbers" than that?!
> Each H800 node delivers an average throughput of ~73.7k tokens/s input (including cache hits) during prefilling or ~14.8k tokens/s output during decoding.
That's a 5x difference, not 1000x. It also lines up with their pricing, as one would expect.
(The decode throughputs they give are roughly equal to yours, but you're claiming a prefill performance 200x times higher than they can achieve.)
A good rule of thumb is that a prefill token is about 1/6th the compute cost of a decode token, and that you can get about 15k prefill tokens a second with Llama3 8B on a single H100. Bigger models will require more compute per token, and quantization like FP8 or FP4 will require less.
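One way to sanity-check that figure is to convert it into an implied hardware utilization; this is my own arithmetic with round numbers, not anything from the article:

    # Forward-pass compute is roughly 2 FLOPs per parameter per token.
    params = 8e9                  # Llama3 8B
    flops_per_token = 2 * params  # ~16 GFLOPs/token
    h100_peak = 1e15              # approx. dense BF16 tensor-core FLOP/s

    prefill_tok_s = 15_000        # the rule-of-thumb figure above
    utilization = prefill_tok_s * flops_per_token / h100_peak
    print(f"implied utilization ~ {utilization:.0%}")   # roughly a quarter of peak

That lands at a plausible fraction of peak, which is why prefill throughput scales so differently from bandwidth-bound decode.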
Maybe because you aren’t doing batching? It sounds like you’re assuming that would benefit prefill more than decode, but I believe it’s the other way around.
Model context limits are not “artificial” as claimed.
The largest context window a model can offer at a given quality level depends on the context size the model was pretrained with as well as specific fine tuning techniques.
It’s not simply a matter of considering increased costs.
Context extension methods exist and work. Please educate yourself about these rather than confidently saying wrong things.
This kinda tracks with the latest estimate of the power usage of LLM inference published by Google: https://news.ycombinator.com/item?id=44972808. If inference isn't as power hungry as people thought, they must be able to make good money from those subscriptions.
> power hungry like people thought
The only people who thought this were non-practitioners.
"Heavy readers - applications that consume massive amounts of context but generate minimal output - operate in an almost free tier for compute costs."
Not saying there's not interesting analysis here, but this is assuming that they don't have to pay for access to the massive amounts of context. Sources like Stack Overflow and Reddit that used to be free are not going to be available to keep the model up to date.
If this analysis is meant to say "they're not going to turn the lights out because of the costs of running", that may be so, but if they cannot afford to keep training new models every so often they will become less relevant over time, and I don't know if they will get an ocean of VC money to do it all again (at higher cost than last time, because the sources want their cut now).
I thought the thing that made DeepSeek interesting (besides competition from China) was that its inference costs were something like 1/10th. So unless that gap has been bridged (has it?) I don't think a calculation based on DeepSeek can apply to OpenAI or Anthropic.
Idk what is going on but I'm using it all day for free, no limits in sight yet... It's just for small things, but for sure I would have had to pay 6 months ago. I actually would pay if they prompted me, tbh. Although I still find that whole "You can't use the webUI with your API credits" annoying. Why not? Why make me run OpenWebUI or LibreChat?
I guess my use is absolutely nothing compare to someone with a couple of agents running continuously.
"Here's the key insight: each forward pass processes ALL tokens in ALL sequences simultaneously."
This sounds incorrect, you only process all tokens once, and later incrementally. It's an auto-regressive model after all.
Not during prefill, i.e. the very first token generated in a new conversation. During this forward pass, all tokens in the context are processed at the same time, and then the attention KVs are cached; you still generate a single token, but you need to compute attention from all tokens to all tokens.
From that point on, every subsequent token is processed sequentially in an autoregressive way, but because we have the KV cache, this becomes O(N) (one token's query against all tokens) and not O(N^2).
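A toy sketch of that split, with a single attention head, random weights, and made-up shapes, just to show where the cache comes from and why decode is cheaper per token:

    import numpy as np

    # Toy single-head attention with random weights; shapes are illustrative.
    d = 64
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

    def attend(Q, K, V):
        s = Q @ K.T / np.sqrt(d)
        s = s - s.max(axis=-1, keepdims=True)   # numerically stable softmax
        w = np.exp(s)
        w = w / w.sum(axis=-1, keepdims=True)
        return w @ V

    # Prefill: all N prompt tokens are projected and attended in one pass
    # (all-to-all, O(N^2) attention), and their K/V become the cache.
    X = rng.standard_normal((1000, d))            # stand-in prompt embeddings
    K_cache, V_cache = X @ Wk, X @ Wv
    prefill_out = attend(X @ Wq, K_cache, V_cache)

    # Decode: each new token only attends against the existing cache
    # (O(N) per token) and appends its own K/V, instead of reprocessing
    # the whole prompt.
    x_new = rng.standard_normal((1, d))
    K_cache = np.vstack([K_cache, x_new @ Wk])
    V_cache = np.vstack([V_cache, x_new @ Wv])
    decode_out = attend(x_new @ Wq, K_cache, V_cache)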
I somehow missed the "decode phase" paragraph and hence was confused - it's essentially that separation I meant, you're obviously correct.
The API prices of $3/$15 are not right for a lot of models. See OpenRouter for gpt-oss-120b (https://openrouter.ai/openai/gpt-oss-120b); it's more like $0.01/$0.3 (and that model actually needs an H200/B200 to get good throughput).
Some agent startups are already feeling the squeeze — The Information reported Cursor’s gross margins hit –16% due to token costs. So even if inference is profitable for OAI/Anthropic, downstream token-hungry apps may not see the same unit economics, and that is why token-intensive agent startups like Cursor and Perplexity are taking open-source models like Qwen or other OSS-120B and post-training them to bring down inference costs.
I wouldn't be surprised if their profit/query is at a negative for all major Ai companies, but guess what?
They have a service which understands a user's question/needs 100x better than a traditional Google search does.
Once they tap into that for PPC/paid ads, their profit/query should jump into the green. In fact, there's a decent chance a lot of these models will go 100% free once that PPC pipeline is implemented and shown to be profitable.
> Once they tap into that for PPC/paid ads,
If they start showing ads based on your prompts, and your history of "chats", it will erode the already shaky trust that users have in the bots. "Hallucinations" are one thing, but now you'll be asking yourself all the time: is that the best answer the llm can give me, or has it been trained to respond in ways favourable to its advertisers?
This is the exact same issues Facebook/YouTube/etc had with ads. In the end, ads won.
Google used to segregate ads very clearly in the beginning. Now they look almost the same as results. I've switched to DDG since then, but have the majority of users? Nope. Even if they're not using ad blockers, most people seem to not mind the ads.
With LLMs, the ads will be even harder to tell apart from non-ads.
> They have a service which understands a users question/needs 100x better than a traditional Google search does.
Source?
A lifetime of using Google and 4 years of using LLMs.
…is a great counter-example of a “source”.
It’s not like the product at-hand is relevant to data analysis or anything, amirite?
Sometimes a statement is just too obvious to need extensive sourcing, and this is one of those times.
Gemini doesn’t always find very much better results, but it usually does. It beggars belief to claim that it doesn’t also understand the query much better than Rankbrain et al.
Those same H100s are probably also going to be responsible for the R&D of new model versions. Running a model is definitely cheaper than training one.
Can you really rent a cluster of 75 H100s for 75*2 USD per hour? Individual H100s, yes. But with sufficient interconnect to run these huge models?
Message to Martin if you are reading this - a blog without an RSS feed is not a blog. Please add one :)
I wonder if there needs to be two different business models:
1. Companies that train models and license them
2. Companies that do inference on models
With the heat turning up on AI companies to explain how they will land on a viable business model, some of this is starting to look like WeWork's "Community Adjusted EBITDA" argument of "hey, if you ignore where we're losing money, we're not losing money!" that they made right before imploding.
I think most folks understand that pure inference in a vacuum is likely cash flow positive, but that’s not why folks are asking increasingly tough questions on the financial health of these enterprises.
A fast growing venture backed startup doing frontier R&D should be losing money overall.
If they weren’t losing money, they wouldn’t be spending enough on R&D. This isn’t some gotcha. It’s what the investors want right now.
Don’t disagree it’s what investors want. Point is just that we’re approaching a point from an economics standpoint where the credibility of the “it’s ok because we’re investing in R&D” argument is rapidly wearing thin.
WeWork’s investors didn’t want them to focus on business fundamentals either and kept pumping money elsewhere. That didn’t turn out so well.
The only reason they wouldn’t be losing money on inference is if more costly (more computationally intensive) inference wouldn’t be able to give them an extra edge, which seems unlikely to me.
See also: https://x.com/tanayj/status/1960116730786918616
OpenAI projects 50% gross margins for 2025
The other companies don't include free users in their GM calculations which makes it hard to compare
So, if this is true, OpenAI needs much better conversion rates, because they have ~15 million paying users compared to 800 million weekly active users:
https://nerdynav.com/chatgpt-statistics/
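Quick arithmetic on those figures (which are themselves third-party estimates):

    paying, weekly_active = 15e6, 800e6
    print(f"conversion ~ {paying / weekly_active:.1%}")   # ~1.9%
    # Every paying user is effectively subsidizing ~52 free weekly actives.
    print(f"free weekly actives per paying user ~ {(weekly_active - paying) / paying:.0f}")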
Yeah but they can probably monetize them with ads.
I'm not so sure. Inserting ads into chatbot output is like inserting ads into email. People are more reluctant to tolerate that than web or YouTube ads (which are hated already).
If they insert stealth ads, then after the third sponsored bad restaurant suggestion people will stop using that feature, too.
Mmm, let's see. I think LLM ads probably have the most intent (and therefore the most value) of any ads. They are like search PPC ads on steroids, as you have even more context on what the user is actually looking for.
Hell they could even just add affiliate tracking to links (and not change any of the ranking based on it) and probably make enough money to cover a lot of the inference for free users.
LLM generated ads.
Another comment mentioned the cost associated with the model. Setting that aside, wouldn't we also need to include all of the systems around the inference? I can imagine significant infrastructure and engineering needs around all of these various services, along with the work needed to keep these systems up and running.
Or are these costs just insignificant compared to inference?
All incremental costs should be included. If adding each 100,000 new customers requires 1 extra engineer you would include that. We don’t know those exact numbers though and the ratio is probably much higher than my example numbers. Inference costs likely dominate.
Hasn't Sam Altman already said they are profitable on inference, minus training costs?
Yes they are, they are deeply deeply unprofitable and that's why they need endless investments to prop them up.
That's why Microsoft is not doing the deal with OpenAI, that's why Claude was fiddling with token limits just a couple of weeks ago.
It's a huge bubble, and the only winner at this moment is Nvidia.
Citation? Investments aren't evidence of unprofitability in inference
If inference is that cheap, why is not even one company profitable yet?
Because they're spending it all on training the next model.
That's an argument for why openai and anthropic shouldn't be profitable, but this point is about how also they don't have customers using the models to generate a profit either. Things like cursor, for example. ETA: also note the recent MIT study that found that 95% of LLM pilots at for-profit companies were not producing returns.
This article is about the model providers' costs, not API users'. Cursor etc have to pay the marked-up inference costs, so it's not surprising they can't make a profit.
A factory can make cheap goods and not reach profitability for some time due to the large capital outlay in spinning up a factory and tooling. It is likely there are large capital costs associated with model training that are recouped over the lifetime of the model.
Training.
"Why would you reinvest profits back into a business that is extremely profitable, when you have the chance of pulling your money out?"
You are making a joke but reasonably speaking there are a ton of software companies where they kept reinvesting where they should have taken out profit, especially when they are peaking.
Input inference, i.e. reading, is cheap; output, i.e. actually doing the generating, is not. For something called generative AI, that sounds pretty fucking unprofitable.
The cheap use case from this article is not a trillion-dollar industry, and absolutely not the use case hyped as the future by AI companies, the one that is coming for your job.