These pricing analysis posts always focus on price per token and never do a rigorous price-per-query comparison.
We went from a non-thinking model to an “always thinking” model. METR shows GPT 5 uses pretty much the entire token budget, and you get charged for those hidden output tokens. Lots of API users won’t limit the token budget enough, or won’t realize their needs are met with minimal thinking (or they’ve tested it and found minimal-thinking performance isn’t good enough). So the price you see isn’t necessarily the price you pay. You can’t compare it with previous non-thinking models, and you can’t compare it with other providers’ thinking models either, because two things differ significantly between providers (and, to a lesser extent, between models of the same generation): how much output quality (or even correctness) improves per unit of extra thinking time, and how many hidden tokens actually get emitted relative to the token budget/max thinking time, even for simple queries that don’t need the full budget.
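A rough sketch of the per-query arithmetic, assuming (as above) that hidden reasoning tokens are billed at the output-token rate. The function name, prices and token counts are made-up placeholders for illustration, not any provider’s actual rates or usage numbers.

```python
def cost_per_query(input_tokens, visible_output_tokens, reasoning_tokens,
                   input_price_per_m, output_price_per_m):
    """Dollar cost of one request. Prices are $ per million tokens.
    Assumption: hidden reasoning tokens are billed as output tokens."""
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_price_per_m
            + billed_output * output_price_per_m) / 1_000_000


# Hypothetical non-thinking model: pricier per token, no hidden tokens.
old = cost_per_query(1_000, 500, 0, input_price_per_m=2.0, output_price_per_m=8.0)

# Hypothetical thinking model: cheaper per token, but emits 3,000 hidden tokens.
new = cost_per_query(1_000, 500, 3_000, input_price_per_m=1.0, output_price_per_m=5.0)

print(f"old: ${old:.4f}  new: ${new:.4f}")
```

With these placeholder numbers the "cheaper" model costs $0.0185 per query versus $0.0060, roughly 3x more, purely because of the hidden tokens.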
Not to mention how often it gets things wrong and you have to redo them. If GPT 5 makes you do many things twice…
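A minimal sketch of how redos feed into the real price, assuming each attempt independently produces an acceptable answer with the same probability and you simply retry until one does. Under that (simplifying, hypothetical) model the number of attempts is geometric with mean 1 / success_rate, so the expected spend per usable answer is the per-attempt cost divided by the success rate. The function name and numbers are illustrative only.

```python
def expected_cost_per_success(cost_per_attempt, success_rate):
    """Expected spend to get one acceptable answer when retrying until success.
    Assumption: attempts are independent with the same success_rate, so the
    attempt count is geometric with mean 1 / success_rate."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical: $0.01 per attempt, but only half the answers are usable.
print(expected_cost_per_success(0.01, 0.5))
```

Here a model that makes you do things twice on average effectively doubles its per-query price; real retries are rarely independent, so treat this as a lower-bound intuition rather than a precise estimate.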