I feel like what happens is first they release a giant model. Then they start optimizing the model to increase inference speed and reduce costs.
But then they introduce bugs leading to lots of complaints. Usually the complaints are about models being dumbed down but almost always the labs say they are doing no such thing.
So if we believe the labs, then they must have a very error-prone optimization effort going on.
I have attributed this experience to the same thing that happens when you look closely at AI-generated images. At first glance they look great, but closer inspection reveals flaws.
Perhaps one just learns to see through the model after a while.
So, I feel these cutting-edge, top-tier, cloud-based large models aren't reliable at all; they keep getting adjusted or downgraded. Might as well just install Qwen 3.6 locally on my own computer.
What I've noticed over the past couple of days is a spike in resource consumption for trivial tasks like examining a commit or listing a directory. It goes around in circles with absurd actions: it doesn't need to diff the files in a directory to search for a single file.
same thing happened to me. was on codex for about a month, felt that exact shift, ended up going back to claude max. not sure if it's routing or tuning but something changed and prompting around it didn't help.
Yes, I feel so too. It started happening around the first week of June.
I have shifted to Claude Code for planning things and Codex for execution (as it is faster, though dumb).
Most likely the reasoning is being truncated. Refer to these:
https://github.com/openai/codex/issues/30364
https://www.reddit.com/r/codex/comments/1ugyvez/half_of_your...