It's the bitter-lesson-to-feature-engineering lifecycle.
When a technique or technology is new, people make massive gains just by applying it to some use case, gathering more data for training, or giving it more resources.
As time goes on, those "bitter lesson" gains hit the shallow part of the logistic curve, and companies have to invest more and more engineering effort for each small, incremental gain.
Why didn't the author compare Llama 3 with GLM 5.2 (released 1 week ago), which is a more standard attention-based LLM? Comparing two separate families of LLMs and then pointing out that they are different is not a surprising result, and it detracts from the point the author is trying to make.
https://sebastianraschka.com/llm-architecture-gallery/?compa...
If you look at it, the diagrams are very similar. The main differences are that the feedforward block is replaced with an MoE (a router to multiple feedforwards) and that the model has a different attention implementation.
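For anyone who hasn't looked inside one, here is a minimal sketch of what "router to multiple feedforwards" means, in plain Python. Everything in it is an assumption for illustration: the names (`moe_feedforward`, `router_w`, `experts`), the tiny sizes, the ReLU, and the top-2 routing are made up, not taken from GLM or any other specific model.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def feedforward(x, w_in, w_out):
    """Plain two-layer MLP with ReLU: the block a dense layer would use."""
    hidden = [max(0.0, h) for h in matvec(w_in, x)]
    return matvec(w_out, hidden)


def moe_feedforward(x, router_w, experts, top_k=2):
    """Route one token vector to its top_k experts and mix their outputs."""
    logits = matvec(router_w, x)  # one router score per expert
    ranked = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)
    chosen = ranked[:top_k]
    gates = softmax([logits[i] for i in chosen])  # renormalise over chosen experts
    out = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        w_in, w_out = experts[idx]
        expert_out = feedforward(x, w_in, w_out)
        out = [o + gate * e for o, e in zip(out, expert_out)]
    return out, chosen


def random_matrix(rows, cols, rng):
    """Hypothetical random weights, only so the sketch runs end to end."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, d_hidden, n_experts = 4, 8, 4
experts = [
    (random_matrix(d_hidden, d_model, rng), random_matrix(d_model, d_hidden, rng))
    for _ in range(n_experts)
]
router_w = random_matrix(n_experts, d_model, rng)

token = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_feedforward(token, router_w, experts, top_k=2)
print("experts used:", chosen)
print("output:", output)
```

A dense layer would call `feedforward` once per token; the MoE version runs only `top_k` of the experts and mixes them using the router's weights. Real implementations add batching, load balancing, and often shared experts on top of this.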
The author is correct: model architecture is now much more complicated. You can see this if you use llama.cpp and follow the project. The earlier models were always fully implemented. Yet even with more contributors, as of today tons of the latest models have only partial implementations. DeepSeekv3.2 isn't fully implemented, and the same goes for KimiK2.6 and GLM5.2+; DeepSeekv4 has no implementation, MiniMaxM3 is not supported yet, and Hy3-preview has no implementation. The latest models get just a bare-bones implementation that runs, with lots of support missing for the advanced features.
Yeah, not a great apples-to-apples comparison.
I think the point stands: MoE, a myriad of complex attention approaches, shared layers, you name it. And making it all work together well is a huge trial-and-error pain even for small models, never mind getting to efficient hardware utilization.
> If you look at it, the diagrams are very similar,
The page links to the same site you do. No wonder it is similar -- the source is the same!
The source is the same in the original article too. He is using a different diagram from the same site on the right to justify his point on how much more complicated things have become.
It’s written by AI.
I am _very_ familiar with Claudish, and to some extent, the other AIs' writing styles. This article is human-written and features human writing quirks.
The very first sentence
> Back in 2022 and 2023 there were two big branches of machine learning happening at Meta.
is unmistakably human. That's not how an LLM would phrase this sentence, and if it did, it would have put a comma after 2023.
[[citation needed]]
I am a professional writer and have been for over 30 years. (I do not use any form of LLM ever.) This means I read a lot. This also means that I have 30+ years of experience of readers not understanding what I wrote, or not getting further than the title, or not getting the main message, or inverting it in their heads, or inserting their own message and then complaining when I diverge, and an endless list of Ways People Do Not Get It.
I am also a trained TESOL teacher. Ability to capture gist is a skill we test for and measure, and many, maybe the majority, of native speakers don't have it and don't know.
In recent years I constantly see people going "this is written by AI" and I have yet to see a single one of them able to coherently prove their point. It's all just feelings and hunches.
So I am calling you on this:
How do you know? Show your working. Demonstrate your case.
Claude's writing style is at least as distinctive as any human's personal style. It has a long list of favorite words, verbal tics and common structures. On top of that, LLM writing is often bad in a very particular way: it's weak on actual things to say, but with an overheated style.
Some days, I spend over 4 hours a day reading walls of text written by Claude. If I couldn't recognize Claude's default "voice" by now, something would be wrong. It would be like a Hemingway fan not being able to recognize Hemingway. Except more so, because Claude's writing style is getting worse from version to version, descending into self-parody.
On the statistical side, Pangram's model identifies AI-authored text with a 1-in-5,000 false positive rate, measured against hold-out texts from before 2022. My "ear" also agrees closely with Pangram. If I think something sounds AI-written, Pangram virtually always comes back with "AI, confidence: high."
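As a back-of-the-envelope check on what a 1-in-5,000 false positive rate means in practice, here is a small Python sketch. The 1-in-5,000 figure is the one quoted above; the base rate of AI text and the detector's true positive rate are placeholders made up for illustration, not Pangram's numbers.

```python
FALSE_POSITIVE_RATE = 1 / 5000  # figure quoted in the comment above


def expected_false_flags(n_human_texts, fpr=FALSE_POSITIVE_RATE):
    """Expected number of human-written texts wrongly flagged as AI."""
    return n_human_texts * fpr


def precision(base_rate_ai, true_positive_rate, fpr=FALSE_POSITIVE_RATE):
    """Share of flagged texts that really are AI-written (Bayes' rule)."""
    flagged_ai = base_rate_ai * true_positive_rate
    flagged_human = (1 - base_rate_ai) * fpr
    return flagged_ai / (flagged_ai + flagged_human)


# Hypothetical inputs: 1 in 100 texts is AI-written, and the detector
# catches 90% of them. Neither number comes from the thread.
print(expected_false_flags(1_000_000))
print(precision(base_rate_ai=0.01, true_positive_rate=0.9))
```

The only point is that the share of wrong accusations depends on how much AI text is in the pile being scanned, not just on the false positive rate.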
Claude's default voice, yes. But I'd assume a lot of people have learned to prompt it to something other than its default style. IMO it is good practice to have a style guide to feed in with the prompt.
> On top of that, LLM writing is often bad in a very particular way: it's weak on actual things to say, but with an overheated style.
This point is interesting because it raises the question of what "LLM writing" actually is. If it is expanding a smaller prompt into a larger article then yes, by construction the information density is low. But it can also be used to take a semi-coherent stream of consciousness and turn it into something readable, and the people using it that way might already have started to slip under the radar.
This is a lot like how criminals seem especially stupid because the ones who get caught are disproportionately the stupid ones. The easily detectable LLM writers are going to be the lazy ones.
One thing I think that helps is that for anything more worthwhile than message board posts like this I use Claude to review my text, make suggestions, and iterate on structure with me. But I'm the one writing the bulk of the text. I'll take some of its suggestions verbatim, but only if I genuinely like it better than anything I came up with myself.
The end product is something much more polished than anything I'd write on my own, but still comes off as being genuinely from me. At least that's what people have told me when I've asked.
I suspect people who think they are getting away with this are far more obvious than they realize.
That reply was AI written.
Lol, no. I've always sounded like that, and there are decades of my writing online.
Also, FWIW, Pangram scores my writing as entirely human.
Claude's writing isn't easy to identify because it uses em-dashes and bulleted lists. Claude's distinctive style goes much deeper than that.
I think it was a joke
I often run my writing through scans for AI tells. The number of things it flags that are just my own personal vocal tics, that I've had for 40+ years, is amazing.
In other words, correlation != causation
You need to start using LLMs a lot and then you will know how we know.
Edit: You know how you can recognise someone just from their gait while they walk towards you? I would struggle to describe that for an individual person but it doesn't mean I can't identify them from that alone.
I don’t think TFA is written by AI.
But AI-written pieces do have a certain feeling. A sort of staccato in the succession of ideas that does not feel natural. They emphasize certain points, and you as a reader just wonder why that is. There is the “This thing, not just that thing”. There are also the three successive propositions (mostly in one sentence) to accentuate an idea and “Negation. Strong positive idea in the same direction”.
In general try reading one (vocally) to yourself and it will feel really weird.
Just like em-dashes, some people have always done these though. Why are they penalized with immediate AI slop witch hunts? The LLMs didn't come up with these tics out of thin air.
You have to start with the reason there’s a witch hunt. I love reading. I read books (novels) almost every day and I’m almost always perusing a textbook or an article for my jobs and my hobbies. The signal-to-noise ratio for entertainment or information was fairly high; then came LLM tools.
You get interested in the title of an article or a book cover, and then you start reading it and it’s just vapor. Nothing tangible to be gained. It’s like buying something expensive and finding a cheap trinket under the wrapping.
After a couple of times, you develop a certain kind of heuristic for this kind of text. It will not be perfect and will have some false positives, but that’s the only way to keep your sanity.
>The LLMs didn't come up with these tics out of thin air.
LLMs were trained with a lot of synthetic data to transform them from a "complete this text" model into a chatbot. I suspect that the tons of synthetic data that force the LLM to answer questions in a specific way also forced it to have this "synthetic/robotic" language. Claude users will have noticed that the "belt and suspenders" phrase just started popping up after an update, and I am sure that is not because lots of developers used it in their blogs and Anthropic scraped those blogs in that update.
I’m not sure if it is written by an LLM, but anything being called “load-bearing” (formatted that way and all) sets off my alarm bells
Highly doubtful
Grammarly and GPTZero say 0% AI.