Repeat after me: most software spends the majority of its lifetime in the maintenance phase.
Repeat after me: it follows that most of the money the software makes occurs during the maintenance phase.
Repeat after me: our industry still does not understand this after almost 100 years of being in existence.
Alan Kay was 100% right when he said that the computer revolution hasn't occurred yet. For all of our current advancements all tools are more or less in the Stone Age.
My great hope is that AI will actually accelerate us to a point where the existing paradigm fully breaks beyond healing and we can finally do something new, different, and better.
So for now - squeee! - put a jetpack on your SDLC with AI and go to town!!! Move fast and break things (like, for real).
People in the future are going to wonder what the hell we were thinking, when 30 years down the line everything is a hot mess of billions of lines of code generated by LLMs that no human has read almost any of it and is no longer possible for anyone to maintain neither with nor without LLMs. And the LLM generated garbage will have drowned out all of the good quality code that ever existed and no one will be able to find even human generated code anymore on the internet.
Makes me want to just give up programming forever and never use a computer again.
30 years down the line a human will wake up in his climate controlled bed in an idyllic large scale people-zoo, think about what information he wants, and immediately his 900TB ferroelectric compute-in-memory exobrain will read his thoughts via his brain-computer-interface, and render a custom 3d visualization of that information floating in front of him. There will be no separate code stage, just neural rendering of data to pixels.
I think it’s a mistake to think that we will be blindly going in this direction for many years and then suddenly collectively wake up and realize what have we done. It’s a great filter and a great opportunity.
If LLMs stop improving at the pace of the last few years (I believe they already are slowing down) then they will still manage to crank out billions lines of code which they themselves won’t be able to grep and reason through, leading to drop in quality and lost revenue for the companies that choose to go all-in with LLMs.
But let’s be realistic - modern LLMs are still a great and useful tool when used properly so they will stay. Our goal will be to keep them on track and reduce the negative impact of hallucinations.
As a result software industry will move away from large complex interconnected systems that have millions of features but only a few of them actively used, to small high quality targeted tools. Because their work will be easier to verify and to control the side effects.
> If LLMs stop improving at the pace of the last few years (I believe they already are slowing down)
Depending on how you measure "improvement" they already have or they never will :-/
Measuring capability of the model as a ratio of context length, you reach the limits at around 300k-400k tokens of context; after that you have diminishing returns. We passed this point.
Measuring capability purely by output, smarter harnesses in the future may unlock even more improvements in outputs; basically a twist on the "Sufficiently Smart Compiler" (https://wiki.c2.com/?SufficientlySmartCompiler=)
That's the two extremes but there's more on the spectrum in between.
300k-400k isn’t the current limit if you create modules and/or organize the code reasonably.. for the same reason we do this for humans: it allows us to interact with a component without loading the internals into out context.
you can also execute larger tasks than this using subagents to divide the work so each segment doesn’t exceed the usable context window. i regular execute tasks that require hundreds of subagents, for example.
in practice the context window is effectively unlimited or at least exceptionally high — 100m+ tokens. it just requires you to structure the work so it can be done effectively — not so dissimilar to what you would do for a person
I keep getting surprised that people who are all-in on this (" i regular execute tasks that require hundreds of subagents ") don't have any idea of what is happening even a single layer below their interface to the LLM ("in practice the context window is effectively unlimited or at least exceptionally high — 100m+ tokens.")
I looked at that response by GP (rgbrenner) and refrained from replying because if someone is both running hundreds of agents at a time AND oblivious to what "context window" means, there is no possible sane discourse that would result from any engagement.
Maybe I am unlucky but I had worked with too many developers who couldn't make a good decision if their life depended on it. LLMs at least know how to convince you of their decisions with strong arguments.
The only people who are going to put in the time, are people who care enough to. The problem is you have people who didn’t care before who were equipped with a garden hose. Now that they have a fully pressurized fire hose they can make more of a mess faster.
As an author of fine literature, these million monkeys on typewriters simply upset my sense of dignity. And to imagine the impoverished prose so many readers shalt forthwith be perusing!
Maybe. But it depends on the metric. It seems like orgs are focused on PR count and token usage. Issues caused by poor code are often lagging indicators so it’s asymmetrical in that aspect.
Write lots of code now and statistically look great, while the impact won’t be felt for a much larger range of time.
With the job search and whatnot then yeah, caring becomes a lot more important. That’s true.
Hard disagree. LLMs are fantastic for fixing bad architecture that's been around for a decade because nobody was willing to touch it. I can have it write tons and tons of sanity checks and then have it rewrite functionality piece by piece with far more verification than what I'd get from most engineers.
It's not immediate, it still takes weeks if you want to actually do QA and roll out to prod, but it's definitely better than the pre-LLM alternatives.
Like with a lot of things in this space, it depends where you invest your effort. If you care about quality design and good code, you can definitely get there - but that doesn't happen by default.
With the right investment, we could certainly have tooling that creates and maintains very good designs out of the box. My bet is that we'll continue chasing quick and hacky code, mostly because that's the majority of the code that it was trained on, and because the majority of people seem to be interested in a quick result vs a long-term maintainable one.
But we aren't cooking with gas. We are cooking with a more controlled burner than ever that can download a clean code claude skill and be committing better code than you or I could write.
What would normally be considered overengineered gold plating is "free" now.
I can't get used to vibe-coded projects on Github. One that I was using for a little while is about a year old, with 40,000 commits and 15,000 PRs. And it has "lite" in its name; it's supposed to be the simple alternative. There were so many bugs. I fixed one, submitted a PR, but it was off the first page in hours. It will never be merged. I moved to a different project with a bit less... velocity, and it has been way smoother.
By then, the fix will be easy. Fire up the latest LLM, point it at your codebase and tell it "rewrite this from scratch. do it well. fix the architecture mistakes"
There is definitely going to be some Wirth's law-like [0] effect about the asymmetry of software complexity outpacing LLMs' abilities to untangle said software. Claude 9.2 Optimus Prime might be able to wrangle 1M LoC, but somehow YC 2035 will have some Series A startup with 1B+ LoC in prod — we'll always have software companies teetering on the very edge of unmaintainability.
It won't be an LLM that does it, the entire feature of an LLM is it produces generalizable reasonably "correct" text in response to a context.
The system that makes it have an opinion about good vs bad architecture or engineering sensibilities will be something on top of the transformer and probably something more deterministic than a prompt.
We can do this today too (but definitely hopefully future LLMs make better architectural decisions). With Claude, I've been working on an application for the last 2 months. I didn't have a great vision of what I wanted when I started but I didn't want that to slow me down. The architecture is terrible - Claude separated some functionality into different classes but did a bad job at it and created a big ball of mud. Now that I finally have my vision locked down and implemented (albeit poorly), it'd be a great time to throw it away and start over. It'd be interesting to see the result and see how long it takes.
Just have claude (or gpt maybe) do an architecture review and request a multi-phase refactoring plan. This is probably better to do incrementally as you notice the balls of mud forming but it might not be too late. Either way, if it does something you don't like, `git checkout` and start over
People, as a rule, don't really "go backwards." We didn't really walk back on the industrial revolution, and we're probably not going to walk back from this day-and-age's activities. It's only unsettling until the changes are accepted. The old timers can vie for a time before "all this" when they were children and all their needs were given to them by their now-deceased parents, and the cycle can continue on, yet again.
If you like sci-fi takes on software systems, check out Vernor Vinge "A Fire upon the deep" and sequels. I recall ship systems software is something like all the code humanity has ever written, plus centuries of LLM churn. One of the protagonists is a space faring software developer particularly good with legacy code.
We are used to thinking about software like in the article, a program that runs deterministically in an OS. Where we are headed might be more like where the LLM or AI system is the OS, and accomplishes things we want through a combination of pre-written legacy software, and perhaps able to accomplish new things on the fly.
Interesting, I kinda do this. Sometimes when an LLM solves a problem for me, I have it write code so that I can reuse that exact same approach deterministically(and I line by line check it). Now I have about a dozen CLI commands that the LLM can use and I'm reasonably (although not 100%) sure I'll get an expected outcome. Really helpful with debugging via steam pipe and connecting to read replicas.
Code will never go away. Code was there before computer hardware and it will always be there. Code is (almost?) all of computation theory so unless we throw computers away, we shall always use code.
They're not suggesting that code will go away, but rather that it will be abstracted beneath an LLM interface, so that writing code in the future will be like writing assembly today: some people do it for fun or niche reasons, but otherwise it's not necessary, and most developers can't do it.
Whether that happens or not is a different question, but I believe that's what they're suggesting.
Code is formal and there are basic axioms that grounds its semantic. You can build great constructs on top of those semantics, but you can’t strip away their formality without the whole thing being meaningless. And if you can formalize a statement well enough to remove all ambiguity, then it will turn into code.
Programming is taking ambiguous specs and turning them into formal programs. It’s clerical work, taking each terms of the specs and each statements, ensuring that they have a single definition and then write that definition with a programming language. The hard work here is finding that definition and ensuring that it’s singular across the specs.
Software Engineering is ensuring that programming is sustainable. Specs rarely stay static and are often full of unknowns. So you research those unknowns and try to keep the cost of changing the code (to match the new version of the specs) low. The former is where I spend the majority of my time. The latter is why I write code that not necessary right now or in a way that doesn’t matter to the computer so that I can be flexible in the future.
While both activities are closely related, they’re not the same. Using LLM to formalize statements is gambling. And if your statement is already formal, what you want is a DSL or a library. Using LLM for research can help, but mostly as a stepping stone for the real research (to eliminate hallucinations).
Why are we pretending everyone's code is an etalon of quality? Most software out there is probably hot mess already. No think behind it, let alone ultrathink.
Exactly, before the rise of LLMs it was not at all uncommon to hear people claiming that their job is to just Google API calls or copy and paste code from Stackoverflow. The context back then was that companies are being picky by hiring people who can demonstrate some modicum of understanding of data structures and algorithms because all any developer does is tweak some CSS or make some calls to a database to glue together a CRUD app... why should anyone be expected to know how to reverse a linked list, or how a basic sorting algorithm works... just download an npm package to do that stuff and glue it all together with a series of nested for loops.
With the rise of LLMs that do all of that... those people shutup and shutup real fast.
Hello from assembly programmers to present day javascript folks. Joke aside, I sometimes think how VS Code is written in such layers and layers of code - ~200mb of minified code - Java based IDEs were worser with almost 1GB of code (libs/dependencies). And VS Code did beat native editors (Sublime) of its time to dominate now - may be because of the business model (open & free vs freemium). But it does the job quite well IMO. And it enabled swarms of startups to go to market including billion $ wrappers - including Cursor, Antigravity and almost all UI coding agents. I remember backend developers (Java/C++ type) looking down upon Javascript developers as if we are from an inferior planet or something.
How many of us remember that VSCode is actually a browser wrapped inside a native frame?
VS Code has two things that worked well for it. Web Tech and Money. Web tech makes it easy to write plugins (you already know the stack vs learning python for sublime). And I wonder how much traction it would get if not Microsoft paying devs to wrangle Electron in a usable shape.
Have you ever encountered the very common real life situation where there's some software that works, and you have a binary for it but you either don't have the source code or it doesn't compile for whatever reason? This is the pre-LLM world. Now, do you think LLMs make this situation better or worse? You may not know what's wrong with your software or how to fix it, but unlike in the past you can throw compute at trying to figure it out, or replicating a subset of it, or even replicating all of it depending on what it is. I think LLMs are making this situation better not worse.
I think the problem with that sort of thought is that the burgeoning sizes of output for even trivial software makes it almost a certainty that:
a) The stuff output by the existing LLMs is too unwieldy even for them to handle , even if the product itself is a glorified chatbot.
b) If all software is throwaway, then the value of all software drops to, effectively, the price of an AI subscription. We'll all be drowning in a market of lemons (https://en.wikipedia.org/wiki/The_Market_for_Lemons), whilst also being producers in said market.
another aspect is amount of code LLMs can handle went from few lines to small codebase in few years, so future is just possible for a lot bigger codebases?
There is nothing in the post to support the statement. An interesting personal confession, but it does not establish that vibe coding and agentic engineering are converging as a general phenomenon.
As a piece of meat, I look forward to charge rates of $10,000 an hour, to fix code out the vibe code generation.
I don't think that's how money works. Enough people have poured enough money into this thing that the actual, measurable results/efficacy/ROI are of secondary importance (to put it mildly). At this point AI adoption is (at least sold as) a fait accompli.
This is wishful thinking. The force of the market is "number go up". Quality increasingly has less and less of a role in the equation. You will eat your slop, and you will like it. It will be the only choice you have.
But the quality of code was already very bad due to market forces. Most code at large companies is notoriously poor despite the talent density, because the incentives are not there to tackle tech debt or improve code quality.
With such a low baseline, there is an optimistic perspective that LLMs could improve the situation. LLMs can produce excellent code when prompted or reviewed well. Unlike human employees, the model does not worry about getting a 'partially meets expectations' rating or avoid the drudgery of cleaning up other people's code.
The model is optimized in a different way to "partially meet expectations". Sycophancy coupled with only really "knowing" what it has been trained on assure a different kind of mediocrity.
The same incentives that discourage good code in pre-AI times are still dominating now. You will be pushed to ship sub-par products in the future, just like you were in the past.
AI certainly has the potential to make the underlying code/design a lot cleaner. We will also be working with dramatically more code, at a much higher rate of change. That alone will be a big challenge to keep sustainable.
The ones making the decision to under-invest on design are either are unaware of the real costs, or are aware and are deliberately choosing that path - that's not new, and I don't expect it to change.
I agree generally but there are periods where creative people show up and a whole slew of existing firms go bust/shrink due to one’s ability to envision a path toward creative destruction.
> People in the future are going to wonder what the hell we were thinking, when 30 years down the line everything is a hot mess of billions of lines of code generated by LLMs that no human has read
--
It's just as likely that people will be surprised that we used to have billions of lines of human generated code, that no LLM ever approved.
By then AI would be good enough to clean them all up....like I dont get these dooming scenarios they always assume that we are going to be stuck with LLMs and there wont be anything new coming.
The difference between writing assembly code and Ruby code is much smaller than the difference between programming and vibe coding.
Also, companies are pressuring employees towards adoption in novel ways. There was no such industry-wide pressure by employers in the 90s, 2000s or 2010s for engineers to use a specific tech.
> Also, companies are pressuring employees towards adoption in novel ways. There was no such industry-wide pressure by employers in the 90s, 2000s or 2010s for engineers to use a specific tech.
Companies have been enforcing technology mandates since time immemorial. In the early 2000s there were definitely a lot of mandates to move away from commercial UNIX to Linux. Lots of companies began enforcing the switch to PHP, Ruby and Python for new projects.
Or, it could be like asbestos and the immediate benefits are just too appealing to listen to arguments of skeptical naysayers about some vaguely defined problems that are decades away, if they even happen.
I use AI tools daily (because they feel like they're helping me)
but it's not exactly hard to imagine scenarios where an explosion of slop piling up plus harm to learning by outsourcing all thinking results in systemic damage that actually slows the pace of technological progress given enough time.
History of new technologies tend to average into a positive trend over a long enough time scale but that doesn't mean there aren't individual ups and downs. Including WTF moments looking back at what now seems like baffling decision-making with benefit of hindsight.
Some of us are already experiencing that. For example I handed off an initial version of something some months ago, and the AI-generated stuff they came up with was a huge buggy mess of spaghetti code neither of us understood. Months later we've detangled it, cutting it down to a third the size, making it far simpler to understand, and fixing several bugs in the process (one was even by accident, we'd made note of it, then later when we went to fix it, it was already fixed).
If it is, the fall out will be way worse than if AI ends up living up to (reasonable) expectations.
If it doesn’t, we are going to see over a trillion dollars of capital leave the tech sector, which I think will have worse impacts on the livelihood of tech workers than if AI ends up panning out.
This is something the naysayers need to grapple with. We’ve crossed a line where this tech needs to work simply because of the amount of money depending on that fact.
The asbestos hypothetical is a bit different than the "bubble popping" economic crisis scenario though. In this world, AI would just continue being adopted and shoved into every nook and cranny into which it can be made to fit, with valuations only getting bigger and bigger.
The damage would come much later, well beyond the point where it could be simply pulled out and replaced without spending massive amounts of money and would also basically necessitate training an entire new generation of engineers.
Then the AI giants would start appearing vulnerable like cigarette companies in the 90s while an AI Superfund and interstate class action are being planned but Sam Altman would already be a centitrillionaire at that point so it would be someone else's problem.
Have you ever worked on a legacy codebase with actual good code? I struggle to see the difference between your predicted future and today's reality when it comes to working with legacy disasters.
Well, on legacy code base, you still needed humans to write those lines of code. There's a maximal amount of lines a human can write in a year.
Now with LLM we are talking of millions and millions of line of code that could be generated in a single day. The scale of the problem might not be the same at all.
Vibe Coding (and LLMs) did not create undisciplined engineering organizations or engineers. They exposed and accelerated them.
Plenty of engineers have loose (or no!) standards and practices over how they write coee. Similarly, plenty of engineering teams have weak and loose standards over how code gets pushed to production. This concept isn't new, it's just a lot easier for individuals and teams who have never really adhered to any sort of standards in their SDLC to produce a lot more code and flesh out ideas.
Bad engineers continue being bad, good engineers continue being good.
I personally don’t know any colleagues who were good engineers just because they wrote code faster. The best engineers I know were ones who drew on experience and careful consideration and shared critical insights with their team that steered the direction of the system positively.
> Claude, engineer a system for me, but do it good. Thanks!
> I personally don’t know any colleagues who were good engineers just because they wrote code faster
Same, if anything, the opposite seems to be true, the ones that I'd call "good engineers" were slower, less panicked when production was down and could reason their way (slowly) through pretty much anything thrown at them.
Opposite experience, I've sit next to developers who are trying their fastest to restore production and then making more mistakes to make it even worse, or developers who rush through the first implementation idea they had for a feature, missing to consider so many things and so on.
> Same, if anything, the opposite seems to be true, the ones that I'd call "good engineers" were slower
Unfortunately, a lot of workplaces are ignoring this, believing their engineers are assembly line workers, and the ones who complete 10 widgets per minute are simply better than the ones who complete 5 widgets per minute.
It isn't just that they believe this - they want a business model where this is how it works. For a big company a star coder is a liability - they have strong labor power, they can leave and they are hard to replace, etc.
Companies want workflows that work with mediocre programmers because they are more like interchangeable parts. This is the real secret to why AI programming will work in a lot of places. If you look at the externalities of employing talented people, shitty code actually looks better than great code.
To these kinds of companies, what's even better than a rack of mediocre programmers? AI agents that you can just conjure up and prompt. They take up no facility space, don't require lunch breaks or vacations, obey all commands and direction, and produce a predictable and consistent amount of output per dollar.
This is the earworm the leaders of these companies have allowed into their minds. Like Agent Mulder, they Want To Believe in this so badly...
Glad I find myself employed under a division called Research and Development. Poaching and retaining highly compensated individuals is the entire purpose.
> I personally don’t know any colleagues who were good engineers just because they wrote code faster.
However, the best engineers I know are usually among the quickest to open an editor or debugger and use it fluently to try something out. It's precisely that speed that enables a process like "let's try X, hmm, how about Y, no... ok, Z is nice; ok team, here are the tradeoffs...". Then they remember their experience with X, Y, and Z, and use it to shape their thinking going forward.
Meanwhile, other engineers have gotten X to finally mostly work and are invested in shipping it because they just want to be done. In my experience, this is how a lot of coding agents seem to act.
It's not obvious to me how to apply the expert loop to agentic coding. Of course you can ask your agent to try several different things and pick the best, or ask it to recommend architectural improvements that would make a given change easier...
Or: depth-first search of the solution space vs breadth-first (or balanced) search of the solution space.
> Of course you can ask your agent to try several different things and pick the best, or ask it to recommend architectural improvements that would make a given change easier
The ideal solution increasingly seems to be encoding everything that differentiates a good engineer from a bad engineer into your prompt.
But at that point the LLM isn’t really the model as much as the medium. And I have some doubts that LLMs are the ideal medium for encoding expertise.
> However, the best engineers I know are usually among the quickest to open an editor or debugger and use it fluently to try something out
The Pragmatic Programmer book has whole chapters about this. Ultimately, you either solve the problem analogously (whiteboard, deep thinking on a sofa). Or you got fast as trying out stuff AND keeping the good bits.
Yeah, a lot of people came of age with a "we'll fix it when it's a problem" mindset. Previously their codebases would start to resist feature development, you'd fix the immediate bottlenecks, and then you could kick the can down the road a bit until you hit the next point of resistance. You kinda refactor as you do features. The frontier models have pushed the "it's a problem" moment further back. They can kinda work with whatever pile of code you give them... to a point. So it manifests as the LLM introducing extra regressions, or dropping more requirements than it used to, but it's not really manifesting as the job being harder for you. It's just not as smooth as it was from an empty repository. Then you hit the point where it just breaks too much and you need to fix it. And the whole codebase is just fractal layers of decisions that you didn't make. That's hard to untangle. And you're not editing the code yourself, so you don't have that visceral "adding this specific thing in this specific way has a lot of tension" reaction that allows you to have those refactoring breakthroughs.
Vibe coded apps with barely no tests, invariants, etc. No wonder it turns into spaghetti. You can always refactor code, force agents to write small modular pieces and files. Good engineering is good engineering whether an agent or human wrote the code. Take time to force agents to refactor, explore choices. Humans must at least understand and drive architecture at this point still. Agents can help and do recon amazingly and provide suggestions.
For me the distinction is the quality and rigor of your pipeline.
Vibe coding: one shot or few shot, smoke test the output, use it until it breaks (or doesn't). Ideal for lightweight PoC and low stakes individual, family or small team apps.
Agentic engineering:
- You care about a larger subset of concerns such as functional correctness, performance, infrastructure, resilience/availability, scalability and maintainability.
- You have a multi-step pipeline for managing the flow of work
- Stages might be project intake, project selection, project specification, epic decomposition, d=story decomposition, coding, documentation and deployment.
- Each stage will have some combination of deterministic quality gates (tests must pass, performance must hit a benchmark) and adversarial reviews (business value of proposed project, comprehensiveness of spec, elegance of code, rigor and simplicity of ubiquitous language, etc)
And it's a slider. Sometimes I throw a ticket into my system because I don't want to have to do an interview and burn tokens on three rounds of adversarial reviews, estimating potential value and then detailed specification and adversarial reviews just to ship a feature.
If your slider only goes between vibe coding or agentic engineering you're missing an entire range of engineering where the human is more involved.
I've been using Opus, GPT-5.5, and some lesser models on a daily basis, but not having them handle entire tasks for me. Even when I go to significant effort to define and refine specs, they still do a lot of dumb things that I wouldn't allow through human PR review.
It would be really easy to just let it all slide into the codebase if I trusted their output or had built some big agentic pipeline that gave me a false sense of security.
Maybe 10 years from now the situation will be improved, but at the current point in time I think vibe coding and these agentic engineering pipelines are just variations of a same theme of abdicating entirely to the LLM.
This morning I was working on a single file where I thought I could have Opus on Max handle some changes. It was making mistakes or missing things on almost every turn that I had to correct. The code it was proposing would have mostly worked, but was too complicated and regressed some obvious simplifications that I had already coded by hand. Multiply this across thousands of agentic commits and codebases get really bad.
I agree, vibe coding does not have quality gate checks at each stage, while agentic engineering does. Dev teams get into trouble when they try build to build without a proper process of design, tests, and reviews. This was true before agentic coding, but it's especially true now. The teams that understand how to leverage agents in this process are the ones that will be most successful.
Perhaps I've missed a few weeks worth of progress, but I don't think that AIs have become more trustworthy, the errors are just more subtle.
If the code doesn't compile, that's easy to spot. If the code compiles but doesn't work, that's still somewhat easy to spot.
If the code compiles and works, but it does the wrong thing in some edge case, or has a security vulnerability, or introduces tech debt or dubious architectural decisions, that's harder to spot but doesn't reduce the review burden whatsoever.
If anything, "truthy" code is more mentally taxing to review than just obviously bad code.
I know there are good uses of LLMs out there. I do. But.
The current fever pitch mandates from above seem to want it applied liberally, and pushing back against that is so discouraging and often career-limiting as to wear the fabric of one's psyche threadbare. With all the obvious problems being pointed out to people, there are just as many workarounds; and these workarounds, as is often revealed shortly thereafter, have their own problems, which beget new solutions, ad infinitum.
At some point it genuinely seems like all this work is for the sake of the machine itself. I suppose that is true: The real goal has become obscured at so many firms today, that all that remains is the LLM. Are the people betting the farm and helping implement the visions of those who have done so guaranteed a soft exit to cushion them from the consequences, or is rationality really being discarded altogether?
Sure, sound engineering principles can help work around these problems, but what efficiency is truly gained, in terms of cognitive load, developer time, money, or finite resources? Or were those ever an earnest concern?
It’s an absolute game changer, and it can now multiply your productivity fivefold if it’s a solo greenfield project.
Maybe half a year ago it was as you said. You had to wait for the agent to finish, you had to review carefully, and often the result was not that great. You did not save a lot of time.
Now I can spin up 3+ parallel conversations in Codex, each in a git worktree. My work is mainly QA testing the features, refining the behavior, and sometimes making architectural decisions.
The results are now undeniable. In the past I could not have developed a product of that scope in my free time.
That is what is possible today. I suspect many engineers have not yet tried things that became feasible over the last months. Like parallel agents, resolving merge conflicts, separating out functionality from a large branch into proper PRs.
The degenerate side is clueless upper management and fad-driven engineering. We have talked extensively about this.
There is a more rational side to it that I've seen in my org: some engineers absolutely refuse to use AI and as a consequence they are now, clearly and objectively, much less productive than other engineers. The thing is, you still need to learn how to use the tool, so a nontrivial percentage of obstinate engineers need to be driven to use this in the same way that some developers have refused to use Docker or k8s or whatever.
Ah yes, we must force these obstinate engineers to the right path! Only after getting everyone to see the light will they understand and thank us for boundless productivity!! /s
Perhaps these “obstinate” engineers have good reason in their decision. And it should be their decision!
To be so confident in what is “the right way (TM)” and try to force it onto others is... revealing.
This has generally been the case, though. As mentioned in the post, "You want solutions that are proven to work before you take a risk on them" remains true and will be place where the edges are found.
If I get pwned because my AI agent wrote code that had a security vulnerability, none of my users are going to accept the excuse that I used AI and it's a brave new world. I will get the blame, not Anthropic or OpenAI or Google but me.
The same goes for if my AI generated code leads to data loss, or downtime, or if uses too many resources, or it doesn't scale, or it gives out error messages like candy.
The buck stops with me and therefore I have to read the code, line-by-line, carefully.
It's not even a formality. I constantly find issues with AI generated code. These things are lazy and often just stub out code instead of making a sober determination of whether the functionality can be stubbed out or not.
You could say "just AI harder and get the AI to do the review", and I do this a lot, but reviewing is not a neutral activity. A review itself can be harmful if it flags spurious issues where the fix creates new problems. So I still have to go through the AI generated review issue-by-issue and weed out any harmful criticism.
> It’s not just the downstream stuff, it’s the upstream stuff as well. I saw a great talk by Jenny Wen, who’s the design leader at Anthropic, where she said we have all of these design processes that are based around the idea that you need to get the design right—because if you hand it off to the engineers and they spend three months building the wrong thing, that’s catastrophic.
This is spot on. I think the tooling is evolving so much particularly on the design side that its not worth the "translation cost" to stay (or even be) on the Figma side anymore.
What an excellent article by a smart, humble, still-learning person!
Favorite quote:" There are a whole bunch of reasons I’m not scared that my career as a software engineer is over now that computers can write their own code, partly because these things are amplifiers of existing experience. If you know what you’re doing, you can run so much faster with them. [...]
I’m constantly reminded as I work with these tools how hard the thing that we do is. Producing software is a ferociously difficult thing to do. And you could give me all of the AI tools in the world and what we’re trying to achieve here is still really difficult. [...]"
> If you can go from producing 200 lines of code a day to 2,000 lines of code a day, what else breaks? The entire software development lifecycle was, it turns out, designed around the idea that it takes a day to produce a few hundred lines of code. And now it doesn’t.
It is so embarrassing that LOC is being used as a metric for engineering output.
LOC is useful here not because it's a metric for output but because it's a metric for _understandability_. Reviewing 200 lines is a very different workload than reviewing 2000.
That's assuming the 200 lines are logical and consistent. Many of my most frustrating LLM bugs are caused by things that look right and are even supported by lengthy comments explaining their (incorrect) reasoning.
The point is that LOC is never a good metric for any aspect of determining the quality of code or the coder because it ignores the nuance of reality. It's impossible to generalize because the code can be either deceptively dense or unnecessarily bloated. The only thing that actually matters is whether the business objective is achieved without any unintended side effects.
> The only thing that actually matters is whether the business objective is achieved without any unintended side effects.
Objectives change; timeliness matters. The speed at which you deliver value is incredibly important, which is why it matters to measure your process. Deceptively dense is what I’d call software engineers who can’t accept that the process is actually generalizable to a degree and that lines of code are one of the few tangible things that can be used as a metric. Can you deliver value without lines of code?
> Objectives change; timeliness matters. The speed at which you deliver value is incredibly important, which is why it matters to measure your process.
This assumes that shorter code is faster to write. To quote Blaise Pascal, "I would have written a shorter letter, but I did not have the time."
> Can you deliver value without lines of code?
No, but you can also depreciate value when you stuff a codebase full of bloated, bug-ridden code that no man or machine can hope to understand.
You seem determined to misinterpret. I’m not talking about LOC as a measure of productivity. The ratio of LOC needing review to the capacity of reviewers (using how many LOC can be read/reviewed over the sampling period) is what’s being discussed. Agentic AI/vibe coding has caused that ratio to increase and shows a bottleneck in the SDLC. It’s a proxy metric, get over yourself.
“All models are wrong, some are useful”. What’s not useful is constantly bitching about how there’s no way to measure your work outside of the binary “is it done” every time process efficiency is brought up.
Yes, reading this back, I definitely veered off-topic. I apologize. I still don't think that you can say how much time it will take to review code based on how many lines of code are involved, but my argument was not well crafted. I just hope that others can learn something from our discussion. Thank you for being patient with me, and I hope you have a good day! :)
I have worked with code where 1000s of lines are very straightforward and linear.
I’ve worked on code where 100 lines is crucial and very domain specific. It can be exceptionally clean and well-commented and it still takes days to unpack.
The skills and effort required to review and understand those situations are quite different.
One is like distance driving a boring highway in the Midwest: don’t get drowsy, avoid veering into the indistinguishable corn fields, and you’ll get there. The other is like navigating a narrow mountain road in a thunderstorm: you’re 100% engaged and you might still tumble or get hit by lightning.
The number of bugs tends to be linear to lines of code written meaning fewer lines of code for the same functionality will have fewer bugs.
So I’m pretty skeptical that reviewing 2000 lines of code won’t take any more time than reviewing 200 lines of code.
Furthermore how do you know the AI generated lines are the open highway lines of code and not the mountain road ones? There might be hallucinations that pattern match as perfectly reasonable with a hard to spot flaw.
I experimented with vibe coding (not looking at the code myself) and it produced around 10k LOC even after refactors etc.
I rewrote the same program using my own brain and just using ChatGPT as google and autocomplete (my normal workflow), I produced the same thing in 1500 LOC.
The effort difference was not that significant either tbh although my hand coded approach probably benefited from designing the vibe coded one so I had already though of what I wanted to build.
Sounds like a great oppurtunity to understand your own development process, and codify it in such detail that the agent can replicate how you work and end up with less code but doing the same.
My experience was the same as you when I started using agents for development about a year ago. Every time I noticed it did something less-than-optimal or just "not up to my standards", I'd hash out exactly what those things meant for me, added it to my reusable AGENTS.md and the code the agent outputs today is fairly close to what I "naturally" write.
LoC is perfectly fine as a metric for engineering output. It is terrible as a standalone measure of engineering productivity, and the problems occur when one tries to use it as such.
It's still useful, however, because that is the only metric that is instantly intuitively understandable and comparable across a wide variety of contexts, i.e. across companies and teams and languages and applications.
As we know, within the same team working on the same product, a 1000 LoC diff could take less time than a 1 line bug fix that took days to debug. Hence we really cannot compare PRs or product features or story points across contexts. If the industry could come up with a standard measure of developer productivity, you'd bet everyone would use it, but it's unfeasible basically for this very reason.
So, when such comparisons are made (and in this case it was clearly a colloquial usage), it helps to assume the context remains the same. Like, a team A working on product P at company C using tech stack T with specific software quality processes Q produced N1 lines of code yesterday, but today with AI they're producing N2 lines of code. Over time the delta between N1 and N2 approximates the actual impact.
(As an aside, this is also what most of the rigorous studies in AI-assisted developer productivity have done: measure PRs across the same cohorts over time with and without AI, like an A/B test.)
Is it? The whole point of the article is that the rate of output for writing code has surpassed the rate at which it can be reviewed by humans. LOC as an input for software review makes a lot of sense, since you literally need to read each line.
Because not everyone is just out after earning the most money, some people also want to enjoy the workplace where they work. Personally, what the quality of the codebase and infrastructure is in matters a lot for how much you enjoy working in it, and I'd much rather work in a codebase I enjoy and earn half, than a codebase made by just jerking out as many LOC as possible and earn double.
Although this requires you to take pride in your profession and what you do.
All of human agency must prop up the vanity of you. Of all people.
Got it.
...ok fine; lack of political action to put us all on the hook for your healthcare is your choice to take a gamble on a paycheck. It's a choice to say your own existence is not owed the assurance of healthcare.
So I will honor your choice and not care you exist.
Do you reject all stats that treat the number of people involved (eg. 2 million pepole protested X) as "embarrassing" ... because they lump incredibly varied people together and pretend they're equal?
I read somewhere that measuring software engineering output by LoC is like measuring aerospace engineering by pounds added to the plane and I thought that was an apt comparison.
Totally. I thought Simon was wiser than this; even he couldn't resist getting swept up by breathless hype. The moment you start typing "LOC as a metric", alarm bells should go off in your head.
This was a podcast, not a pre-scripted talk. I suggest listening to the audio version - it makes it more clear that this was thinking out loud, not carefully considering every word.
I see, fair point. Sorry for taking a dig at you. Please know that I do appreciate a lot of work that you do. I was just worried for a moment when just reading that bit.
Have you noticed that the coding agents get really close to the solution on the first one shot and then require tons of work to get that last 10% or 5%?
If we shift the paradigm of how we approach a coding problem, the coding agents can close that gap. Ten years ago every 10 or 15 minutes I would stop coding and start refactoring, testing, and analyzing making sure everything is perfect before proceeding because a bug will corrupt any downstream code. The coding agents don't and can't do this. They keep that bug or malformed architecture as they continue.
The instinct is to get the coding agents to stop at these points. However, that is impossible for several reasons. Instead, because it is very cheap, we should find the first place the agent made a mistake and update the prompt. Instead of fixing it, delete all the code (because it is very cheap), and run from the top. Continue this iteration process until the prompt yields the perfect code.
Ah, but you say, that is a lot of work done by a human! That is the whole point. The humans are still needed. The process using the tool like this yields 10x speed at writing code.
This was often true when writing code manually to be fair.
You could get to "something that works" rather fast but it took a long time to 1) evaluate other options (maybe before, maybe after), 2) refine it, 3) test it and build confidence around it.
I think your point stands but no one really knows where. The next year or so is going to be everyone trying to figure that out (this is also why we hear a lot of "we need to reinvent github")
When I hire fresh out of college… I can see them coming in and not having the slightest comprehension of the difference of the things that they did in school to get a grade and never touch it again versus a product that is supposed to exist and work for 10+ years.
The problem of life in general is the last 5-10% is always the hardest. And it makes no economic sense in many cases to invest in trying to make that last part mechanised.
I believe the llm providers went with the wrong approach from the off - the focus should’ve been on complementing labour not displacement. And I believe they have learned an expensive lesson along the way.
I tend to get something working and refactor my way out, which does work and you can use a coding agent to do it, but it takes time. Maybe starting over would have been better, but I didn’t know what I wanted the architecture to look like at the beginning.
That will not work as cleanly as you described once a lot of code has been committed to the code base. You cannot just blow away an entire working code base and start over just because an LLM is struggling to make a feature work with existing architecture.
Strong agree. Most orgs will stay tangled in the mess they hand-coded over the years, a few greenfield teams will pull ahead, but until some LLM-fuelled startup displaces a strong incumbent I'm skeptical that we're on the cusp of anything other than a K-shaped transition. I see already low quality software and orgs getting flushed to make room for some new ideas now that the barrier to entry is slightly lower (but far from free). I just wish the transition was done with more humanity.
I think all coding will become vibe coding, but it will be no less an engineering discipline.
Note: I still review pretty much every line of code that I own, regardless of who generates it, and I see the problems with agents very clearly... but I can also see the trends.
My take: Instead of crafting code, engineering will shift to crafting bespoke, comprehensive validation mechanisms for the results of the agents' work such that it is technically (maybe even mathematically) provable as far as possible, and any non-provable validations can be reviewed quickly by a human. I would also bet the review mechanisms would be primarily visually, because that is the highest bandwidth input available to us.
By comprehensive validations I don't mean just tests, but multiple overlapping, interlocking levels of tests and metrics. Like, I don't just have an E2E test for the UI, I have an overlapping test for expected changes in the backend DB. And in some cases I generate so many test cases that I don't check for individual rows, I look at the distribution of data before and after the test. I have very few unit tests, but I do have performance tests! I color-code some validation results so that if something breaks I instantly know what it may be.
All of this is overkill to do manually but is a breeze with agents, and over time really enables moving fast without breaking things. I also notice I have to add very few new validations for new code changes these days, so once the upfront cost is paid, the dividends roll in for a long time.
Now, I had to think deeply about the most effective set of technical constraints that give me the most confidence while accounting for the foibles of the LLMs. And all of this is specific to my projects, not much can be generalized other than high-level principles like "multiple interlocking tests." Each project will need its own custom validation (note: not just "test") suites which are very specific to its architecture and technical details.
So this is still engineering, but it will be vibe coding in the sense that we almost never look at the code, we just look at the results.
I agree somewhat, but I do still think there is a decently sized separation between true vibe coding (the typical "make me an app...fix this bug") and actual AI assisted development. I personally think that if you are a dev and you simply trust the AI's output, that is still vibe coding.
I am not a developer and have very basic code knowledge. I recently built a small and lightweight Docker container using Codex 5.5/5.4 that ingests logs with rsyslog and has a nice web UI and an organized log storage structure. I did not write any code manually.
Even without writing code, I still had to use common sense in order to get it in a place I was happy with. If i truly knew nothing, the AI would have made some very poor decisions. Examples: it would have kept everything in main.go, it would have hardcoded the timezone, the settings were all hardcoded in the Go code, the crash handling was non existent, and a missing config would have prevented start. And that is on a ~3000 line app. I cannot imagine unleashing an AI on a large, complex. codebase without some decent knowledge and reviewing.
The real paradigm shift is not here yet, but not very far away. I'm talking about the single unified codebase. Agents building a unique codebase for all your software needs.
Because most of the complexity in software comes from interfacing with external components, when you don't need to adapt to this you can write simpler and better code.
Rather than relying on an external library, you just write your own and have full control and can do quality control.
Linux kernel is 30 000 000 LOC. At 100 tokens /s, let's say 1 LOC per second produced for a single 4090 GPU, in one year of continuous running 3600 * 24 * 365 = 31 536 000 everyone can have its own OS.
It's the "Apps" story all over again : there are millions of apps, but the average user only have 100 max and use 10 daily at most.
Standardize data and services and you don't need that much software.
What will most likely happen is one company with a few millions GPUs will rewrite a complete software ecosystem, and people will just use this and stop doing any software because anything can be produced on the fly. Then all compute can be spent on consistent quality.
The scary part is that codebases are getting layers of AI complexity, that it's going to cost $$$ to have the latest model decipher and make changes as no human can understand the code anymore.
Pretty soon there is no code reuse and we're burning money reinventing the wheel over and over.
> The scary part is that codebases are getting layers of AI complexity, that it's going to cost $$$ to have the latest model decipher
Isn't this a bit like old Java or IDE-heavy languages like old Java/C#? If you tried to make Android apps back in the early days, you HAD to use an IDE, writing the ridicolous amount of boilerplate you had to write to display a "Hello Word" alert after clicking a button was soul destroying.
This is a timely observation and feels right to me. I needed to get a relatively simple batch download -> transform -> api endpoint stood up. I wrote a fairly detailed prompt but left a lot of implementation details out, including data sources.
Opus 4.7 built it about 90% the same way I would, but had way more convenience methods and step-validations included.
It's great, and really frees me
up to think about harder problems.
This is my experience too. I'm primarily a python dev, but have been routinely using other backend languages (rust, go, etc) that I'm familiar with but not at the same level.
Just having ~13yrs experience heavily weighted in one language with some formal studying of others makes directing llms a lot simpler.
Learning syntax, primitives, package managers, testing, etc isn't that much of a lift compared to how I used to program.
Was helping a non-dev colleague who's using claude cowork/code to automate reporting the other day. They understand the business intelligence side well, but were struggling with basic diction to vibe code a pyautogui wrapper to pull up RDP and fill out a MS Access abstraction on a vendor DB.
Think we'll be fine for another 5-10 years as a profession
It's already the case that you get much better results out of LLMs by forcing agents using them to go through additional layers of planning, design & review.
The future is going to dynamically budget and route different parts of the SLDC through different models and subagents running on the cloud. Over time, more and more of that process will be owned by robots and a level of economic thinking will be incorporated into what is thought of today as "software engineering." At some point vibe coding _is_ coding and we're maybe closer to that point than popularly believed.
Claude often does things in more detail, and even better, than I would, in the first pass. But I don't understand how anybody stands comments generated by an LLM?
It's seriously the thing that worries (and bothers) me the most. I almost never let unedited LLM comments pass. At a minimum.
Most of the time, I use my own vibe-coded tool to run multiple GitHub-PR-review-style reviews, and send them off to the agent to make the code look and work fine.
It also struggles with doing things the idiomatic way for huge codebases, or sometimes it's just plain wrong about why something works, even if it gets it right.
And I say this despite the fact that I don't really write much code by hand anymore, only the important ones (if even!) or the interesting ones.
Also, don't even get me started on AI-generated READMEs... I use Claude to refine my Markdown or automatically handle dark/light-mode, but I try to write everything myself, because I can't stand what it generates.
I find that the best thing about generating documentation with LLM's is that it gets me angry enough to rewrite it correctly.
"Ugh, no! Why would you say it like that? That's not even how it works! Now, I need to write a full paragraph instead of a short snippet to make sure that no future agents get confused in the same way."
But using an agentic LLM to complete boilerplate is attractive simply because we've created a mountain of accidental and intentional complexity in building software. It's more of a regression to the mean of going back to the cognitive load we had when we simply built desktop applications.
There are techniques for improving our confidence in our software: unit testing, integration testing, fuzz testing, property-based testing, static analysis, model checking, theorem proving, formal methods, etc. The LLM is not only a tool for generating lines of code. It can also generate lines of testing. The goal is that the tests are easier to audit by the humans than the code.
I've found that one of the areas I enjoyed least is now what I spend a lot of time on now: testing!
Property-based testing in particular has uncovered a number of invariants in every code base I've introduced it to.
tbf depending on the agent/model a lot of the tests end up being thrown out so it's possible I _should_ handwrite more tests, but having better prompts and detailed plans seems to mitigate that somewhat
Given rapidly decelerating quality of, at least, claude code output, the agentic coding use may decrease. It is insane how bad the results of background agents are now: constant hallucinations, nonsensical outputs.
The heavy users of Claude at my job disagree (me included), our work gets shipped and the quality has increased by all metrics. Are you talking about enterprise or consumer Claude subscriptions? I think they're serving drastic different quality depending on how much $ you fork up.
When I was in grad school I graded homework for first year math classes, and the thing about math homework is that the perfect homework takes almost no time to grade.
It's the bad, semi-coherent submissions that eat up your time, because you do want to award some points and tell students where they went wrong. It's the Anna Karenina principle applied to math.
Code review is the same thing. If you're sure Claude wrote your endpoint right, why not review it anyway? It's going to take you two minutes, and you're not going to wonder whether this time it missed a nuance.
Typically in engineering you don't know what you're doing. If you're sure of what it should look like going in, you're more of a technician. I think most people coding have no idea what they're doing to a large extent- not many people can do the same rote work for years straight.
> The thing that really helps me is thinking back to when I’ve worked at larger organizations where I’ve been an engineering manager. Other teams are building software that my team depends on.
> If another team hands over something and says, “hey, this is the image resize service, here’s how to use it to resize your images”... I’m not going to go and read every line of code that they wrote.
The distance of accountability of the output from its producer is an important metric. Who will be held accountable for which output: that's important to maintain and not feel the "guilt".
So, organizations would need to focus on better and more granular building incentives and punishment mechanisms for large-scale software projects.
I think I'm just too opinionated to go there. If I see something that works fine, but isn't the way I'd do it, it doesn't matter if a human or an LLM wrote it I'm still in there making it match my vision.
I concur, and I think that is one of the most difficult aspects of reviewing another's code. It's difficult for me to sometimes differentiate between what is acceptable vs. what I would have done. I have to be very conscious to not impose my ideals.
So you are going to waste everyone's time getting another developer to write code the way you want? This resonates with me because at my company I get this all the time. At that point, you might as well close my PR and do it yourself, whatever way you want. I really like the advice from the book 0 2 1, to assign different areas of responsibility to people, so that there is no conflict.
The current state of the technology is that you must read at least some of the code, but everyone keeps shipping tools that are focussed on churning out more and more stuff without giving you any affordances to really understand the output.
Claude Code in particular seems really uninterested in this aspect of the problem and I've stopped using entirely because of this.
the discourse around "code quality" has always attracted the least nuanced minds, ones who see the world and the phenomenon of life as nothing but territory to be divided up by the latest buzzwords. the worst ones insist that we narrow the discussion even further, to focus on the conflicts between these buzzwords. whenever i have to sit through such discussions, i try to meditate on the irony of mother nature weaving the most functionally brutal, ruthlessly redundant poetry that is the genetic code, only for the resulting creatures to deny themselves the power of the principles inherent in their own construction.
I think this is what people mean when they say LLMs are a higher level abstraction. We still need to consider edge cases and have tests. We still to sweat the architecture and understand how the pieces fit together and have a mental map of the codebase. But within each bottom node of that architecture we don't sweat the details. Anything obvious gets caught right away. Most subtle/interaction-based issues occur at the architecture level. Anything that bypasses those filters is a weird bug that is no worse or different from a normal bug fixes - an edge case that was hit in a real world scenario that gets flagged by a user or a logged as an error.
There are certain codebases and pieces of code we definitely want every line to be reasoned and understood. But like his API endpoint example, no reason to fuss with the boilerplate.
This has definitely been my shift over the past few months, and the advantage is I can spend much more time and energy on getting the code architecture just right, which automatically prevents most of the subtle bugs that has people wringing their hands. The new bar is architecting code to be defined as well as an API endpoint->service structure so you can rely on LLMs to paint by numbers for new features/logic.
Vibe coding is just coding now. Writing assembly used to be a thing too until higher and higher languages were created. LLM is like that except it compiles English to code. This scares lot of professionals understandably.
I am experimenting with writing en entire TypeScript compiler[1] with AI assistant. I've spent 4 months on it already. It might not be successful at the end of the day but my thinking is that if LLMs are going to write a lot of the code I better learn how this can and can not work. I've learned a lot from this project already. I think we're still in charge of design and big ideas even if all of the code is written by AI
I'm also experimenting with it more and more. Now I'm trying to create a 2D side-scrolling shooter with it, running in the browser. When it was relatively small, it did a good job. As the codebase and docs/ files that I'm using get larger it starts hallucinating, especially when the context gets at about 50% usage (Codex w/ gpt5.5). As in, it'll literally forget to update parts of the code.
e.g, I change velocity of player to '200' and of bullets to '300', and it only updated the bullet velocity. Then told me the player was already 'at the correct value' even though it was set to 150. Things like that.. :)
For me, unless there is a concrete way of proving work is correct you can't rely on AI coding. tsz has super strict tests around correctness, performance and architectural boundaries
If I understood you correctly, I think I'm less extreme than that. Most code written by humans is also not provably correct. But I'm assuming you mean provably correct like Lean: https://lean-lang.org/, and not just "passes tests".
If you mean 'passes tests', that can be tackled by AI. Although AI writing its own tests and then implementing its own code is definitely not a foolproof strategy.
Multiple computers and each multiple Claude Code or Codex sessions. It had lots of ups and downs. Now I have a good enough test harness that makes it easier to iterate faster
The problem with vibe coding closer is that the agentic makes a very plasticy samey feel unless you work with something that makes it unique or can pass a template through it.
Correct me if I’m wrong Simon, but weren’t you highly optimistic about llm’s and agentic-use of them?
I believe this is a common fault of not being able to zoom out and look at what trade offs are being made. There’s always trade-offs, the question is whether you can define them and then do the analysis to determine whether the result leaves you in a net benefit state.
I think you kind of answered this in the post though. "I want somebody to have used the thing" is dogfooding. and it's probably the only quality signal left that can't be generated in 30 minutes.
> It used to be if you found a GitHub repository with a hundred commits and a good readme and automated tests and stuff, you could be pretty sure that the person writing that had put a lot of care and attention into that project.
I think this highlights a problem that has always existed under the surface, but it's being brought into the light by proliferation of vibeslop and openclaw and their ilk. Even in the beforetimes you could craft a 100.0% pure, correct looking github repo that had never stood the test of production. Even if you had a test suite that covers every branch and every instruction, without putting the code in production you aren't going to uncover all the things your test suite didn't--performance issues, security issues, unexpected user behavior, etc.
As an observer looking at this repo, I have no way to tell. It's got hundreds of tests, hundreds of commits, dozens of stars... how am I to know nobody has ever actually used it for anything?
I don't know how to solve this problem, but it seems like there's a pretty obvious tooling gap here. A very similar problem is something like "contributor reputation", i.e. the plague of drive-by AI generated PRs from people (or openclaws) you've never seen before. Stars and number of commits aren't good enough, we need more.
> I know full well that if you ask Claude Code to build a JSON API endpoint that runs a SQL query and outputs the results as JSON, it’s just going to do it right. It’s not going to mess that up. You have it add automated tests, you have it add documentation, you know it’s going to be good.
> But I’m not reviewing that code. And now I’ve got that feeling of guilt: if I haven’t reviewed the code, is it really responsible for me to use this in production?
Answer: it wholly depends upon what management has dictated be the goal for GenAI use at the time.
There seems to be a trend of people outside of engineering organizations thinking that the "iron triangle" of software (and really, all) engineering no longer holds. Fast, cheap, good: now we can pick all three, and there's no limit to the first one in particular. They don't see why you can't crank out 10x productivity. They've been financially incentivized to think that way, and really, they can't lose if they look at it from an "engineer headcount" standpoint. The outcomes are:
1) The GenAI-augmented engineer cranks out 10x productivity without any quality consequences down the line, and keeps them from having to pay other people
or
2) The GenAI-augmented engineer cranks out 10x productivity with quality consequences down the line, at which point the engineer has given another exhibit in the case as to why they should no longer be employed at that organization. Let the lawyers and market inertia deal with the big issues that exist beyond the 90-day fiscal reporting period.
Either way, they have a route to the destination of not paying engineers, and that's the end goal.
If you don't like that way of running a software engineering organization, well, you're not alone, but if nothing else, you could use GenAI to make working for yourself less risky.
I feel like an outlier in all of this. But isn't this just more AI slop? How is this different from text generation or image generation?
Like many people I have used AI to generate crap I really don't care about. I need an image. Generate something like, whatever. Great hey a good looking image! No that's done I can do something I find more interesting to do.
But it's slop. The image does not fit the context. Its just off. And you can tell that no one really cared.
Every time I do deep work, and think of solutions to a complex problem. I always have the opportunity to ask claude to implement a sub-par AI slop solution.
Do this enough times, and I will have forgotten how to think.
Or, you just explain the solution and save some typing and get the same thing. I find it refreshing to be able to just talk to Claude and have it generate the same thing I would have built.. It gives me more time to articulate and solve complex problems, and less time with the mundane writing, test loops etc.
> where you fully give in to the vibes, embrace exponentials, and forget that the code even exists [...] It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
So clearly we need a term for what happens when experienced, professional software engineers use LLM tooling as part of a responsible development process, taking full advantage of their existing expertise and with a goal to produce good, reliable software.
"Agentic engineering" is a good candidate for that.
> as part of a responsible development process, taking full advantage of their existing expertise and with a goal to produce good, reliable software
Its shifted so much for me. I used to think that I had a solemn duty to read every line and understand it, or to write all the test cases. Then I started noticing that tools like CodeRabbit, or Cursor would find things in my code that I would rarely find myself.
I think right now, its shifted my perception of my role to one where I am responsible for "tilting" the agentic coding loop; ultimately the goal is a matter of ensuring the agent learns from its mistakes, self-organize and embrace a spirit of Kaizen.
Btw thank you for your work on Django, last 20 years with it were life changing (I did .NET before).
For work I do agentic engineering. As the code that I submit for a code review is hand reviewed by me. I know every line and file that I submit.
My side project is 80% vibe code. Every now and then I look and see all the bad stuff, then I scold Codex a bit and it refactors it for me. So I do see the author's point.
Instead of "vibe coding" by asking the AI to design and write code, I'm having it refine my own designs, and write code under strict supervision and guidance, that I carefully review and iterate on.
I took a rock carving course in school that really enlightened me about software engineering, and it still applies today, especially to AI. You can't just decide what you want to carve, hold the chisel in just the right spot, and whack it with a hammer just perfectly so all the rock you want falls away leaving a perfect statue behind.
"I saw the angel in the marble and carved until I set him free." -Michelangelo
It's a long drawn out iterative process of making millions of tiny little chips, and letting the statue inside find its way out, in its natural form, instead of trying to impose a pre-determined form onto it.
Vibe coding is hoping your first whack of the hammer is going to make a good statue, then not even looking at the statue before shipping it!
But AI assisted conscientious coding (or agentic engineering as Simon calls it) is the opposite of that, where you chip away quickly and relentlessly, but you still have to carefully control where you chisel and what you carve away, and have an idea in your mind what you want before you start.
I agree, I'm actually generating just over of 20,000 lines of code each day at my company. Part of that was the mandate and leaderboards around token usage, but also they started using pull requests as an explicit metric. What I do is usually pull around 5 or so tickets at once, spin up 5 different agents on their own branch, have them work until completion, and then spin up two more agents to handle the merge request.
I'm not checking the code since the code doesn't really matter anymore anyways - I just have the agent write passing tests for the changes or additions I make, and so even if something breaks I can just point to the tests.
Some days, the tickets are completed much faster than I expect and I don't hit my daily token expenditure goal, so I have my own custom harness that actually hooks up an agent to TikTok, basically it splits up the reel into 1 second increments and then feeds those frames to the LLM for it's own consumption. I can easily burn 10m tokens a day on this, and Claude seems to enjoy it.
Personally I want to thank you Simon for putting me onto this "vibe engineering" concept, I really didn't expect an archaeology major like myself to become a real engineer but thanks to AI now I can be! Truly gatekeeping in tech is now dead.
I'd be lying if I said I was not worried about the future. I am not necessarily worried in the sense that there will be some grave, impeding doom that awaits the future of humanity.
Rather, I just feel like I have to constantly remind myself of the impermanence of all things. Like snow, from water come to water gone.
Perhaps I put too much of my identity in being a programmer. Sure, LLMs cannot replace most us in their current state, but what about 5 years, 10 years, ..., 50 years from now? I just cannot help be feel a sense of nihilism and existential dread.
Some might argue that we will always be needed, but I am not certain I want to be needed in such a way. Of course, no one is taking hand-coding away from me. I can hand-code all I want on my own time, but occupationally that may be difficult in the future. I have rambled enough, but all and all, I do not think I want to participate in this society anymore, but I do not know how to escape it either.
If you work in any new technology field, the chances that your job will exist in the same way 50 years from now is very small.
The job, as you have done it at least, was also not here 50 years before you started doing it.
Did you have any of the same feelings knowing that you were doing a job that has not existed in the world very long? That seems like a strange requirement for a meaningful job, that it should remain the same for 50+ years.
In truth, our world and what we do for our careers is entirely shaped by the time that we live in. Even people that ostensibly do the same thing people have done for centuries (farmer, teacher, etc) are very different today than 100 years ago.
> And that feels about right to me. I can plumb my house if I watch enough YouTube videos on plumbing. I would rather hire a plumber.
I don't buy this argument at all. I think if we could pay $20/month to a service that would send over a junior plumber/carpenter/electrician with an encyclopedic knowledge of the craft, did the right thing the majority of the time, and we could observe and direct them, we'd all sign up for that in a heartbeat. Worst case, you have to hire an experienced, expensive person to fix the mess. Yes, I can hear everyone now, "worst case is they burn your house down." Sure, but as we're reminded _constantly_ when we read stories about AI agent catastrophes -- a human could wipe your prod database too. wHy ArE yOu HoLdInG iT tO a DiFfErEnT sTaNdArD???
The business side of the house is getting to live that scenario out right now as far as software goes. Sure you've got years of expertise that an LLM doesn't have _yet_. What makes you think it can't replace that part of your job as well?
> I think if we could pay $20/month to a service that would send over a junior plumber/carpenter/electrician with an encyclopedic knowledge of the craft, did the right thing the majority of the time, and we could observe and direct them, we'd all sign up for that in a heartbeat.
I don’t think this comparison quite works (or maybe I think it works and is wrong) and I think it has something to do with creativity or the initial ideation.
I would do this, but I’m a jack of all trades. I built my own diner booth in my kitchen recently. But my wife, who loves the diner booth, just doesn’t really want to get over the hump of figuring out what she might want. I think most people want to offload the mental load of figuring out where to start.
Most people aren’t just bored by coding, they’re bored or overwhelmed by the idea of thinking about software in the first place. Same with plumbing or construction, most people aren’t hiring someone to direct, they’re hiring a director.
Even I have this about some things, sometimes I choose to outsource the full stack of something to give me more space to do creativity elsewhere.
You're comparing paying $20 for an AI plumber to paying hundreds/thousands for a traditional plumber.
But that's not what the author is talking about in that passage you quoted. What he's saying is that, if you can pay $20 for an AI plumber, then it stands to reason that eventually you will be able to pay $30 to a company that manages AI plumbers for you, so that you don't even have to go to the trouble of supervising the plumber. Most people will choose the $30.
It's in a section called "Why I’m still not afraid for my career."
The implication here is software engineer jobs are still safe despite basically free labor/material being available to do said jobs because he thinks other people would prefer to pay experienced professionals to do it right at a significantly higher cost. My point is, I think most people will take the low-stakes gamble of having the cheap AI agent do it with self-supervision[0]. He's naive in thinking people are really going to care about artisanal software built by experienced professionals in the future.
0: Even if you subscribe to the "your job will be to supervise the agents" train of thought, you're kinda glossing over the fact that it's probably gonna involve a pretty significant pay cut and the looming problem of "how do new experienced professionals get created if they don't have to/don't need to get their hands dirty"?
It is pure arrogance to expect that machines will never be able to code as good as a skilled human.
And AI generated code should be different than human code. AI has infinite memory for details. AI doesn’t need organizational patterns like classes. Potentially AI can write code that is more performant than any human.
Will it look like garbage? Sure. Will the code be more suited to the task? Yes.
I find it hard to believe that code with unnecessary cruft and repetition is "more suited to the task". I've literally deleted hundreds of unnecessary or unused functions at this point. The only way I can agree is if "more suited" means, "it's wearing multiple suits for no reason".
What will happen when AI companies increase the price of tokens?
The code produced will only be understandable by AI. You could use locally hosted LLMs, but it won't be as performant as AI run by big guys. And there is nothing stopping greedy companies implementing some ridiculous pattern that only their model can reasonably work with.
So what you'll do in situation when you can't understand "your" codebase and you have to make changes or fix a bug?
The open weight models are nipping on the heels of frontier models. The frontier labs have to make forward progress and keep tokens cheap in order to maintain marketshare.
Eventually, we'll have a Mythos-level model running on integrated hardware on every PC.
Your post weeks of pure arrogance. You sound like the bozo’s at Anthropic who made an AI agent for finance and think this is somehow going to provide a huge productivity boost because all they do is a bunch of tick boxing and spreadsheet work.
Repeat after me: most software spends the majority of its lifetime in the maintenance phase.
Repeat after me: it follows that most of the money the software makes occurs during the maintenance phase.
Repeat after me: our industry still does not understand this after almost 100 years of being in existence.
Alan Kay was 100% right when he said that the computer revolution hasn't occurred yet. For all of our current advancements all tools are more or less in the Stone Age.
My great hope is that AI will actually accelerate us to a point where the existing paradigm fully breaks beyond healing and we can finally do something new, different, and better.
So for now - squeee! - put a jetpack on your SDLC with AI and go to town!!! Move fast and break things (like, for real).
People in the future are going to wonder what the hell we were thinking, when 30 years down the line everything is a hot mess of billions of lines of code generated by LLMs that no human has read almost any of it and is no longer possible for anyone to maintain neither with nor without LLMs. And the LLM generated garbage will have drowned out all of the good quality code that ever existed and no one will be able to find even human generated code anymore on the internet.
Makes me want to just give up programming forever and never use a computer again.
30 years down the line a human will wake up in his climate controlled bed in an idyllic large scale people-zoo, think about what information he wants, and immediately his 900TB ferroelectric compute-in-memory exobrain will read his thoughts via his brain-computer-interface, and render a custom 3d visualization of that information floating in front of him. There will be no separate code stage, just neural rendering of data to pixels.
But are the pixels hot?
Better not think a forbidden thought. Oh shoot! You just did! :)
Well thanks, I lost the game now :|
Who empties the bedpan?
I think it’s a mistake to think that we will be blindly going in this direction for many years and then suddenly collectively wake up and realize what have we done. It’s a great filter and a great opportunity.
If LLMs stop improving at the pace of the last few years (I believe they already are slowing down) then they will still manage to crank out billions lines of code which they themselves won’t be able to grep and reason through, leading to drop in quality and lost revenue for the companies that choose to go all-in with LLMs.
But let’s be realistic - modern LLMs are still a great and useful tool when used properly so they will stay. Our goal will be to keep them on track and reduce the negative impact of hallucinations.
As a result software industry will move away from large complex interconnected systems that have millions of features but only a few of them actively used, to small high quality targeted tools. Because their work will be easier to verify and to control the side effects.
> If LLMs stop improving at the pace of the last few years (I believe they already are slowing down)
Depending on how you measure "improvement" they already have or they never will :-/
Measuring capability of the model as a ratio of context length, you reach the limits at around 300k-400k tokens of context; after that you have diminishing returns. We passed this point.
Measuring capability purely by output, smarter harnesses in the future may unlock even more improvements in outputs; basically a twist on the "Sufficiently Smart Compiler" (https://wiki.c2.com/?SufficientlySmartCompiler=)
That's the two extremes but there's more on the spectrum in between.
300k-400k isn’t the current limit if you create modules and/or organize the code reasonably.. for the same reason we do this for humans: it allows us to interact with a component without loading the internals into out context.
you can also execute larger tasks than this using subagents to divide the work so each segment doesn’t exceed the usable context window. i regular execute tasks that require hundreds of subagents, for example.
in practice the context window is effectively unlimited or at least exceptionally high — 100m+ tokens. it just requires you to structure the work so it can be done effectively — not so dissimilar to what you would do for a person
That makes it not a context window.
How to organize code like you said, and how agents interact with it, to keep the actual context window small is the fundamental challenge.
I keep getting surprised that people who are all-in on this (" i regular execute tasks that require hundreds of subagents ") don't have any idea of what is happening even a single layer below their interface to the LLM ("in practice the context window is effectively unlimited or at least exceptionally high — 100m+ tokens.")
I looked at that response by GP (rgbrenner) and refrained from replying because if someone is both running hundreds of agents at a time AND oblivious to what "context window" means, there is no possible sane discourse that would result from any engagement.
I wish I got to hallucinate at work, and just get a pat on the head for constantly doing the wrong thing.
Maybe I am unlucky but I had worked with too many developers who couldn't make a good decision if their life depended on it. LLMs at least know how to convince you of their decisions with strong arguments.
I mean you can do that, but the job probably doesn't pay too much. Might enrich your spirituality though.
First, most software is already a hot mess.
Second, LLM code can be less of a hot mess than human written code if you put in the time to train/prompt/verify/review.
Generating perfect well patterned SOLID and unit tested code with no warnings or anti-patterns has never been easier.
The only people who are going to put in the time, are people who care enough to. The problem is you have people who didn’t care before who were equipped with a garden hose. Now that they have a fully pressurized fire hose they can make more of a mess faster.
Then they should be easy to defeat. Why are you complaining?
As an author of fine literature, these million monkeys on typewriters simply upset my sense of dignity. And to imagine the impoverished prose so many readers shalt forthwith be perusing!
Defeat in what aspect?
Compete with, for jobs, customers, investment, etc.
Maybe. But it depends on the metric. It seems like orgs are focused on PR count and token usage. Issues caused by poor code are often lagging indicators so it’s asymmetrical in that aspect.
Write lots of code now and statistically look great, while the impact won’t be felt for a much larger range of time.
With the job search and whatnot then yeah, caring becomes a lot more important. That’s true.
This is so on point that I want to cry.
Hard disagree. LLMs are fantastic for fixing bad architecture that's been around for a decade because nobody was willing to touch it. I can have it write tons and tons of sanity checks and then have it rewrite functionality piece by piece with far more verification than what I'd get from most engineers.
It's not immediate, it still takes weeks if you want to actually do QA and roll out to prod, but it's definitely better than the pre-LLM alternatives.
Yeah but you care which is my exact point.
How is this different from every single technological iteration?
Like with a lot of things in this space, it depends where you invest your effort. If you care about quality design and good code, you can definitely get there - but that doesn't happen by default.
With the right investment, we could certainly have tooling that creates and maintains very good designs out of the box. My bet is that we'll continue chasing quick and hacky code, mostly because that's the majority of the code that it was trained on, and because the majority of people seem to be interested in a quick result vs a long-term maintainable one.
>First, most software is already a hot mess.
That the industry was already routinely dealing with fires of it's own creation is not a valid reason to start cooking with gasoline.
But we aren't cooking with gas. We are cooking with a more controlled burner than ever that can download a clean code claude skill and be committing better code than you or I could write.
What would normally be considered overengineered gold plating is "free" now.
Right, but it takes one to know one. Many don’t have the ability to decipher what’s good stable output or not
I can't get used to vibe-coded projects on Github. One that I was using for a little while is about a year old, with 40,000 commits and 15,000 PRs. And it has "lite" in its name; it's supposed to be the simple alternative. There were so many bugs. I fixed one, submitted a PR, but it was off the first page in hours. It will never be merged. I moved to a different project with a bit less... velocity, and it has been way smoother.
By then, the fix will be easy. Fire up the latest LLM, point it at your codebase and tell it "rewrite this from scratch. do it well. fix the architecture mistakes"
There is definitely going to be some Wirth's law-like [0] effect about the asymmetry of software complexity outpacing LLMs' abilities to untangle said software. Claude 9.2 Optimus Prime might be able to wrangle 1M LoC, but somehow YC 2035 will have some Series A startup with 1B+ LoC in prod — we'll always have software companies teetering on the very edge of unmaintainability.
[0] https://en.wikipedia.org/wiki/Wirth%27s_law
It won't be an LLM that does it, the entire feature of an LLM is it produces generalizable reasonably "correct" text in response to a context.
The system that makes it have an opinion about good vs bad architecture or engineering sensibilities will be something on top of the transformer and probably something more deterministic than a prompt.
We can do this today too (but definitely hopefully future LLMs make better architectural decisions). With Claude, I've been working on an application for the last 2 months. I didn't have a great vision of what I wanted when I started but I didn't want that to slow me down. The architecture is terrible - Claude separated some functionality into different classes but did a bad job at it and created a big ball of mud. Now that I finally have my vision locked down and implemented (albeit poorly), it'd be a great time to throw it away and start over. It'd be interesting to see the result and see how long it takes.
Just have claude (or gpt maybe) do an architecture review and request a multi-phase refactoring plan. This is probably better to do incrementally as you notice the balls of mud forming but it might not be too late. Either way, if it does something you don't like, `git checkout` and start over
Will work just as good as today or 20 years ago.
Are you suggesting AI coding was as good 20 years ago as it is today?
I think they're being sarcastic, saying that rewrites from scratch have rarely worked well (whether done by AI or humans).
It sure wrote less crappy code.
"Write me a really cool game, that will make me lots of money, fast!"
Make me a 1hr episode of my favorite book. Make it as lore accurate as possible. Plot out the script for the next 100 episodes.
I see your point, however: EA sports has been doing this for literally the entire lifetime of gaming as an industry
Electronic Sharts slogans and franchises:
"Shit's in the Game!"
"Chunder Everything"
"Maddening NFL 26"
"FIFiAsco 26"
"UFC 26 (Un Finished Code)"
"The Shits 4"
"Battlefailed"
"Need for Greed"
Do you think new LLMs are going to write better and better code? When all they are going to have is the slop generated by previous, worse models?
"Make sure to double check everything, and MAKE NO MISTAKES!!!"
Don't hallucinate!
"YOU'RE A SENIOR SOFTWARE ENGINEER!!!"
"Ultrathink!"
People, as a rule, don't really "go backwards." We didn't really walk back on the industrial revolution, and we're probably not going to walk back from this day-and-age's activities. It's only unsettling until the changes are accepted. The old timers can vie for a time before "all this" when they were children and all their needs were given to them by their now-deceased parents, and the cycle can continue on, yet again.
I'm generally pro "llm assisted coding" or whatever you want to call it. But I do somethings think about the Butlerian Jihad from Dune.
https://en.wikipedia.org/wiki/Dune:_The_Butlerian_Jihad
If you like sci-fi takes on software systems, check out Vernor Vinge "A Fire upon the deep" and sequels. I recall ship systems software is something like all the code humanity has ever written, plus centuries of LLM churn. One of the protagonists is a space faring software developer particularly good with legacy code.
We are used to thinking about software like in the article, a program that runs deterministically in an OS. Where we are headed might be more like where the LLM or AI system is the OS, and accomplishes things we want through a combination of pre-written legacy software, and perhaps able to accomplish new things on the fly.
Ordered Fire Upon the Deep. Looks interesting.
Interesting, I kinda do this. Sometimes when an LLM solves a problem for me, I have it write code so that I can reuse that exact same approach deterministically(and I line by line check it). Now I have about a dozen CLI commands that the LLM can use and I'm reasonably (although not 100%) sure I'll get an expected outcome. Really helpful with debugging via steam pipe and connecting to read replicas.
Sounds like a recipe for Star Trek holodeck malfunctions.
Pham Nuwen is a master of vibe patching legacy sedimentary software.
If 30 years down the line I still have to look at code, maintain code, or even worry in the slightest about code, something went deeply wrong.
Code will never go away. Code was there before computer hardware and it will always be there. Code is (almost?) all of computation theory so unless we throw computers away, we shall always use code.
They're not suggesting that code will go away, but rather that it will be abstracted beneath an LLM interface, so that writing code in the future will be like writing assembly today: some people do it for fun or niche reasons, but otherwise it's not necessary, and most developers can't do it.
Whether that happens or not is a different question, but I believe that's what they're suggesting.
Code is formal and there are basic axioms that grounds its semantic. You can build great constructs on top of those semantics, but you can’t strip away their formality without the whole thing being meaningless. And if you can formalize a statement well enough to remove all ambiguity, then it will turn into code.
Programming is taking ambiguous specs and turning them into formal programs. It’s clerical work, taking each terms of the specs and each statements, ensuring that they have a single definition and then write that definition with a programming language. The hard work here is finding that definition and ensuring that it’s singular across the specs.
Software Engineering is ensuring that programming is sustainable. Specs rarely stay static and are often full of unknowns. So you research those unknowns and try to keep the cost of changing the code (to match the new version of the specs) low. The former is where I spend the majority of my time. The latter is why I write code that not necessary right now or in a way that doesn’t matter to the computer so that I can be flexible in the future.
While both activities are closely related, they’re not the same. Using LLM to formalize statements is gambling. And if your statement is already formal, what you want is a DSL or a library. Using LLM for research can help, but mostly as a stepping stone for the real research (to eliminate hallucinations).
> is no longer possible for anyone to maintain neither with nor without LLMs.
That's what the Tech-Priests are for.
<INTERROGATIVE-HAVE YOU TRIED APPLYING INCENSE AND RECITING THE SACRED TECH LITANIES?>
Why are we pretending everyone's code is an etalon of quality? Most software out there is probably hot mess already. No think behind it, let alone ultrathink.
Exactly, before the rise of LLMs it was not at all uncommon to hear people claiming that their job is to just Google API calls or copy and paste code from Stackoverflow. The context back then was that companies are being picky by hiring people who can demonstrate some modicum of understanding of data structures and algorithms because all any developer does is tweak some CSS or make some calls to a database to glue together a CRUD app... why should anyone be expected to know how to reverse a linked list, or how a basic sorting algorithm works... just download an npm package to do that stuff and glue it all together with a series of nested for loops.
With the rise of LLMs that do all of that... those people shutup and shutup real fast.
Hello from assembly programmers to present day javascript folks. Joke aside, I sometimes think how VS Code is written in such layers and layers of code - ~200mb of minified code - Java based IDEs were worser with almost 1GB of code (libs/dependencies). And VS Code did beat native editors (Sublime) of its time to dominate now - may be because of the business model (open & free vs freemium). But it does the job quite well IMO. And it enabled swarms of startups to go to market including billion $ wrappers - including Cursor, Antigravity and almost all UI coding agents. I remember backend developers (Java/C++ type) looking down upon Javascript developers as if we are from an inferior planet or something.
How many of us remember that VSCode is actually a browser wrapped inside a native frame?
To be fair, MS send a world class engineer to make JavaScript usable for codebases at that scale.
>How many of us remember that VSCode is actually a browser wrapped inside a native frame?
The new standard, Web Apps. Why update 3 seperate binaries for Win/Lin/Mac when you can do 1 for a web framework and call it a day?
VS Code has two things that worked well for it. Web Tech and Money. Web tech makes it easy to write plugins (you already know the stack vs learning python for sublime). And I wonder how much traction it would get if not Microsoft paying devs to wrangle Electron in a usable shape.
Have you ever encountered the very common real life situation where there's some software that works, and you have a binary for it but you either don't have the source code or it doesn't compile for whatever reason? This is the pre-LLM world. Now, do you think LLMs make this situation better or worse? You may not know what's wrong with your software or how to fix it, but unlike in the past you can throw compute at trying to figure it out, or replicating a subset of it, or even replicating all of it depending on what it is. I think LLMs are making this situation better not worse.
I think the problem with that sort of thought is that the burgeoning sizes of output for even trivial software makes it almost a certainty that:
a) The stuff output by the existing LLMs is too unwieldy even for them to handle , even if the product itself is a glorified chatbot.
b) If all software is throwaway, then the value of all software drops to, effectively, the price of an AI subscription. We'll all be drowning in a market of lemons (https://en.wikipedia.org/wiki/The_Market_for_Lemons), whilst also being producers in said market.
another aspect is amount of code LLMs can handle went from few lines to small codebase in few years, so future is just possible for a lot bigger codebases?
Why does it matter, as long as it accomplishes the task?
There is nothing in the post to support the statement. An interesting personal confession, but it does not establish that vibe coding and agentic engineering are converging as a general phenomenon.
As a piece of meat, I look forward to charge rates of $10,000 an hour, to fix code out the vibe code generation.
If that is the case market forces would likely favor hand written code and all the slop will be forgotten (unless the slop works fine and is stable).
The market is hardly as rational as people would like to hope it is, though it does at least have its own twisted sort of internal consistency.
I don't think that's how money works. Enough people have poured enough money into this thing that the actual, measurable results/efficacy/ROI are of secondary importance (to put it mildly). At this point AI adoption is (at least sold as) a fait accompli.
This is wishful thinking. The force of the market is "number go up". Quality increasingly has less and less of a role in the equation. You will eat your slop, and you will like it. It will be the only choice you have.
But the quality of code was already very bad due to market forces. Most code at large companies is notoriously poor despite the talent density, because the incentives are not there to tackle tech debt or improve code quality.
With such a low baseline, there is an optimistic perspective that LLMs could improve the situation. LLMs can produce excellent code when prompted or reviewed well. Unlike human employees, the model does not worry about getting a 'partially meets expectations' rating or avoid the drudgery of cleaning up other people's code.
The model is optimized in a different way to "partially meet expectations". Sycophancy coupled with only really "knowing" what it has been trained on assure a different kind of mediocrity.
The same incentives that discourage good code in pre-AI times are still dominating now. You will be pushed to ship sub-par products in the future, just like you were in the past.
AI certainly has the potential to make the underlying code/design a lot cleaner. We will also be working with dramatically more code, at a much higher rate of change. That alone will be a big challenge to keep sustainable.
The ones making the decision to under-invest on design are either are unaware of the real costs, or are aware and are deliberately choosing that path - that's not new, and I don't expect it to change.
I agree generally but there are periods where creative people show up and a whole slew of existing firms go bust/shrink due to one’s ability to envision a path toward creative destruction.
Have you seen Windows? We already have thirty years of slop.
> People in the future are going to wonder what the hell we were thinking, when 30 years down the line everything is a hot mess of billions of lines of code generated by LLMs that no human has read
--
It's just as likely that people will be surprised that we used to have billions of lines of human generated code, that no LLM ever approved.
By then AI would be good enough to clean them all up....like I dont get these dooming scenarios they always assume that we are going to be stuck with LLMs and there wont be anything new coming.
To make my comment more on-topic: why do you think this is going to be the case? What newer LLMs will be trained on?
well you are assuming that there's not going to be any new progress and that we are going to be stuck with whatever LLM version we have currently
> Makes me want to just give up programming forever and never use a computer again.
LLMs aren’t the first thing to come along and change how people develop applications.
You had the rise of frameworks like Django, Rails, etc. Also the rise of SPAs. And also the rise of JS as a frontend+backend language.
In a 3-5 yeats we’ll have adapted to the new norm like we have in the past
The difference between writing assembly code and Ruby code is much smaller than the difference between programming and vibe coding.
Also, companies are pressuring employees towards adoption in novel ways. There was no such industry-wide pressure by employers in the 90s, 2000s or 2010s for engineers to use a specific tech.
> Also, companies are pressuring employees towards adoption in novel ways. There was no such industry-wide pressure by employers in the 90s, 2000s or 2010s for engineers to use a specific tech.
Companies have been enforcing technology mandates since time immemorial. In the early 2000s there were definitely a lot of mandates to move away from commercial UNIX to Linux. Lots of companies began enforcing the switch to PHP, Ruby and Python for new projects.
Or, it could be like asbestos and the immediate benefits are just too appealing to listen to arguments of skeptical naysayers about some vaguely defined problems that are decades away, if they even happen.
I use AI tools daily (because they feel like they're helping me) but it's not exactly hard to imagine scenarios where an explosion of slop piling up plus harm to learning by outsourcing all thinking results in systemic damage that actually slows the pace of technological progress given enough time.
History of new technologies tend to average into a positive trend over a long enough time scale but that doesn't mean there aren't individual ups and downs. Including WTF moments looking back at what now seems like baffling decision-making with benefit of hindsight.
Some of us are already experiencing that. For example I handed off an initial version of something some months ago, and the AI-generated stuff they came up with was a huge buggy mess of spaghetti code neither of us understood. Months later we've detangled it, cutting it down to a third the size, making it far simpler to understand, and fixing several bugs in the process (one was even by accident, we'd made note of it, then later when we went to fix it, it was already fixed).
> Or, it could be like asbestos
If it is, the fall out will be way worse than if AI ends up living up to (reasonable) expectations.
If it doesn’t, we are going to see over a trillion dollars of capital leave the tech sector, which I think will have worse impacts on the livelihood of tech workers than if AI ends up panning out.
This is something the naysayers need to grapple with. We’ve crossed a line where this tech needs to work simply because of the amount of money depending on that fact.
The asbestos hypothetical is a bit different than the "bubble popping" economic crisis scenario though. In this world, AI would just continue being adopted and shoved into every nook and cranny into which it can be made to fit, with valuations only getting bigger and bigger.
The damage would come much later, well beyond the point where it could be simply pulled out and replaced without spending massive amounts of money and would also basically necessitate training an entire new generation of engineers.
Then the AI giants would start appearing vulnerable like cigarette companies in the 90s while an AI Superfund and interstate class action are being planned but Sam Altman would already be a centitrillionaire at that point so it would be someone else's problem.
Have you ever worked on a legacy codebase with actual good code? I struggle to see the difference between your predicted future and today's reality when it comes to working with legacy disasters.
Well, on legacy code base, you still needed humans to write those lines of code. There's a maximal amount of lines a human can write in a year.
Now with LLM we are talking of millions and millions of line of code that could be generated in a single day. The scale of the problem might not be the same at all.
Vibe Coding (and LLMs) did not create undisciplined engineering organizations or engineers. They exposed and accelerated them.
Plenty of engineers have loose (or no!) standards and practices over how they write coee. Similarly, plenty of engineering teams have weak and loose standards over how code gets pushed to production. This concept isn't new, it's just a lot easier for individuals and teams who have never really adhered to any sort of standards in their SDLC to produce a lot more code and flesh out ideas.
Bad engineers continue being bad, good engineers continue being good.
I personally don’t know any colleagues who were good engineers just because they wrote code faster. The best engineers I know were ones who drew on experience and careful consideration and shared critical insights with their team that steered the direction of the system positively.
> Claude, engineer a system for me, but do it good. Thanks!
> I personally don’t know any colleagues who were good engineers just because they wrote code faster
Same, if anything, the opposite seems to be true, the ones that I'd call "good engineers" were slower, less panicked when production was down and could reason their way (slowly) through pretty much anything thrown at them.
Opposite experience, I've sit next to developers who are trying their fastest to restore production and then making more mistakes to make it even worse, or developers who rush through the first implementation idea they had for a feature, missing to consider so many things and so on.
> Same, if anything, the opposite seems to be true, the ones that I'd call "good engineers" were slower
Unfortunately, a lot of workplaces are ignoring this, believing their engineers are assembly line workers, and the ones who complete 10 widgets per minute are simply better than the ones who complete 5 widgets per minute.
It isn't just that they believe this - they want a business model where this is how it works. For a big company a star coder is a liability - they have strong labor power, they can leave and they are hard to replace, etc.
Companies want workflows that work with mediocre programmers because they are more like interchangeable parts. This is the real secret to why AI programming will work in a lot of places. If you look at the externalities of employing talented people, shitty code actually looks better than great code.
To these kinds of companies, what's even better than a rack of mediocre programmers? AI agents that you can just conjure up and prompt. They take up no facility space, don't require lunch breaks or vacations, obey all commands and direction, and produce a predictable and consistent amount of output per dollar.
This is the earworm the leaders of these companies have allowed into their minds. Like Agent Mulder, they Want To Believe in this so badly...
Glad I find myself employed under a division called Research and Development. Poaching and retaining highly compensated individuals is the entire purpose.
> I personally don’t know any colleagues who were good engineers just because they wrote code faster.
However, the best engineers I know are usually among the quickest to open an editor or debugger and use it fluently to try something out. It's precisely that speed that enables a process like "let's try X, hmm, how about Y, no... ok, Z is nice; ok team, here are the tradeoffs...". Then they remember their experience with X, Y, and Z, and use it to shape their thinking going forward.
Meanwhile, other engineers have gotten X to finally mostly work and are invested in shipping it because they just want to be done. In my experience, this is how a lot of coding agents seem to act.
It's not obvious to me how to apply the expert loop to agentic coding. Of course you can ask your agent to try several different things and pick the best, or ask it to recommend architectural improvements that would make a given change easier...
Or: depth-first search of the solution space vs breadth-first (or balanced) search of the solution space.
> Of course you can ask your agent to try several different things and pick the best, or ask it to recommend architectural improvements that would make a given change easier
The ideal solution increasingly seems to be encoding everything that differentiates a good engineer from a bad engineer into your prompt.
But at that point the LLM isn’t really the model as much as the medium. And I have some doubts that LLMs are the ideal medium for encoding expertise.
> However, the best engineers I know are usually among the quickest to open an editor or debugger and use it fluently to try something out
The Pragmatic Programmer book has whole chapters about this. Ultimately, you either solve the problem analogously (whiteboard, deep thinking on a sofa). Or you got fast as trying out stuff AND keeping the good bits.
Yeah, a lot of people came of age with a "we'll fix it when it's a problem" mindset. Previously their codebases would start to resist feature development, you'd fix the immediate bottlenecks, and then you could kick the can down the road a bit until you hit the next point of resistance. You kinda refactor as you do features. The frontier models have pushed the "it's a problem" moment further back. They can kinda work with whatever pile of code you give them... to a point. So it manifests as the LLM introducing extra regressions, or dropping more requirements than it used to, but it's not really manifesting as the job being harder for you. It's just not as smooth as it was from an empty repository. Then you hit the point where it just breaks too much and you need to fix it. And the whole codebase is just fractal layers of decisions that you didn't make. That's hard to untangle. And you're not editing the code yourself, so you don't have that visceral "adding this specific thing in this specific way has a lot of tension" reaction that allows you to have those refactoring breakthroughs.
Vibe coded apps with barely no tests, invariants, etc. No wonder it turns into spaghetti. You can always refactor code, force agents to write small modular pieces and files. Good engineering is good engineering whether an agent or human wrote the code. Take time to force agents to refactor, explore choices. Humans must at least understand and drive architecture at this point still. Agents can help and do recon amazingly and provide suggestions.
For me the distinction is the quality and rigor of your pipeline.
Vibe coding: one shot or few shot, smoke test the output, use it until it breaks (or doesn't). Ideal for lightweight PoC and low stakes individual, family or small team apps.
Agentic engineering: - You care about a larger subset of concerns such as functional correctness, performance, infrastructure, resilience/availability, scalability and maintainability. - You have a multi-step pipeline for managing the flow of work - Stages might be project intake, project selection, project specification, epic decomposition, d=story decomposition, coding, documentation and deployment. - Each stage will have some combination of deterministic quality gates (tests must pass, performance must hit a benchmark) and adversarial reviews (business value of proposed project, comprehensiveness of spec, elegance of code, rigor and simplicity of ubiquitous language, etc)
And it's a slider. Sometimes I throw a ticket into my system because I don't want to have to do an interview and burn tokens on three rounds of adversarial reviews, estimating potential value and then detailed specification and adversarial reviews just to ship a feature.
If your slider only goes between vibe coding or agentic engineering you're missing an entire range of engineering where the human is more involved.
I've been using Opus, GPT-5.5, and some lesser models on a daily basis, but not having them handle entire tasks for me. Even when I go to significant effort to define and refine specs, they still do a lot of dumb things that I wouldn't allow through human PR review.
It would be really easy to just let it all slide into the codebase if I trusted their output or had built some big agentic pipeline that gave me a false sense of security.
Maybe 10 years from now the situation will be improved, but at the current point in time I think vibe coding and these agentic engineering pipelines are just variations of a same theme of abdicating entirely to the LLM.
This morning I was working on a single file where I thought I could have Opus on Max handle some changes. It was making mistakes or missing things on almost every turn that I had to correct. The code it was proposing would have mostly worked, but was too complicated and regressed some obvious simplifications that I had already coded by hand. Multiply this across thousands of agentic commits and codebases get really bad.
I agree, vibe coding does not have quality gate checks at each stage, while agentic engineering does. Dev teams get into trouble when they try build to build without a proper process of design, tests, and reviews. This was true before agentic coding, but it's especially true now. The teams that understand how to leverage agents in this process are the ones that will be most successful.
Perhaps I've missed a few weeks worth of progress, but I don't think that AIs have become more trustworthy, the errors are just more subtle.
If the code doesn't compile, that's easy to spot. If the code compiles but doesn't work, that's still somewhat easy to spot.
If the code compiles and works, but it does the wrong thing in some edge case, or has a security vulnerability, or introduces tech debt or dubious architectural decisions, that's harder to spot but doesn't reduce the review burden whatsoever.
If anything, "truthy" code is more mentally taxing to review than just obviously bad code.
I know there are good uses of LLMs out there. I do. But.
The current fever pitch mandates from above seem to want it applied liberally, and pushing back against that is so discouraging and often career-limiting as to wear the fabric of one's psyche threadbare. With all the obvious problems being pointed out to people, there are just as many workarounds; and these workarounds, as is often revealed shortly thereafter, have their own problems, which beget new solutions, ad infinitum.
At some point it genuinely seems like all this work is for the sake of the machine itself. I suppose that is true: The real goal has become obscured at so many firms today, that all that remains is the LLM. Are the people betting the farm and helping implement the visions of those who have done so guaranteed a soft exit to cushion them from the consequences, or is rationality really being discarded altogether?
Sure, sound engineering principles can help work around these problems, but what efficiency is truly gained, in terms of cognitive load, developer time, money, or finite resources? Or were those ever an earnest concern?
In my opinion you are just wrong.
It’s an absolute game changer, and it can now multiply your productivity fivefold if it’s a solo greenfield project.
Maybe half a year ago it was as you said. You had to wait for the agent to finish, you had to review carefully, and often the result was not that great. You did not save a lot of time.
Now I can spin up 3+ parallel conversations in Codex, each in a git worktree. My work is mainly QA testing the features, refining the behavior, and sometimes making architectural decisions.
The results are now undeniable. In the past I could not have developed a product of that scope in my free time.
That is what is possible today. I suspect many engineers have not yet tried things that became feasible over the last months. Like parallel agents, resolving merge conflicts, separating out functionality from a large branch into proper PRs.
There's two sides to the AI mandates.
The degenerate side is clueless upper management and fad-driven engineering. We have talked extensively about this.
There is a more rational side to it that I've seen in my org: some engineers absolutely refuse to use AI and as a consequence they are now, clearly and objectively, much less productive than other engineers. The thing is, you still need to learn how to use the tool, so a nontrivial percentage of obstinate engineers need to be driven to use this in the same way that some developers have refused to use Docker or k8s or whatever.
Ah yes, we must force these obstinate engineers to the right path! Only after getting everyone to see the light will they understand and thank us for boundless productivity!! /s
Perhaps these “obstinate” engineers have good reason in their decision. And it should be their decision!
To be so confident in what is “the right way (TM)” and try to force it onto others is... revealing.
This has generally been the case, though. As mentioned in the post, "You want solutions that are proven to work before you take a risk on them" remains true and will be place where the edges are found.
It's about responsibility.
If I get pwned because my AI agent wrote code that had a security vulnerability, none of my users are going to accept the excuse that I used AI and it's a brave new world. I will get the blame, not Anthropic or OpenAI or Google but me.
The same goes for if my AI generated code leads to data loss, or downtime, or if uses too many resources, or it doesn't scale, or it gives out error messages like candy.
The buck stops with me and therefore I have to read the code, line-by-line, carefully.
It's not even a formality. I constantly find issues with AI generated code. These things are lazy and often just stub out code instead of making a sober determination of whether the functionality can be stubbed out or not.
You could say "just AI harder and get the AI to do the review", and I do this a lot, but reviewing is not a neutral activity. A review itself can be harmful if it flags spurious issues where the fix creates new problems. So I still have to go through the AI generated review issue-by-issue and weed out any harmful criticism.
> It’s not just the downstream stuff, it’s the upstream stuff as well. I saw a great talk by Jenny Wen, who’s the design leader at Anthropic, where she said we have all of these design processes that are based around the idea that you need to get the design right—because if you hand it off to the engineers and they spend three months building the wrong thing, that’s catastrophic.
This is spot on. I think the tooling is evolving so much particularly on the design side that its not worth the "translation cost" to stay (or even be) on the Figma side anymore.
What an excellent article by a smart, humble, still-learning person!
Favorite quote:" There are a whole bunch of reasons I’m not scared that my career as a software engineer is over now that computers can write their own code, partly because these things are amplifiers of existing experience. If you know what you’re doing, you can run so much faster with them. [...]
I’m constantly reminded as I work with these tools how hard the thing that we do is. Producing software is a ferociously difficult thing to do. And you could give me all of the AI tools in the world and what we’re trying to achieve here is still really difficult. [...]"
> If you can go from producing 200 lines of code a day to 2,000 lines of code a day, what else breaks? The entire software development lifecycle was, it turns out, designed around the idea that it takes a day to produce a few hundred lines of code. And now it doesn’t.
It is so embarrassing that LOC is being used as a metric for engineering output.
LOC is useful here not because it's a metric for output but because it's a metric for _understandability_. Reviewing 200 lines is a very different workload than reviewing 2000.
That's assuming the 200 lines are logical and consistent. Many of my most frustrating LLM bugs are caused by things that look right and are even supported by lengthy comments explaining their (incorrect) reasoning.
Ok? No one is saying that all LOC are equal. Ceteris paribus, 2000 lines is 10x more time consuming to review than 200
The point is that LOC is never a good metric for any aspect of determining the quality of code or the coder because it ignores the nuance of reality. It's impossible to generalize because the code can be either deceptively dense or unnecessarily bloated. The only thing that actually matters is whether the business objective is achieved without any unintended side effects.
> The only thing that actually matters is whether the business objective is achieved without any unintended side effects.
Objectives change; timeliness matters. The speed at which you deliver value is incredibly important, which is why it matters to measure your process. Deceptively dense is what I’d call software engineers who can’t accept that the process is actually generalizable to a degree and that lines of code are one of the few tangible things that can be used as a metric. Can you deliver value without lines of code?
> Objectives change; timeliness matters. The speed at which you deliver value is incredibly important, which is why it matters to measure your process.
This assumes that shorter code is faster to write. To quote Blaise Pascal, "I would have written a shorter letter, but I did not have the time."
> Can you deliver value without lines of code?
No, but you can also depreciate value when you stuff a codebase full of bloated, bug-ridden code that no man or machine can hope to understand.
You seem determined to misinterpret. I’m not talking about LOC as a measure of productivity. The ratio of LOC needing review to the capacity of reviewers (using how many LOC can be read/reviewed over the sampling period) is what’s being discussed. Agentic AI/vibe coding has caused that ratio to increase and shows a bottleneck in the SDLC. It’s a proxy metric, get over yourself.
“All models are wrong, some are useful”. What’s not useful is constantly bitching about how there’s no way to measure your work outside of the binary “is it done” every time process efficiency is brought up.
Yes, reading this back, I definitely veered off-topic. I apologize. I still don't think that you can say how much time it will take to review code based on how many lines of code are involved, but my argument was not well crafted. I just hope that others can learn something from our discussion. Thank you for being patient with me, and I hope you have a good day! :)
> 2000 lines is 10x more time consuming to review than 200
Very far from the truth in practice, every line of code isn't as difficult/easy to review as the other.
But why would the lines in the 2000 case be easier to review per line?
Holy shit, read the words I wrote. Ceteris Paribus. Assume the 200 lines and 2000 lines have a similar distribution of complexity.
It’s still a bad metric.
I have worked with code where 1000s of lines are very straightforward and linear.
I’ve worked on code where 100 lines is crucial and very domain specific. It can be exceptionally clean and well-commented and it still takes days to unpack.
The skills and effort required to review and understand those situations are quite different.
One is like distance driving a boring highway in the Midwest: don’t get drowsy, avoid veering into the indistinguishable corn fields, and you’ll get there. The other is like navigating a narrow mountain road in a thunderstorm: you’re 100% engaged and you might still tumble or get hit by lightning.
There’s still a limit on how far one can drive in a day, no matter the road.
The number of bugs tends to be linear to lines of code written meaning fewer lines of code for the same functionality will have fewer bugs.
So I’m pretty skeptical that reviewing 2000 lines of code won’t take any more time than reviewing 200 lines of code.
Furthermore how do you know the AI generated lines are the open highway lines of code and not the mountain road ones? There might be hallucinations that pattern match as perfectly reasonable with a hard to spot flaw.
Its still posssible to run any LLM in a loop and optimize for LoC while preserving the wanted outcome.
I experimented with vibe coding (not looking at the code myself) and it produced around 10k LOC even after refactors etc.
I rewrote the same program using my own brain and just using ChatGPT as google and autocomplete (my normal workflow), I produced the same thing in 1500 LOC.
The effort difference was not that significant either tbh although my hand coded approach probably benefited from designing the vibe coded one so I had already though of what I wanted to build.
Sounds like a great oppurtunity to understand your own development process, and codify it in such detail that the agent can replicate how you work and end up with less code but doing the same.
My experience was the same as you when I started using agents for development about a year ago. Every time I noticed it did something less-than-optimal or just "not up to my standards", I'd hash out exactly what those things meant for me, added it to my reusable AGENTS.md and the code the agent outputs today is fairly close to what I "naturally" write.
or go with this, and use the agent to prototype ideas, and write it yourself once you know what you want
LoC is perfectly fine as a metric for engineering output. It is terrible as a standalone measure of engineering productivity, and the problems occur when one tries to use it as such.
It's still useful, however, because that is the only metric that is instantly intuitively understandable and comparable across a wide variety of contexts, i.e. across companies and teams and languages and applications.
As we know, within the same team working on the same product, a 1000 LoC diff could take less time than a 1 line bug fix that took days to debug. Hence we really cannot compare PRs or product features or story points across contexts. If the industry could come up with a standard measure of developer productivity, you'd bet everyone would use it, but it's unfeasible basically for this very reason.
So, when such comparisons are made (and in this case it was clearly a colloquial usage), it helps to assume the context remains the same. Like, a team A working on product P at company C using tech stack T with specific software quality processes Q produced N1 lines of code yesterday, but today with AI they're producing N2 lines of code. Over time the delta between N1 and N2 approximates the actual impact.
(As an aside, this is also what most of the rigorous studies in AI-assisted developer productivity have done: measure PRs across the same cohorts over time with and without AI, like an A/B test.)
He's not using LOC as a metric, he's making an observation about the impact of a change in the typical volume of LOC.
Is it? The whole point of the article is that the rate of output for writing code has surpassed the rate at which it can be reviewed by humans. LOC as an input for software review makes a lot of sense, since you literally need to read each line.
LOC is the worst metric for engineering output, except for all the others - Churchill
The amount of times an engineer says what the fuck while reading code still seems like a reliable metric for code quality assessment.
We won’t be doing that for much longer, enjoy it while you can.
Somewhat reliable, yes. Not objective, though, and hard to reproduce.
In a world where everything is vibes now that doesn’t matter much.
Agreed. And, LOC has historically been one of the things we've collectively fought against management for how to evalute a "productive" developer!
Why?
We should have gone the other way; generated a lot of code and demanded pay raises; look at the LOC I cranked out! Company is now in my debt!
If they weren't going to care enough as managers to learn and line go up is all that matters to them, make all lines go up = winning
You all think there's more to this than performative barter for coin to spend on food/shelter.
Because not everyone is just out after earning the most money, some people also want to enjoy the workplace where they work. Personally, what the quality of the codebase and infrastructure is in matters a lot for how much you enjoy working in it, and I'd much rather work in a codebase I enjoy and earn half, than a codebase made by just jerking out as many LOC as possible and earn double.
Although this requires you to take pride in your profession and what you do.
All of human agency must prop up the vanity of you. Of all people.
Got it.
...ok fine; lack of political action to put us all on the hook for your healthcare is your choice to take a gamble on a paycheck. It's a choice to say your own existence is not owed the assurance of healthcare.
So I will honor your choice and not care you exist.
Humans are also incredibly varied and different.
Do you reject all stats that treat the number of people involved (eg. 2 million pepole protested X) as "embarrassing" ... because they lump incredibly varied people together and pretend they're equal?
I read somewhere that measuring software engineering output by LoC is like measuring aerospace engineering by pounds added to the plane and I thought that was an apt comparison.
Honestly it’s more like 200 to a 100,000 of pretty decent quality code at this point.
At least "mentions of LOC" is now a great metric for "how clueless is this person"
Totally. I thought Simon was wiser than this; even he couldn't resist getting swept up by breathless hype. The moment you start typing "LOC as a metric", alarm bells should go off in your head.
This was a podcast, not a pre-scripted talk. I suggest listening to the audio version - it makes it more clear that this was thinking out loud, not carefully considering every word.
I see, fair point. Sorry for taking a dig at you. Please know that I do appreciate a lot of work that you do. I was just worried for a moment when just reading that bit.
LOC is very much an effective metric for general productivity for the median feature. You can't code golf most lines of code out of existence.
We're also assuming LOC vibe coded by competent engineers who should be able to tell when something is overengineered.
Have you noticed that the coding agents get really close to the solution on the first one shot and then require tons of work to get that last 10% or 5%?
If we shift the paradigm of how we approach a coding problem, the coding agents can close that gap. Ten years ago every 10 or 15 minutes I would stop coding and start refactoring, testing, and analyzing making sure everything is perfect before proceeding because a bug will corrupt any downstream code. The coding agents don't and can't do this. They keep that bug or malformed architecture as they continue.
The instinct is to get the coding agents to stop at these points. However, that is impossible for several reasons. Instead, because it is very cheap, we should find the first place the agent made a mistake and update the prompt. Instead of fixing it, delete all the code (because it is very cheap), and run from the top. Continue this iteration process until the prompt yields the perfect code.
Ah, but you say, that is a lot of work done by a human! That is the whole point. The humans are still needed. The process using the tool like this yields 10x speed at writing code.
This was often true when writing code manually to be fair.
You could get to "something that works" rather fast but it took a long time to 1) evaluate other options (maybe before, maybe after), 2) refine it, 3) test it and build confidence around it.
I think your point stands but no one really knows where. The next year or so is going to be everyone trying to figure that out (this is also why we hear a lot of "we need to reinvent github")
When I hire fresh out of college… I can see them coming in and not having the slightest comprehension of the difference of the things that they did in school to get a grade and never touch it again versus a product that is supposed to exist and work for 10+ years.
The problem of life in general is the last 5-10% is always the hardest. And it makes no economic sense in many cases to invest in trying to make that last part mechanised.
I believe the llm providers went with the wrong approach from the off - the focus should’ve been on complementing labour not displacement. And I believe they have learned an expensive lesson along the way.
I tend to get something working and refactor my way out, which does work and you can use a coding agent to do it, but it takes time. Maybe starting over would have been better, but I didn’t know what I wanted the architecture to look like at the beginning.
That will not work as cleanly as you described once a lot of code has been committed to the code base. You cannot just blow away an entire working code base and start over just because an LLM is struggling to make a feature work with existing architecture.
Strong agree. Most orgs will stay tangled in the mess they hand-coded over the years, a few greenfield teams will pull ahead, but until some LLM-fuelled startup displaces a strong incumbent I'm skeptical that we're on the cusp of anything other than a K-shaped transition. I see already low quality software and orgs getting flushed to make room for some new ideas now that the barrier to entry is slightly lower (but far from free). I just wish the transition was done with more humanity.
I think all coding will become vibe coding, but it will be no less an engineering discipline.
Note: I still review pretty much every line of code that I own, regardless of who generates it, and I see the problems with agents very clearly... but I can also see the trends.
My take: Instead of crafting code, engineering will shift to crafting bespoke, comprehensive validation mechanisms for the results of the agents' work such that it is technically (maybe even mathematically) provable as far as possible, and any non-provable validations can be reviewed quickly by a human. I would also bet the review mechanisms would be primarily visually, because that is the highest bandwidth input available to us.
By comprehensive validations I don't mean just tests, but multiple overlapping, interlocking levels of tests and metrics. Like, I don't just have an E2E test for the UI, I have an overlapping test for expected changes in the backend DB. And in some cases I generate so many test cases that I don't check for individual rows, I look at the distribution of data before and after the test. I have very few unit tests, but I do have performance tests! I color-code some validation results so that if something breaks I instantly know what it may be.
All of this is overkill to do manually but is a breeze with agents, and over time really enables moving fast without breaking things. I also notice I have to add very few new validations for new code changes these days, so once the upfront cost is paid, the dividends roll in for a long time.
Now, I had to think deeply about the most effective set of technical constraints that give me the most confidence while accounting for the foibles of the LLMs. And all of this is specific to my projects, not much can be generalized other than high-level principles like "multiple interlocking tests." Each project will need its own custom validation (note: not just "test") suites which are very specific to its architecture and technical details.
So this is still engineering, but it will be vibe coding in the sense that we almost never look at the code, we just look at the results.
I agree somewhat, but I do still think there is a decently sized separation between true vibe coding (the typical "make me an app...fix this bug") and actual AI assisted development. I personally think that if you are a dev and you simply trust the AI's output, that is still vibe coding.
I am not a developer and have very basic code knowledge. I recently built a small and lightweight Docker container using Codex 5.5/5.4 that ingests logs with rsyslog and has a nice web UI and an organized log storage structure. I did not write any code manually.
Even without writing code, I still had to use common sense in order to get it in a place I was happy with. If i truly knew nothing, the AI would have made some very poor decisions. Examples: it would have kept everything in main.go, it would have hardcoded the timezone, the settings were all hardcoded in the Go code, the crash handling was non existent, and a missing config would have prevented start. And that is on a ~3000 line app. I cannot imagine unleashing an AI on a large, complex. codebase without some decent knowledge and reviewing.
The real paradigm shift is not here yet, but not very far away. I'm talking about the single unified codebase. Agents building a unique codebase for all your software needs.
Because most of the complexity in software comes from interfacing with external components, when you don't need to adapt to this you can write simpler and better code.
Rather than relying on an external library, you just write your own and have full control and can do quality control.
Linux kernel is 30 000 000 LOC. At 100 tokens /s, let's say 1 LOC per second produced for a single 4090 GPU, in one year of continuous running 3600 * 24 * 365 = 31 536 000 everyone can have its own OS.
It's the "Apps" story all over again : there are millions of apps, but the average user only have 100 max and use 10 daily at most.
Standardize data and services and you don't need that much software.
What will most likely happen is one company with a few millions GPUs will rewrite a complete software ecosystem, and people will just use this and stop doing any software because anything can be produced on the fly. Then all compute can be spent on consistent quality.
Every happy OS will be the same. Every broken OS will be broken in its own way. What a nightmare.
The scary part is that codebases are getting layers of AI complexity, that it's going to cost $$$ to have the latest model decipher and make changes as no human can understand the code anymore.
Pretty soon there is no code reuse and we're burning money reinventing the wheel over and over.
> The scary part is that codebases are getting layers of AI complexity, that it's going to cost $$$ to have the latest model decipher
Isn't this a bit like old Java or IDE-heavy languages like old Java/C#? If you tried to make Android apps back in the early days, you HAD to use an IDE, writing the ridicolous amount of boilerplate you had to write to display a "Hello Word" alert after clicking a button was soul destroying.
At least a human can get involved. Complex codebases written by humans can be understood.
If the barrier is too high, code is refactored.
This is a timely observation and feels right to me. I needed to get a relatively simple batch download -> transform -> api endpoint stood up. I wrote a fairly detailed prompt but left a lot of implementation details out, including data sources.
Opus 4.7 built it about 90% the same way I would, but had way more convenience methods and step-validations included.
It's great, and really frees me up to think about harder problems.
This is my experience too. I'm primarily a python dev, but have been routinely using other backend languages (rust, go, etc) that I'm familiar with but not at the same level.
Just having ~13yrs experience heavily weighted in one language with some formal studying of others makes directing llms a lot simpler.
Learning syntax, primitives, package managers, testing, etc isn't that much of a lift compared to how I used to program.
Was helping a non-dev colleague who's using claude cowork/code to automate reporting the other day. They understand the business intelligence side well, but were struggling with basic diction to vibe code a pyautogui wrapper to pull up RDP and fill out a MS Access abstraction on a vendor DB.
Think we'll be fine for another 5-10 years as a profession
It's already the case that you get much better results out of LLMs by forcing agents using them to go through additional layers of planning, design & review.
The future is going to dynamically budget and route different parts of the SLDC through different models and subagents running on the cloud. Over time, more and more of that process will be owned by robots and a level of economic thinking will be incorporated into what is thought of today as "software engineering." At some point vibe coding _is_ coding and we're maybe closer to that point than popularly believed.
Claude often does things in more detail, and even better, than I would, in the first pass. But I don't understand how anybody stands comments generated by an LLM?
It's seriously the thing that worries (and bothers) me the most. I almost never let unedited LLM comments pass. At a minimum.
Most of the time, I use my own vibe-coded tool to run multiple GitHub-PR-review-style reviews, and send them off to the agent to make the code look and work fine.
It also struggles with doing things the idiomatic way for huge codebases, or sometimes it's just plain wrong about why something works, even if it gets it right.
And I say this despite the fact that I don't really write much code by hand anymore, only the important ones (if even!) or the interesting ones.
Also, don't even get me started on AI-generated READMEs... I use Claude to refine my Markdown or automatically handle dark/light-mode, but I try to write everything myself, because I can't stand what it generates.
I find that the best thing about generating documentation with LLM's is that it gets me angry enough to rewrite it correctly.
"Ugh, no! Why would you say it like that? That's not even how it works! Now, I need to write a full paragraph instead of a short snippet to make sure that no future agents get confused in the same way."
The comments aren't an LLM thing, they're a Claude thing. Codex doesn't write those gross hyper-verbose comments.
One-shot "vibe coding" is generally a mistake.
But using an agentic LLM to complete boilerplate is attractive simply because we've created a mountain of accidental and intentional complexity in building software. It's more of a regression to the mean of going back to the cognitive load we had when we simply built desktop applications.
Tell it to make a plan. Ask it to do 3-5 steps at a time. “One shotting” works very well.
There are techniques for improving our confidence in our software: unit testing, integration testing, fuzz testing, property-based testing, static analysis, model checking, theorem proving, formal methods, etc. The LLM is not only a tool for generating lines of code. It can also generate lines of testing. The goal is that the tests are easier to audit by the humans than the code.
I've found that one of the areas I enjoyed least is now what I spend a lot of time on now: testing!
Property-based testing in particular has uncovered a number of invariants in every code base I've introduced it to.
tbf depending on the agent/model a lot of the tests end up being thrown out so it's possible I _should_ handwrite more tests, but having better prompts and detailed plans seems to mitigate that somewhat
Given rapidly decelerating quality of, at least, claude code output, the agentic coding use may decrease. It is insane how bad the results of background agents are now: constant hallucinations, nonsensical outputs.
The heavy users of Claude at my job disagree (me included), our work gets shipped and the quality has increased by all metrics. Are you talking about enterprise or consumer Claude subscriptions? I think they're serving drastic different quality depending on how much $ you fork up.
When I was in grad school I graded homework for first year math classes, and the thing about math homework is that the perfect homework takes almost no time to grade.
It's the bad, semi-coherent submissions that eat up your time, because you do want to award some points and tell students where they went wrong. It's the Anna Karenina principle applied to math.
Code review is the same thing. If you're sure Claude wrote your endpoint right, why not review it anyway? It's going to take you two minutes, and you're not going to wonder whether this time it missed a nuance.
Typically in engineering you don't know what you're doing. If you're sure of what it should look like going in, you're more of a technician. I think most people coding have no idea what they're doing to a large extent- not many people can do the same rote work for years straight.
> The thing that really helps me is thinking back to when I’ve worked at larger organizations where I’ve been an engineering manager. Other teams are building software that my team depends on.
> If another team hands over something and says, “hey, this is the image resize service, here’s how to use it to resize your images”... I’m not going to go and read every line of code that they wrote.
The distance of accountability of the output from its producer is an important metric. Who will be held accountable for which output: that's important to maintain and not feel the "guilt".
So, organizations would need to focus on better and more granular building incentives and punishment mechanisms for large-scale software projects.
I think I'm just too opinionated to go there. If I see something that works fine, but isn't the way I'd do it, it doesn't matter if a human or an LLM wrote it I'm still in there making it match my vision.
That's not how most organizations work, AI or not.
What do you mean?
Organizations usually are not looking for employees who change things that work fine, just because it disagrees with the "vision" of one employee.
100%. I don't think any senior programmer ever looks at another developer's code and says, "Oh yeah, that's just the way I'd do it."
But I assume you don't go and change all your co-workers code just because they didn't do it how you would have done it?
Even the most toxic places I've worked that kind of behavior would totally get you canned.
I concur, and I think that is one of the most difficult aspects of reviewing another's code. It's difficult for me to sometimes differentiate between what is acceptable vs. what I would have done. I have to be very conscious to not impose my ideals.
So you are going to waste everyone's time getting another developer to write code the way you want? This resonates with me because at my company I get this all the time. At that point, you might as well close my PR and do it yourself, whatever way you want. I really like the advice from the book 0 2 1, to assign different areas of responsibility to people, so that there is no conflict.
> So you are going to waste everyone's time getting another developer to write code the way you want?
No one is suggesting that.
The current state of the technology is that you must read at least some of the code, but everyone keeps shipping tools that are focussed on churning out more and more stuff without giving you any affordances to really understand the output.
Claude Code in particular seems really uninterested in this aspect of the problem and I've stopped using entirely because of this.
the discourse around "code quality" has always attracted the least nuanced minds, ones who see the world and the phenomenon of life as nothing but territory to be divided up by the latest buzzwords. the worst ones insist that we narrow the discussion even further, to focus on the conflicts between these buzzwords. whenever i have to sit through such discussions, i try to meditate on the irony of mother nature weaving the most functionally brutal, ruthlessly redundant poetry that is the genetic code, only for the resulting creatures to deny themselves the power of the principles inherent in their own construction.
I think this is what people mean when they say LLMs are a higher level abstraction. We still need to consider edge cases and have tests. We still to sweat the architecture and understand how the pieces fit together and have a mental map of the codebase. But within each bottom node of that architecture we don't sweat the details. Anything obvious gets caught right away. Most subtle/interaction-based issues occur at the architecture level. Anything that bypasses those filters is a weird bug that is no worse or different from a normal bug fixes - an edge case that was hit in a real world scenario that gets flagged by a user or a logged as an error.
There are certain codebases and pieces of code we definitely want every line to be reasoned and understood. But like his API endpoint example, no reason to fuss with the boilerplate.
This has definitely been my shift over the past few months, and the advantage is I can spend much more time and energy on getting the code architecture just right, which automatically prevents most of the subtle bugs that has people wringing their hands. The new bar is architecting code to be defined as well as an API endpoint->service structure so you can rely on LLMs to paint by numbers for new features/logic.
Good description of my thoughts on vibe coding / agentic engineering.
Spend a lot more time on architecting and testing than hand rolling most repos now.
Hats off to people who enjoy the minutia of programming everything by hand, but turns out I enjoy the other aspects of software development more.
As agents get better at code we trust them to produce more of it. There are still bugs to find, but the haystack gets bigger.
So the number of bugs to find remains constant but the amount of code to review scales with the capability of the agent.
Vibe coding is just coding now. Writing assembly used to be a thing too until higher and higher languages were created. LLM is like that except it compiles English to code. This scares lot of professionals understandably.
I am experimenting with writing en entire TypeScript compiler[1] with AI assistant. I've spent 4 months on it already. It might not be successful at the end of the day but my thinking is that if LLMs are going to write a lot of the code I better learn how this can and can not work. I've learned a lot from this project already. I think we're still in charge of design and big ideas even if all of the code is written by AI
[1] https://github.com/mohsen1/tsz
I'm also experimenting with it more and more. Now I'm trying to create a 2D side-scrolling shooter with it, running in the browser. When it was relatively small, it did a good job. As the codebase and docs/ files that I'm using get larger it starts hallucinating, especially when the context gets at about 50% usage (Codex w/ gpt5.5). As in, it'll literally forget to update parts of the code.
e.g, I change velocity of player to '200' and of bullets to '300', and it only updated the bullet velocity. Then told me the player was already 'at the correct value' even though it was set to 150. Things like that.. :)
For me, unless there is a concrete way of proving work is correct you can't rely on AI coding. tsz has super strict tests around correctness, performance and architectural boundaries
If I understood you correctly, I think I'm less extreme than that. Most code written by humans is also not provably correct. But I'm assuming you mean provably correct like Lean: https://lean-lang.org/, and not just "passes tests".
If you mean 'passes tests', that can be tackled by AI. Although AI writing its own tests and then implementing its own code is definitely not a foolproof strategy.
>25k commits in 4 months or about 1 commit every 7 minutes
How do you manage/orchestrate this? I'm genuinely curious.
Multiple computers and each multiple Claude Code or Codex sessions. It had lots of ups and downs. Now I have a good enough test harness that makes it easier to iterate faster
Do you not run out of things to code?
Code is not the goal. What code does is.
The problem with vibe coding closer is that the agentic makes a very plasticy samey feel unless you work with something that makes it unique or can pass a template through it.
Correct me if I’m wrong Simon, but weren’t you highly optimistic about llm’s and agentic-use of them?
I believe this is a common fault of not being able to zoom out and look at what trade offs are being made. There’s always trade-offs, the question is whether you can define them and then do the analysis to determine whether the result leaves you in a net benefit state.
I still am. I think setting up LLMs to call tools in a loop is a fascinating way to build interesting software that could not have existed before.
Coding agents are also upending how software development works, in a way that we are still very much figuring out.
I don't think anyone has a confident answer for how best to apply them yet, especially on larger production-ready projects.
I think you kind of answered this in the post though. "I want somebody to have used the thing" is dogfooding. and it's probably the only quality signal left that can't be generated in 30 minutes.
Agentic engineering? That reads to me a little like amateur oncologist. How are you defining engineering?
Can agentic engineers adhere to a similar code of ethics that a professional engineer is sworn to uphold?
https://www.nspe.org/career-growth/nspe-code-ethics-engineer...
The problem of calling what most of us do "engineering" predates LLMs by a good 15-20 years.
> Can agentic engineers adhere to a similar code of ethics that a professional engineer is sworn to uphold?
Can software engineers?
What the F is "agentic" really?
> It used to be if you found a GitHub repository with a hundred commits and a good readme and automated tests and stuff, you could be pretty sure that the person writing that had put a lot of care and attention into that project.
I think this highlights a problem that has always existed under the surface, but it's being brought into the light by proliferation of vibeslop and openclaw and their ilk. Even in the beforetimes you could craft a 100.0% pure, correct looking github repo that had never stood the test of production. Even if you had a test suite that covers every branch and every instruction, without putting the code in production you aren't going to uncover all the things your test suite didn't--performance issues, security issues, unexpected user behavior, etc.
As an observer looking at this repo, I have no way to tell. It's got hundreds of tests, hundreds of commits, dozens of stars... how am I to know nobody has ever actually used it for anything?
I don't know how to solve this problem, but it seems like there's a pretty obvious tooling gap here. A very similar problem is something like "contributor reputation", i.e. the plague of drive-by AI generated PRs from people (or openclaws) you've never seen before. Stars and number of commits aren't good enough, we need more.
> I know full well that if you ask Claude Code to build a JSON API endpoint that runs a SQL query and outputs the results as JSON, it’s just going to do it right. It’s not going to mess that up. You have it add automated tests, you have it add documentation, you know it’s going to be good.
> But I’m not reviewing that code. And now I’ve got that feeling of guilt: if I haven’t reviewed the code, is it really responsible for me to use this in production?
Answer: it wholly depends upon what management has dictated be the goal for GenAI use at the time.
There seems to be a trend of people outside of engineering organizations thinking that the "iron triangle" of software (and really, all) engineering no longer holds. Fast, cheap, good: now we can pick all three, and there's no limit to the first one in particular. They don't see why you can't crank out 10x productivity. They've been financially incentivized to think that way, and really, they can't lose if they look at it from an "engineer headcount" standpoint. The outcomes are:
1) The GenAI-augmented engineer cranks out 10x productivity without any quality consequences down the line, and keeps them from having to pay other people
or
2) The GenAI-augmented engineer cranks out 10x productivity with quality consequences down the line, at which point the engineer has given another exhibit in the case as to why they should no longer be employed at that organization. Let the lawyers and market inertia deal with the big issues that exist beyond the 90-day fiscal reporting period.
Either way, they have a route to the destination of not paying engineers, and that's the end goal.
If you don't like that way of running a software engineering organization, well, you're not alone, but if nothing else, you could use GenAI to make working for yourself less risky.
I feel like an outlier in all of this. But isn't this just more AI slop? How is this different from text generation or image generation?
Like many people I have used AI to generate crap I really don't care about. I need an image. Generate something like, whatever. Great hey a good looking image! No that's done I can do something I find more interesting to do.
But it's slop. The image does not fit the context. Its just off. And you can tell that no one really cared.
This isn't good.
The difference is that coding agents can run the code that they produce, fix any bugs, build tests and generally demonstrate that it works.
You can't do that for images and text.
Every time I do deep work, and think of solutions to a complex problem. I always have the opportunity to ask claude to implement a sub-par AI slop solution.
Do this enough times, and I will have forgotten how to think.
Or, you just explain the solution and save some typing and get the same thing. I find it refreshing to be able to just talk to Claude and have it generate the same thing I would have built.. It gives me more time to articulate and solve complex problems, and less time with the mundane writing, test loops etc.
huh. i honestly never thought they were all that different. didn't the same guy coin them both to refer to the same thing?
Not at all. Andrej Karpathy coined vibe coding as: https://twitter.com/karpathy/status/1886192184808149383
> where you fully give in to the vibes, embrace exponentials, and forget that the code even exists [...] It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
So clearly we need a term for what happens when experienced, professional software engineers use LLM tooling as part of a responsible development process, taking full advantage of their existing expertise and with a goal to produce good, reliable software.
"Agentic engineering" is a good candidate for that.
> as part of a responsible development process, taking full advantage of their existing expertise and with a goal to produce good, reliable software
Its shifted so much for me. I used to think that I had a solemn duty to read every line and understand it, or to write all the test cases. Then I started noticing that tools like CodeRabbit, or Cursor would find things in my code that I would rarely find myself.
I think right now, its shifted my perception of my role to one where I am responsible for "tilting" the agentic coding loop; ultimately the goal is a matter of ensuring the agent learns from its mistakes, self-organize and embrace a spirit of Kaizen.
Btw thank you for your work on Django, last 20 years with it were life changing (I did .NET before).
For work I do agentic engineering. As the code that I submit for a code review is hand reviewed by me. I know every line and file that I submit.
My side project is 80% vibe code. Every now and then I look and see all the bad stuff, then I scold Codex a bit and it refactors it for me. So I do see the author's point.
Instead of "vibe coding" by asking the AI to design and write code, I'm having it refine my own designs, and write code under strict supervision and guidance, that I carefully review and iterate on.
I took a rock carving course in school that really enlightened me about software engineering, and it still applies today, especially to AI. You can't just decide what you want to carve, hold the chisel in just the right spot, and whack it with a hammer just perfectly so all the rock you want falls away leaving a perfect statue behind.
"I saw the angel in the marble and carved until I set him free." -Michelangelo
It's a long drawn out iterative process of making millions of tiny little chips, and letting the statue inside find its way out, in its natural form, instead of trying to impose a pre-determined form onto it.
Vibe coding is hoping your first whack of the hammer is going to make a good statue, then not even looking at the statue before shipping it!
But AI assisted conscientious coding (or agentic engineering as Simon calls it) is the opposite of that, where you chip away quickly and relentlessly, but you still have to carefully control where you chisel and what you carve away, and have an idea in your mind what you want before you start.
Simon,
Just piggy backing on this post since I'm early:
Would love to see your take on how the AI and Django worlds will collide.
I agree, I'm actually generating just over of 20,000 lines of code each day at my company. Part of that was the mandate and leaderboards around token usage, but also they started using pull requests as an explicit metric. What I do is usually pull around 5 or so tickets at once, spin up 5 different agents on their own branch, have them work until completion, and then spin up two more agents to handle the merge request.
I'm not checking the code since the code doesn't really matter anymore anyways - I just have the agent write passing tests for the changes or additions I make, and so even if something breaks I can just point to the tests.
Some days, the tickets are completed much faster than I expect and I don't hit my daily token expenditure goal, so I have my own custom harness that actually hooks up an agent to TikTok, basically it splits up the reel into 1 second increments and then feeds those frames to the LLM for it's own consumption. I can easily burn 10m tokens a day on this, and Claude seems to enjoy it.
Personally I want to thank you Simon for putting me onto this "vibe engineering" concept, I really didn't expect an archaeology major like myself to become a real engineer but thanks to AI now I can be! Truly gatekeeping in tech is now dead.
I'd be lying if I said I was not worried about the future. I am not necessarily worried in the sense that there will be some grave, impeding doom that awaits the future of humanity.
Rather, I just feel like I have to constantly remind myself of the impermanence of all things. Like snow, from water come to water gone.
Perhaps I put too much of my identity in being a programmer. Sure, LLMs cannot replace most us in their current state, but what about 5 years, 10 years, ..., 50 years from now? I just cannot help be feel a sense of nihilism and existential dread.
Some might argue that we will always be needed, but I am not certain I want to be needed in such a way. Of course, no one is taking hand-coding away from me. I can hand-code all I want on my own time, but occupationally that may be difficult in the future. I have rambled enough, but all and all, I do not think I want to participate in this society anymore, but I do not know how to escape it either.
If you work in any new technology field, the chances that your job will exist in the same way 50 years from now is very small.
The job, as you have done it at least, was also not here 50 years before you started doing it.
Did you have any of the same feelings knowing that you were doing a job that has not existed in the world very long? That seems like a strange requirement for a meaningful job, that it should remain the same for 50+ years.
In truth, our world and what we do for our careers is entirely shaped by the time that we live in. Even people that ostensibly do the same thing people have done for centuries (farmer, teacher, etc) are very different today than 100 years ago.
> And that feels about right to me. I can plumb my house if I watch enough YouTube videos on plumbing. I would rather hire a plumber.
I don't buy this argument at all. I think if we could pay $20/month to a service that would send over a junior plumber/carpenter/electrician with an encyclopedic knowledge of the craft, did the right thing the majority of the time, and we could observe and direct them, we'd all sign up for that in a heartbeat. Worst case, you have to hire an experienced, expensive person to fix the mess. Yes, I can hear everyone now, "worst case is they burn your house down." Sure, but as we're reminded _constantly_ when we read stories about AI agent catastrophes -- a human could wipe your prod database too. wHy ArE yOu HoLdInG iT tO a DiFfErEnT sTaNdArD???
The business side of the house is getting to live that scenario out right now as far as software goes. Sure you've got years of expertise that an LLM doesn't have _yet_. What makes you think it can't replace that part of your job as well?
> I think if we could pay $20/month to a service that would send over a junior plumber/carpenter/electrician with an encyclopedic knowledge of the craft, did the right thing the majority of the time, and we could observe and direct them, we'd all sign up for that in a heartbeat.
I don’t think this comparison quite works (or maybe I think it works and is wrong) and I think it has something to do with creativity or the initial ideation.
I would do this, but I’m a jack of all trades. I built my own diner booth in my kitchen recently. But my wife, who loves the diner booth, just doesn’t really want to get over the hump of figuring out what she might want. I think most people want to offload the mental load of figuring out where to start.
Most people aren’t just bored by coding, they’re bored or overwhelmed by the idea of thinking about software in the first place. Same with plumbing or construction, most people aren’t hiring someone to direct, they’re hiring a director.
Even I have this about some things, sometimes I choose to outsource the full stack of something to give me more space to do creativity elsewhere.
You're comparing paying $20 for an AI plumber to paying hundreds/thousands for a traditional plumber.
But that's not what the author is talking about in that passage you quoted. What he's saying is that, if you can pay $20 for an AI plumber, then it stands to reason that eventually you will be able to pay $30 to a company that manages AI plumbers for you, so that you don't even have to go to the trouble of supervising the plumber. Most people will choose the $30.
It's in a section called "Why I’m still not afraid for my career."
The implication here is software engineer jobs are still safe despite basically free labor/material being available to do said jobs because he thinks other people would prefer to pay experienced professionals to do it right at a significantly higher cost. My point is, I think most people will take the low-stakes gamble of having the cheap AI agent do it with self-supervision[0]. He's naive in thinking people are really going to care about artisanal software built by experienced professionals in the future.
0: Even if you subscribe to the "your job will be to supervise the agents" train of thought, you're kinda glossing over the fact that it's probably gonna involve a pretty significant pay cut and the looming problem of "how do new experienced professionals get created if they don't have to/don't need to get their hands dirty"?
I literally do pay $20 a month to have a plumber service on call.
And that includes materials, labor, and will be there the instant you need them?
Not instant, but same day yes.
... And labor and materials?
Labor is included, i have to pay for any additional materials required
It is pure arrogance to expect that machines will never be able to code as good as a skilled human.
And AI generated code should be different than human code. AI has infinite memory for details. AI doesn’t need organizational patterns like classes. Potentially AI can write code that is more performant than any human.
Will it look like garbage? Sure. Will the code be more suited to the task? Yes.
I find it hard to believe that code with unnecessary cruft and repetition is "more suited to the task". I've literally deleted hundreds of unnecessary or unused functions at this point. The only way I can agree is if "more suited" means, "it's wearing multiple suits for no reason".
What will happen when AI companies increase the price of tokens?
The code produced will only be understandable by AI. You could use locally hosted LLMs, but it won't be as performant as AI run by big guys. And there is nothing stopping greedy companies implementing some ridiculous pattern that only their model can reasonably work with.
So what you'll do in situation when you can't understand "your" codebase and you have to make changes or fix a bug?
What happens when the price of tokens goes to 0?
The open weight models are nipping on the heels of frontier models. The frontier labs have to make forward progress and keep tokens cheap in order to maintain marketshare.
Eventually, we'll have a Mythos-level model running on integrated hardware on every PC.
I would only add one caveat to this:
Code that is organized well and operates coherently in the first place, by an LLM or not, will be easier to iterate on, by an LLM or not.
Your post weeks of pure arrogance. You sound like the bozo’s at Anthropic who made an AI agent for finance and think this is somehow going to provide a huge productivity boost because all they do is a bunch of tick boxing and spreadsheet work.
No, just no.