How is Codex now compared to Claude Code? Especially with GPT-5 high for planning and Codex for the coding part.
You know how people used to say the CLI is dead?
Now, due to tools like Claude Code, the CLI is actually clearly the superior interface.
(At least for now)
It's not supposed to be an us vs them flamewar, of course. But it's fun to see a reversal like this from time to time!
I don't remember any advanced computer user, including developers saying that the CLI is dead.
The CLI has been dead for end-users since computers became powerful enough for GUIs, but the CLI has always been there behind the scenes. The closest we have been to the "CLI is dead" mentality was maybe in the late 90s, with pre-OSX MacOS and Windows, but then OSX gave us a proper Unix shell, Windows gave us PowerShell, and Linux and its shell came to dominate the server market.
> I don't remember any advanced computer user, including developers saying that the CLI is dead.
Obviously not around during the '90s, when the GUI was blowing up thanks to Windows displacing costly commercial Unix machines (Sun, SGI, HP, etc.). By 2000 people were saying Unix was dead and the GUI was the superior interface to a computer. Visual Basic was magic to a lot of people, and so many programs were GUI things even if they didn't need to be. Then the web happened and the tables turned.
That is a bit of a simplification: many users found value in WYSIWYG, and there was an aborted low-code visual programming movement.
Microsoft drank the early OOP Kool-Aid, so PowerShell suffered from problems that were already well documented by that time, etc.
Ray Noorda being pushed out, after WordPerfect bought Novell with their own money and leveraged local religious politics in addition to typical corporate politics, killed it.
Intel convinced the major UNIX companies to drop their CPUs for IA-64, which never delivered on its promises, mainly because its core design decision was incompatible with fundamental limitations of computation, etc.
The rise of Linux, VMs and ultimately the cloud all depended on the CLI.
Add in Microsoft's anticompetitive behavior plus everything else, and you ended up with a dominant GUI OS provider whose CLI most developers found challenging to use.
I worked at some of the larger companies with large Windows Server installations, and every one of them installed Cygwin to gain access to tools that allowed for maintainable configuration management.
There are situations like WordPerfect, whose GUI offerings were delayed by the same problems that still plague big projects today, but by the time the web appeared Microsoft had used both brilliant and extremely unethical practices to gain market dominance.
The rise of PC graphics technology like the VESA Local Bus and GPUs, which finally killed the remaining workstation vendors, was actually concurrent with the rise of the web.
Even then, major companies like SGI mainly failed because they dedicated so many resources to low-end offerings that they lost their competitiveness on the high end, especially as they fell into Intel's Itanium trap too.
But even that is complicated way beyond what I mentioned above.
> OSX gave us a proper Unix shell
BSD/Mach gave us that; OSX just included it in the operating system.
I think it might loop back around pretty quick. I've been using it to write custom GUI interfaces to streamline how I use the computer, and I'm working piecemeal towards an entire desktop environment custom-made to my own quirky preferences. In the past, a big part of the reason I used the terminal so often for basic things was general frustration and discomfort with the mainstream GUI tools, but that's rapidly changing for me.
I do really like the Unix approach Claude Code takes, because it makes it really easy to create other Unix-like tools and have Claude use them with basically no integration overhead. Just give it the man page for your tool and it'll use it adeptly with no MCP or custom tool definition nonsense. I built a tool that lets Claude use the browser and Claude never has an issue using it.
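For what it's worth, the tool itself doesn't need to be anything fancy for this to work. Here's a rough sketch of the shape I mean, using a hypothetical wordcount command (not the browser tool mentioned above); anything that prints a sensible --help and plain text to stdout tends to be enough for the agent to pick up unaided:

    #!/usr/bin/env python3
    # wordcount: hypothetical example of a small, self-describing CLI tool.
    # An agent can learn to use it from `wordcount --help` alone.
    import argparse
    import json
    import sys

    def main() -> None:
        parser = argparse.ArgumentParser(
            prog="wordcount",
            description="Count lines, words, and characters in a file (or stdin).",
        )
        parser.add_argument("file", nargs="?", help="input file; reads stdin if omitted")
        parser.add_argument("--json", action="store_true", help="emit machine-readable JSON")
        args = parser.parse_args()

        text = open(args.file, encoding="utf-8").read() if args.file else sys.stdin.read()
        stats = {"lines": text.count("\n"), "words": len(text.split()), "chars": len(text)}
        print(json.dumps(stats) if args.json else f"{stats['lines']} {stats['words']} {stats['chars']}")

    if __name__ == "__main__":
        main()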
The light-switch moment for me was when I realized I can tell Claude to use linters instead of telling it to look for problems itself. The latter generally works, but having it call tools is way more efficient. I didn't even tell it what linters to use: I asked it for suggestions, it gave me about a dozen, I installed them, and it started using them without further instruction.
I had tried coding with ChatGPT a year or so ago and the effort needed to get anything useful out of it greatly exceeded any benefit, so I went into CC with low expectations, but I have been blown away.
As an extension of this idea: for some tasks, rather than asking Claude Code to do a thing, you can often get better results from asking Claude Code to write and run a script to do the thing.
Example: read this log file and extract XYZ from it and show me a table of the results. Instead of having the agent read in the whole log file into the context and try to process it with raw LLM attention, you can get it to read in a sample and then write a script to process the whole thing. This works particularly well when you want to do something with math, like compute a mean or a median. LLMs are bad at doing math on their own, and good at writing scripts to do math for them.
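For example, the "script" it writes for the log case can be tiny. A minimal sketch of the kind of throwaway thing I mean, assuming a made-up log format where matching lines contain something like duration_ms=123 (the field name and format are just placeholders):

    #!/usr/bin/env python3
    # Hypothetical example of the kind of throwaway script an agent might write:
    # pull duration_ms=NNN values out of a log and report count, mean, and median.
    import re
    import statistics
    import sys

    pattern = re.compile(r"duration_ms=(\d+)")
    values = []
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for line in f:
            m = pattern.search(line)
            if m:
                values.append(int(m.group(1)))

    if not values:
        sys.exit("no matching lines found")
    print(f"count:  {len(values)}")
    print(f"mean:   {statistics.mean(values):.1f} ms")
    print(f"median: {statistics.median(values):.1f} ms")

The agent only ever needs to read the script's short output, not the whole log, and the arithmetic is done by code rather than by attention.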
A lot of interesting techniques become possible when you have an agent that can write quick scripts or CLI tools for you, on the fly, and run them as well.
It's a bit annoying that you have to tell it to do it, though. Humans (or at least programmers) "build the tools to solve the problem" so intuitively and automatically when the problem starts to "feel hard", that it doesn't often occur to the average programmer that LLMs don't think like this.
When you tell an LLM to check the code for errors, the LLM could simply "realize" that the problem is complex enough to warrant building [or finding+configuring] an appropriate tool to solve the problem, and so start doing that... but instead, even for the hardest problems, the LLM will try to brute-force a solution just by "staring at the code really hard."
(To quote a certain cartoon squirrel, "that trick never works!" And to paraphrase the LLM's predictable response, "this time for sure!")
I've noticed Claude doing this for most tasks without even asking it to. Maybe a recent thing?
I have a Just task that runs linters (ruff and pyright, in my case), formatter, tests and pre-commit hooks, and have Claude run it every time it thinks it's done with a change. It's good enough that when the checks pass, it's usually complete.
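For anyone curious, a minimal sketch of what such a Just recipe might look like; the exact tools are partly a guess on my part (pytest as the test runner), the rest match what's described above:

    # justfile -- the one command Claude is told to run after every change
    check:
        ruff format .
        ruff check --fix .
        pyright
        pytest -q
        pre-commit run --all-files

Then the only standing instruction needed is "run just check and fix everything it reports before you call the task done."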
This is the best way to approach it, but if I had a dollar for each time Claude ran "--no-verify" on the git commits it was doing, I'd have tens of dollars.
Doesn’t matter if you tell it multiple times in CLAUDE.md to not skip checks, it will eventually just skip them so it can commit. It’s infuriating.
I hope that as CC evolves there is a better way to tell/force the model to do things like that (linters, formatters, unit/e2e tests, etc).
A tip for everyone doing this: pipe the linters' stdout to /dev/null to save on tokens.
Why? The agent needs the error messages from the linters to know what to do.
If you're running linters for formatting etc, just get the agent to run them on autocorrect and it doesn't need to know the status as urgently.
The lightbulb moment for me was to have it make me a smoke test and tell it to run the test and fix issues (with the code it generated) until it passes, then iterate over all the features in the Todo.md (that I asked it to make). Claude Code will go off and do stuff for, I dunno, hours? while I work on something else.
Hours? Not in my experience. It will do a handful of tasks, then say “Great! I’ve finished a block of tasks” and stop. And honestly, you’re gonna want to check its work periodically. You can’t even trust it to run linters and unit tests reliably. I’ve lost count of how many times it’s skipped pre-commit checks or committed code with failing tests because it just gives up.
genius i gotta try this
I've done exactly this with MCP:

    {
      "name": "unshare_exec",
      "description": "Run a binary in isolated Linux namespaces using unshare",
      "inputSchema": {
        "type": "object",
        "properties": {
          "binary": {"type": "string"},
          "args": {"type": "array", "items": {"type": "string"}}
        },
        "required": ["binary"],
        "additionalProperties": false
      }
    }
It started as unshare and ended up being a bit of a yak-shaving endeavor to make things work, but I was able to get some surprisingly good results using Gemma 3 locally and giving it access to run arbitrary Debian-based utilities.
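In case it helps anyone reproduce this, here is a rough sketch of what the handler behind that schema could look like. This is my guess, not the poster's actual code: it assumes the official Python MCP SDK (FastMCP) and util-linux unshare, and the particular namespace flags are just an example:

    # Hypothetical sketch of an MCP server exposing the unshare_exec tool above.
    # Assumes util-linux `unshare` and the `mcp` Python SDK are installed.
    import subprocess
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("unshare-exec")

    @mcp.tool()
    def unshare_exec(binary: str, args: list[str] | None = None) -> str:
        """Run a binary in isolated Linux namespaces using unshare."""
        cmd = [
            "unshare", "--user", "--map-root-user",   # new user ns, fake root inside
            "--pid", "--fork", "--mount-proc",        # new pid ns with its own /proc
            "--net",                                  # empty network ns, no outside access
            binary, *(args or []),
        ]
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
        return result.stdout + result.stderr

    if __name__ == "__main__":
        mcp.run()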
Would you be willing to share the sweater? Or the now-naked yak?
I'm curious to see what you've come up with. My local LLM experience has been... sub-par in most cases.
Unix tools let agents act & observe in many versatile ways. That lets them close their loops. Taking yourself out of the loop lets your agent work far more efficiently.
But anything you can do on the CLI, an agent can do too. It’s the same thing as chefs preferring to work with sharp knives.
All GUI apps are different, each being unhappy in its own way. Moated fiefdoms they are, scattered within the boundaries of their operating system. CLI is a common ground, an integration plaza where the peers meet, streams flow and signals are exchanged. No commitment needs to be made to enter this information bazaar. The closest analog in the GUI world is Smalltalk, but again - you need to pledge your allegiance before entering one.
A few days ago I read an article from HumanLayer. They mentioned shipping a week's worth of collaborative work in less than a day. That was one data point on a project.
Has anyone found Claude Code able to write documentation for parts of the code that:
(a) doesn't explode exponentially in maintenance time, while helping Claude understand and iterate without falling over/hallucinating/designing poorly?
(b) makes code reviewers' lives easier? If so, how?
I think the key issue for me is the time the human takes to *verify*/*maintain* plans is not much less than what it might take them to come up with a plan that is detailed enough that many AI models could easily implement.
The hype tweets get pretty tiresome when you can't judge the vibe-code cruft and demoware factor.
Especially on bootstrap/setup, AIs are fantastic for cutting out massive amounts of time, which is a huge boon for our profession. But core logic? I think that's where the not-really-saving-time studies are coming from.
I'm surprised there aren't faux academic B-school productivity studies coming out to counter that (sponsored by AI funding of course) already, but then again I don't read B-school journals.
I actually wonder if the half-life decay of the critical mass of vibe code will almost perfectly coincide with the crash/vroosh of labor leaving the profession to clean it up. It might be a mini-Y2K event, without such a dramatic single day.
Let's do this. But entirely local. Local Obsidian, local LLM, and all open source. That's the future I want.
Local Org mode, local LLM, all orchestrated with Emacs, all free software.
If only I were retired and had infinite time!
Open weights alone are also not enough; we need control of the dataset and training pipeline.
The average user like me wouldn't be able to run pipelines without serious infrastructure, but it's very important to understand how the data is used and how the models are trained, so that we own the model and can assess its biases openly.
Good luck understanding the biases in a petabyte of text and images and video, or whatever the training set is.
Do you disagree it's important to have access to the data, ease of assessment notwithstanding?
I view it as more or less irrelevant. LLMs are fundamentally black boxes. Whether you run the black box locally or use it remotely, whether you train it yourself or use a pretrained version, whether you have access to the training set or not, it's completely irrelevant to control. Using an LLM means giving up control and understanding of the process. Whether it's OpenAI or the training data-guided algorithm that controls the process, it's still not you.
Now, running local models instead of using them as a SaaS has a clear purpose: the price of your local model won't suddenly increase ten fold once you start depending on it, like the SaaS models might. Any level of control beyond that is illusory with LLMs.
I, on the other hand, think it's irrelevant whether a technology is a black box or not. If it's supposed to fit the open-source/FOSS model of the original post, having access to precursors is just as important as having access to the weights.
It's fine for models to have open weights and closed data. It's only barely fitting the open-source model IMHO, though.
The point of FOSS is control. You want to have access to the source, including build instructions and everything, in order to be able to meaningfully change the program, and understand what it actually does (or pay an expert to do this for you). You also want to make sure that the company that made this doesn't have a monopoly on fixing it for you, so that they can't ask you for exorbitant sums to address an issue you have.
An open weight model addresses the second part of THIS, but not the first. However, even an open weight model with all of the training data available doesn't fix the first problem. Even if you somehow got access to enough hardware to train your own GPT-5 based on the published data, you still couldn't meaningfully fix an issue you have with it, not even if you hired Ilya Sutskever and Yann LeCun to do it for you: these are black boxes that no one can actually understand at the level of a program or device.
It is an interesting question. Of course everyone should have equal access to the data in theory, but I also believe nobody should be forced to offer it for free to others and I don't think I want to spend tax money having the government host and distribute that data.
I'm not sure how everyone can have access to the data without someone else taking on the burden of providing it.
I think torrent is a very good way to redistribute this type of data. You can even selectively sync and redistribute.
I'm also not saying anyone should be forced to disclose training data. I'm only saying that an LLM that's only open-weight and not open data/pipeline barely fits the open-source model of the stack mentioned by OP.
LLMs are making open source programs both more viable and more valuable.
I have many programs I use that I wish were a little different, but even if they were open source, it would take a while to acquaint myself with the source code organization to make these changes. LLMs, on the other hand, are pretty good at small self-contained changes like tweaks or new minor features.
This makes it easier to modify open source programs, but also means that if a program isn't open source, I can't make these changes at all. Before, I wasn't going to make the change anyway, but now that I actually can, the ability to make changes (i.e. the program is open source) becomes much more important.
Maybe this is of interest https://laurentcazanove.com/blog/obsidian-rag-api
Isn't local infeasible for models of useful size (at least on a typical dev machine with <= 64 GB RAM and a single GPU)?
Seems like this might be possible with opencode? Haven't played much.
Apple then.
That rules out the open source part.
Agreed - but don't WindSurf, Cursor and Copilot do all the same things now, but with choice of LLM & IDE integration?
>The filesystem is a great tool to get around the lack of memory and state in LLMs and should be used more often.
This feels a bit like rediscovering stateless programming. Obviously the filesystem contents can actually change, but the idea of running the same AI with the same command(s) and getting the same, idempotent result would be lovely. Even better if the answer is right.
If it's consistent one way or the other it would be great: consistently wrong, correct it, consistently right, reward it. It's the unpredictability and inconsistency that's a problem.
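One way to make the quoted idea concrete (my own illustration, not from the article): have the agent checkpoint its progress to a file, so rerunning the same command skips finished steps instead of redoing them slightly differently each time.

    # Hypothetical sketch: the filesystem as the agent's memory. Completed task
    # names are recorded in a JSON file, so rerunning the whole loop is
    # idempotent -- finished steps are skipped rather than redone differently.
    import json
    from pathlib import Path

    STATE = Path("agent_state.json")
    TASKS = ["extract-log-stats", "write-summary", "update-todo"]  # placeholder names

    def run(task: str) -> None:
        print(f"running {task}")  # stand-in for the actual agent or tool call

    done = set(json.loads(STATE.read_text())) if STATE.exists() else set()
    for task in TASKS:
        if task in done:
            continue  # state on disk survives across runs, unlike LLM context
        run(task)
        done.add(task)
        STATE.write_text(json.dumps(sorted(done)))

It doesn't make the model itself deterministic, but it narrows the blast radius of the inconsistency to the steps that haven't been recorded yet.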
No mention/comparison to Gemini CLI? Gemini CLI is awesome and they just added a kind of stealth feature for Chrome automation. This capability was first announced as Project Mariner, and teased for eventual rollout in Chrome, but it's available right now for free in Gemini CLI.
In my experience of trying to do things with Gemini CLI and Claude Code, Claude Code was always significantly smarter. Gemini CLI makes so many mistakes and then tries hard to fix them (in my applications at least).
Tbf I haven't played much with it, but I have generally found that I don't like the permission model on Gemini CLI or Codex anywhere near as much as Claude Code.
This is more like … let’s change the way we code so LLMs and AI coding assistants can reduce the error rate and improve reliability.
I implore people who are willing and able to send the contents and indices of their private notes repository to cloud-based services to rethink their life decisions.
Not around privacy, mind you. If your notes contain nothing that you wouldn’t mind being subpoenaed or read warrantlessly by the DHS/FBI, then you are wasting your one and only life.
So your goal in your one and only life is to write notes in your code repo that you don't want subpoenaed?
Another everything-is-new-again: https://github.com/steveyegge/efrit is Steve Yegge's drive-emacs-with-LLMs (I saw this mentioned via a video link elsewhere: https://www.youtube.com/watch?v=ZJUyVVFOXOc )
Yeah absolutely, being so close to the filesystem gets Claude Code the closest experience I've had with an agent that can actually get things done. Really all the years of UIs we've created for each other just get in the way of these systems, and on a broader scale it will probably be more important than ever to have a reasonable API in your apps.
My experience has been the opposite — a shell prompt is too many degrees of freedom for an LLM, and it consistently misses important information.
I’ve had much better luck with constrained, structured tools that give me control over exactly how the tools behave and what context is visible to the LLM.
It seems to be all about making the correct thing easy, the hard things possible, and the wrong things very difficult.
> Anyone who can't find use cases for LLMs isn't trying hard enough
That's an interesting viewpoint from an AI marketing company.
I think the essential job of marketing is to help people make the connection between their problems and your solutions. Putting all on them in a kind of blamey way doesn't seem like a great approach to me.
I read the whole thing and could still not figure out what they’re trying to solve. Which I’m pretty sure goes against the Unix philosophy. The one thing should be clearly defined to be able to declare that you solve it well.
What the company is trying to solve or what I'm solving with Claude Code?
I read the title, I read the article and there’s nothing in the article that supports the claim made in the title.
Also, about a tool being overly complex: things like find, ImageMagick, ffmpeg, … are not complex in themselves. They’re solving a domain that is itself complex. But the tools are quite good; the evidence is their stability, as they’ve barely changed across decades.
This is my point. Those tools are great specifically because of the simplicity of how they expose their functionality.
And yet the tools are still difficult to use. I could Read The Fine Manual, web search, Stack Overflow, post a question on a bulletin board, or ask the Generative Artificial Inference robot. A lot of this comes down to user interface preferences. For example, my preference is that I just intuitively know that -i followed by a filepath is the input file, but why can't I just drag the video icon onto ffmpeg? What might be obvious to me is not necessarily exposed functionality that someone else can see right away.
What you’re asking is the equivalent of “Why can’t I just press a button and have a plane takeoff, fly, and land by itself”. You can have a plane that does that, but only in a limited context. To program the whole decision tree for all cases is not economical (or feasible?).
ffmpeg does all things media conversion. If you don’t want to learn how to use it, you find someone who does (or do the LLM gamble), or try to find a wrapper that has a simpler interface and hope its limited feature set encompasses your use cases.
A CLI tool can be extremely versatile. A GUI is full of accidental complexities, so unless your selling point is intuitiveness, it’s just extra work.
What you’re solving with Claude Code. All I could gather was … something with your notes. Would you mind clearly stating 2-5 specific problems that you use Claude Code to solve with your notes?
I was on a podcast last week where I went into a ton of detail: https://every.to/podcast/how-to-use-claude-code-as-a-thinkin...
Basically I have it sitting over the top of my notes and assisting with writing, editing, researching, etc.
Thanks, I’ll take a look.
I love obsidian for the same basic reason you do: it’s just a bunch of text files, so I can use terminal tools and write programs to do stuff with them.
So far I mostly use LLMs to write the tools themselves, but not actually edit the notes. Maybe I can steal some of your ideas!
I started a repo if you want to play: https://github.com/heyitsnoah/claudesidian
That's fair. But it's what I believe. I spend a lot of time inside giant companies and there are too many people waiting for stone tablets to come down the mountain with their use cases instead of just playing with this stuff.
I do understand about enterprise decision-making.
I think it's the penguin approach to risk management -- they know they need to jump in the water to get where they need to go, but they don't know where the orcas are. So they jostle closer and closer to the edge, some fall in, and the rest see what happens.
BTW, I probably shouldn't have commented only on the small part at the end that annoyed me. I'm fascinated by the idea that LLMs make highly custom software feasible, like your "claudesidian" system... that people will be able to get the software they want by describing it, rather than being limited to finding something preexisting and having to adapt to it. As you point out, the Unix philosophy is one way -- simple, unopinionated building blocks an LLM can compose out of user-level prompts.
> I think it's the penguin approach to risk management -- they know they need to jump in the water to get where they need to go, but they don't know where the orcas are. So they jostle closer and closer to the edge, some fall in, and the rest see what happens.
Great way to describe the culture of fear prevalent at large companies.
I've seen a bunch of big companies have edicts sent down from the top, "all employees should be using LLMs, if you're not then you're not doing your job". But many employees just don't have much that it applies to. Like, I spend a huge amount of time reviewing PRs. (Somebody, who actually knows stuff, has to do it.) Some of the more data-sci guys have added LLM review bots to some repos, but they're rather dumb and useless.
(And then the CISO sends some security tips email/slack announcement which is still dumb and useless even after an LLM added a bunch of emojis and fun language to it.)
I've always been an old-fashioned and slow developer. But it still seems to me, if most "regular" "average" developers churn out code that is more of a liability than an asset, if they can do that 10x faster, it doesn't really change the world. Most stuff still ends up waiting, in the end, for some slow work done right, or else gets thrown away soon enough.
I think that a lot of very basic LLM use cases come down to articulating your need. If you're not the sort of person who's highly articulate, this is likely to be difficult for you.
I'm personally in the habit of answering even slightly complex questions by first establishing shared context - that is, I very carefully ensure that my conversational partner has exactly the same understanding of the situation that I do. I do this because it's frequently the case that we don't have a lot of overlap in our understanding, or we have very specific gaps or contradictions in our understanding.
If you're like many in this industry, you're working in a language other than what you were raised in, making all of this more difficult.
> That's fair. But it's what I believe.
That response suggests you aren't interested in discussion or conversation at all.
It suggests that your purpose here is to advertise.
No, that’s not what it means. You can read what I’ve written about this for the last three years and I’ve been very consistent. In the enterprise too many people are waiting to be told things and whether it’s good for my business or not I’d rather be honest about how I feel (you need to just use this stuff).
>No, that’s not what it means.
That's fair but it's what I believe.
...see?
Being consistent with stating your beliefs isn't the same as engaging with and about those beliefs.
Advertising isn't conversation. Evangelism isn't discussion.
I'm trying to engage with you on this, but I'm really not sure what you're getting at. You originally stated "I think the essential job of marketing is to help people make the connection between their problems and your solutions. Putting all on them in a kind of blamey way doesn't seem like a great approach to me."
I agree that's the job of marketing, but I'm not someone who markets AI, I'm someone who helps large marketing organizations use it effectively. I agree that if my goal was to market it that wouldn't be an effective message, but my goal is for folks who work in these companies to take some accountability for their own personal development, so that's my message. Again, all I can do is be honest about how I feel and to be consistent in my beliefs and experiences working with these kinds of organizations.
That was somebody else that said that.
I agree, I was very skeptical until I started playing around with these tools and repeatedly got good results with almost no real effort.
Online discussion with randos about this topic is almost useless because everybody is quick to dismiss the other side as hopelessly brainwashed by hype, or burying their heads in the sand for fear of the future of their jobs. I've had much better luck talking about it with people I've known and had mutual respect with before all this stuff came out.
I disagree actually. Saying things like “everyone else managed to figure it out” is a way of creating FOMO. It might not be the way you want to do it, marketing doesn’t have to be nice (or even right) to work.
I don't want to work with people who think that's good marketing, or people who are convinced by it.
FOMO is for fashions and fads, not getting things done.
I’m responding to a comment that talks about whether that quote is good marketing so I’m just talking specifically about whether it might work from a marketing point of view.
I probably wouldn’t do it myself either, but that’s not really relevant to whether it works or not.
"Good marketing" doesn't have to mean "Marketing that is maximally effective"
Filling food with opioids would be great for business, but hopefully you understand how that is not "good business"
True, but you are arguing about the merit of the actual product, which neither I nor the comment I responded to were talking about at all. Marketing tactics can be applied to good and bad products, and FOMO is a pretty common one everywhere, from "limited remaining" to "early adopters lock in at $10/mo for life" to "everyone else is doing it".
No, I am not arguing about the merits of the product, I am explicitly saying that using FOMO as a marketing tactic is shitty and bad and should make a person who does that feel bad.
I do not care that it is common. I want it to be not common.
I do not care that bad marketing tactics like this can be used to sell "good" products, whatever that means.
It's a totally backwards way to build a product.
You're supposed to start with a use case that is unmet, and research/build technology to enable and solve the use case.
AI companies are instead starting with a specific technology, and then desperately searching for use cases that might somehow motivate people to use that technology. Now these guys are further arguing that it should be the user's problem to find use cases for the technology they seem utterly convinced needs to be developed.
Is "finding a way to remove them, with prejudice, from my phone" a valid use case for them? I'm tired of Gemini randomly starting up.
(Well, I recently found there is a reason for it: I'm left-handed, and unlocking my phone with my left hand sometimes touches the icon stupidly placed by default on the lock screen. Not that it would work: my phone is usually running with data disabled.)
There's something deeply hypocritical about a blog that criticizes the "SaaS Industrial Complex"[1], while at the same time praising one of the biggest SaaS in existence, while also promoting their own "AI-first" strategy and marketing company.
What even is this? Is it all AI slop? All of these articles are borderline nonsensical, in that weird dreamy tone that all AI slop has.
To see this waxing poetic about the Unix philosophy, which couldn't be farther from the modern "AI" workflow, is... something I can't quite articulate, but let's go with "all shades of wrong". Seeing it on the front page of HN is depressing.
[1]: https://www.alephic.com/no-saas
LLMs are one large binary that does everything (maybe, if you are lucky today)
exact opposite of the Unix philosophy
> LLMs are one large binary that does everything
Well, no, they aren't, but the orchestration frameworks in which they are embedded sometimes are (though a lot of times a whole lot of that everything is actually done by separate binaries the framework is made aware of via some configuration or discovery mechanism.)
It's more like a fluid/shapeless orchestrator that fuzzily interfaces between human and machine language, arising momentarily from a vat to take the exact form that fits the desired function, then disintegrates until called upon again.
sure, but that's not what we're talking about here.
the article is framing LLMs as a kind of fuzzy pipe that can automatically connect lots of tools really well. This ability works particularly well with Unix-philosophy do-one-thing tools, and so being able to access such tools opens a superpower that is unique and secretly shiny about Claude Code that browser-based ChatGPT doesn't have.
wait until you find out about human programmers...
They’re one gigantic organic blob?
They write the individual tools that do the one specific thing?