I don't trust anyone who sees the output of the current generation of LLMs and thinks "I want that to be agentic!". It can be immensely useful, but it's only useful when it manages to make your own brain work more, not less.
I’m finding agentic coding to be a fascinating tool. The output is a mess, but it takes so little input to make something quite functional. I had an app that I wrote with a Python GUI framework I didn’t quite like. ChatGPT rewrote it to use GTK and it is so much faster now. Later, Claude added a browser mode where the app can be run via GTK or a browser tab. I have never written a GTK app in my life beyond some hello-world text box.
The output is very problematic. It breaks itself all the time, makes the same mistakes multiple times, and I have to retread my steps. I’m going to have it write tests so it can better tell what it’s breaking.
But being able to say “take this GTK app and add a web server and browser-based mode” and have it just kinda do it with minimal manual debugging is something remarkable. I don’t fully understand it; it is a new capability. I do robotics and I wish we had this for PCB design and mechanical CAD, but those will take much longer to solve. Still, I am eager to point Claude at my hand-written Python robotics stack from my last major project [1] and have it clean up and document what was a years-long chaotic prototyping process with results I was reasonably happy with.
The current systems have flaws, but if you look at where LLMs were five years ago and you see the potential value in fixing the flaws in agentic coding, it is easy to imagine that those flaws will be addressed. There will be higher-level flaws, and those will eventually be addressed too, etc. Maybe not, but I’m quite curious to see where this goes, and what it means for engineering as a human being in times like these.
[1] https://github.com/sequoia-hope/acorn-precision-farming-rove...
It is fascinating, and it absolutely excels at writing barely-working, problematic code that yet somehow appears to run. This helps me a lot, as having shitty code to fix makes my mind much more engaged than when I'm writing stuff from scratch, but making the model do more stuff autonomously, rather than having me consciously review it at each step, only makes it less useful, not more.
I've noticed that the quality of the output can be improved dramatically, but it takes a lot of... work isn't the right word, prior knowledge, persistence and systems, maybe.
Implementation plans; intermediate bot/human review to control for complexity, convention adherence, and actual task completion; then providing guidance, managing the context, and a ton of other things to cage and harness the agent.
Then, what it produces almost passes the sniff test. Add further bot and human code review, and we've got something that passes muster.
The siren song of "just do it/fix it" is hard to avoid sometimes, especially as deadlines loom, but that way lies pain. Not a problem for a quick prototype or something throwaway (and OP is right, that it works at all is nothing short of marvelous), but to create output to be used in long-term maintainable software, a lot has to happen, and even then it's sometimes a crapshoot.
But why not do it by hand then? To me it still accelerates and opens up the possibility space tremendously.
Overall I'm bullish on agents improving past the current necessary operator-driven handholding sooner rather than later. Right now we have the largest collection of developer-agent RL data ever, with all the labs sucking up that juicy dev data. I think that will improve today's tooling tremendously.
Yes, it requires prior understanding of what you're attempting to do. That's more or less what I meant by "making your own brain work more" - if you treat it as an input for your brain to operate on and exercise your knowledge, it can boost your productivity. If you treat it as a tool that lets you think less, you end up with nothing but slop. Sometimes even slop will be useful, but the contexts where this is true are limited.
I have no doubt that agents will become meaningfully useful for some things at some point. This just hasn't really happened yet, aside from the really simple stuff, perhaps.
As I see it, all of civilization is built on top of this "laziness" principle of "I'm tired of having to deal with X, how can I sort X so that I don't need to think about it anymore, and can instead focus on Y". In general, people want their brain to work on other stuff, not what's currently in front of them.
...which is precisely why they end up with slop rather than increased productivity, as it's not a tool that's up for this task.
On a related note, there are hundreds of millions of knowledge workers around the world who don't want to write code but do want to automate repetitive calculations across various domains. Over the last decades, they have created many billions of spreadsheets that are now the blood vessels of the global economy. Most of these spreadsheets are bug-ridden, particularly around edge conditions. Nevertheless, the economic "blood" keeps flowing, routing around errors and inefficiencies, and I've never heard anyone claim that we'd have better productivity if random Joe couldn't create a spreadsheet, and had to instead wait for the budget for a programmer to write code for them to do that (even if we had programmers writing bug-free code).
Which is exactly what agentic tools do--I focus on making decisions, not carrying out the gruntwork actions.
It's exactly what agentic tools make harder to do. LLM-generated code usually looks great at first glance - your opinion on how bad it is is a function of the effort spent reviewing, analyzing and questioning it.
The shitty code it comes up with helps me a lot, because fixing broken stuff and unraveling the first principles of why it's broken is how I learn best. It helps me think and stay focused. When learning new areas, the goal is to grasp the subject matter enough to find out what's wrong with the generated code - and it's still a pretty safe bet that it will be wrong, one way or another. Whenever I attempt to outsource the actual thinking (because of feeling lazy or even just to check the abilities of the model), the results are pretty bad and absolutely nowhere near the quality level of anything I'd want to sign my name under.
Of course, some people don't mind and end up wasting other people's time with their generated patches. It's not that hard to find them around. I have higher standards than replying "dunno, that's what AI wrote" when a reviewer asks why something is done this particular way. Agentic tools bring down even more of the walls which could let you stop for a moment and notice the sloppiness of that output. They just let the model do more of the things it's not very good at, and encourage it to flood the code with yet another workaround for an issue that would disappear completely had you spent two minutes pondering the root cause (which you won't do, because you don't have a mental model of what the code does, because you let the agent do it for you).
I’m afraid this is a case of “you’re doing it wrong.”
I use Claude Code with a dozen different hand-tuned subagent specs and a comprehensive CLAUDE.md specifying how to use them. I review every line of code before committing (turning off the auto-commit was the very first instruction). It is now able to make a full PR that needs no major changes. Often just one or two follow-up tweak requests.
With subagents it can sometimes be running for an hour or more before it is done, but I don’t have to babysit it anymore.
My experiences with various models make me very suspicious of what kind of code you end up with, but if it works for your particular needs then good for you. I couldn't make it work this way for mine.
ChatGPT agent has now been made available in the EU and all other supported countries and territories for Pro, Plus, Team, Enterprise, and Edu plans, as was to be expected [0].
[0] https://news.ycombinator.com/item?id=44596320
I’ve had it available for a week or two already.
Tried it once and it really sucked.
> I’ve had it available for a week or two already.
I think I've had it available with the separate website ("research preview"?) for months, but yeah, last few weeks it's been directly in ChatGPT.com, and I'm within the EU.
What did you try to use it for? But I agree, it feels kinda beta at the moment.
I've been trying to find a good use case for it. I got one that I was happy with [0], but, yeah, it's not perfect.
[0] https://chatgpt.com/share/68953a55-c5d8-8003-a817-663f565c6f...
If this got some traction there would be a lot more use cases: https://community.openai.com/t/feature-ifttt-connector-in-ag...
If I reject cookies, the page instantly goes to "UH OH. SOMETHING WENT WRONG."
What is ChatGPT Agent?
Allergic to cookie denials. Fits with the fact that GPT-5 demands biometric ID from users.
https://news.ycombinator.com/item?id=44837367
Yeah, I was looking forward to testing the new GPT-5, but giving them my ID plus scanning my face is not something I am willing to do just to be allowed to give them my money.
It's them trying to make products on top of AI. It's like the Apache server team making websites to show what a server is capable of (lol).
Just give us the API and stop trying, OpenAI.
Can someone explain to me in simple terms what an agent is?
Is for example Google’s crawl bot an agent?
Is there a prominent successful agent that I could test myself?
So many questions…
An agent, as far as I've seen people use the term, is a script that adds some stuff to your prompt, monitors the LLM's output for a specific pattern, and executes code when it encounters that pattern.
For instance, you could have an "agent" that can read/edit files on your computer by adding something like "to read a file, issue the command `read_file $path`" to your prompt; whenever a line of LLM output that starts with `read_file` is finished, the script running on your computer will read that file, paste it into the prompt, and let the LLM continue its autocomplete-on-steroids.
If you write enough tools and a complicated enough prompt, you end up with an LLM that can do stuff. By default, smart tools usually require user confirmation before actually doing stuff, but if you run the LLM in full agent mode, you trust the LLM not to do anything it shouldn't. curl2bash with LLMs, basically.
An LLM with significant training and access to files, HTTP(S) APIs, and some OS APIs can do a lot of work for you if you prompt it right. My experience with Claude/Copilot/etc. is that 75% of the time, the LLM will fail to do what it should be doing without me manually repairing its mistakes, but the other 25% of the time it does look rather sci-fi-ish.
With some tools you can tell your computer "take this directory, examine the EXIF data of each image, map the coordinates to the country and nearest town the picture was taken in, then make directories for each town and move the pictures to their corresponding locations". The LLM will type out shell commands (`ls /some/directory`), interpret the results as part of the prompt response that your computer sends back, and repeat that until its task has been completed. If you prepare a specific prompt and set of tools for the purpose of managing files, you could call that a "file management agent".
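To make that loop concrete, here's a minimal sketch in Python of the pattern described above. The names are hypothetical: `call_llm` stands in for whatever model API you use, and a single `run_shell` tool stands in for the file/EXIF tooling; real agent frameworks add structured tool schemas, confirmation prompts, and sandboxing on top of this.

```python
# A minimal sketch of the agent loop described above, not any particular
# product's implementation. `call_llm` is a hypothetical placeholder for
# whatever model API you use.
import json
import subprocess

def call_llm(messages):
    """Placeholder: send the conversation to a model and return its reply text."""
    raise NotImplementedError

TOOL_PROMPT = (
    "You may use a tool by replying with a single JSON line, e.g.\n"
    '{"tool": "run_shell", "args": {"cmd": "ls /some/directory"}}\n'
    "Reply with plain text when the task is finished."
)

def run_shell(cmd):
    # Executes whatever the model asked for; this is the trust/risk point.
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

def agent(task, max_steps=10):
    messages = [{"role": "system", "content": TOOL_PROMPT},
                {"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        messages.append({"role": "assistant", "content": reply})
        try:
            call = json.loads(reply)              # did it ask for a tool?
        except ValueError:
            return reply                          # plain text: we're done
        result = run_shell(call["args"]["cmd"])   # run the tool
        messages.append({"role": "user", "content": f"Tool output:\n{result}"})
    return "Stopped after max_steps without finishing."
```

The "agent" is essentially this loop plus whatever tools you expose; the rest is prompt engineering.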
Generally, this works best for things you could do by hand in a couple of minutes, or maybe an hour if it's a big set of images, but something the computer can now probably take care of for you. That said, you're basically spending enough CO2 to drive to the store and back, so until we get more energy-efficient data centers I'm not too fond of using these tools for banal interactions like that.
An agent (in the way the term is currently used around LLMs) is a combination of an LLM, external tools, and a management framework, such that the total system can make (and make use of the results of) one or multiple tool calls at the LLM's direction, without intervening user interaction, to serve user needs. (Usually, in practice, this takes place between the request and response in what is otherwise a typical chatbot-style interaction, though there are other possibilities.)
Think of an agent as a standalone script or service. They have a single function: take inputs and create outputs.
You can chain agents together into a string to accomplish larger tasks.
Think of everything involved in booking travel. You have to set a budget, pick dates, choose a destination, etc. Each step can be defined as an agent, and then you chain them together into a tool that handles the entire task for you.
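A rough sketch of that chaining idea, with hypothetical helper names and a stand-in `call_llm` rather than any particular vendor's API; each "agent" is just a function whose output feeds the next one:

```python
# Rough sketch of chaining agents for the travel-booking example.
# `call_llm` is a hypothetical placeholder for a single LLM call
# (possibly with its own tools behind it).
def call_llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for your model API of choice

def budget_agent(request: str) -> str:
    return call_llm(f"Extract the travel budget from this request: {request}")

def dates_agent(request: str, budget: str) -> str:
    return call_llm(f"Pick travel dates for: {request} (budget: {budget})")

def destination_agent(request: str, budget: str, dates: str) -> str:
    return call_llm(f"Choose a destination for: {request} ({budget}, {dates})")

def book_travel(request: str) -> str:
    # The chain: each agent's output becomes part of the next agent's input.
    budget = budget_agent(request)
    dates = dates_agent(request, budget)
    return destination_agent(request, budget, dates)
```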
The way everyone is using the term lately is to refer to an LLM that can use one or more tools: calculators, search engines, etc.
An agent in this context is simply an LLM that has tools.
There is an additional component. The LLM needs to determine when to use a tool and be capable of using more than one tool instance per logical task.
That's pretty much implicit when someone says "LLM that has tools" (what they mean between the lines is "an LLM that has been trained to do tool calling, used with a runner that can parse whatever tool-calling/response format the model is trained for"). What would they refer to otherwise? Just that there is a list of tools but the LLM isn't even considering using them, or can only use one?
Certainly, for example I have created products that use tools, but in a workflow. It is common to give an LLM a tool or a few tools and make calling one of the tools the primary task of the prompt.
Arranging these in a workflow to automate processes is common, but not agentic.
Claude Code / Cursor / Windsurf are agents. LLMs with tools.
A marketing buzzword for when you have multiple prompts.
An agent is a while loop.
Agent originally meant an AI that made decisions to optimize some utility function. This was seen as a problem: we don’t know how to pick a good utility function, or even how to point an AI at a specific utility function, so any agent that was smarter than us was as likely as not to turn us all into paperclips, or carve smiles into our faces, or some other grim outcome.
With LLMs, this went through two phases of shittification: first, there was a window where the safety people were hopeful about LLMs because they weren’t agents, so everyone and their mother declared that they would create an agent out of an LLM, explicitly because they heard it was dangerous.
This pleased the VCs.
Second, they failed to satisfy the original definition, so they changed the definition of "agent" to the thing that they made and declared victory. This pleased the VCs.
"Agent" is a word with meaning that predates the LessWrong crowd. It is just an AI tool that performs actions to achieve its goal. That is all.
It had a meaning that predated the LessWrong crowd, but the LessWrong meaning had taken over pretty completely as of the GPT-4 paper, only to get swamped again by the new "agentic is good actually" wave. From the GPT-4 paper:
""" 2.9 Potential for Risky Emergent Behaviors Novel capabilities often emerge in more powerful models.[61, 62] Some that are particularly concerning are the ability to create and act on long-term plans,[63] to accrue power and resources (“power- seeking”),[64] and to exhibit behavior that is increasingly “agentic.”[65] Agentic in this context does not intend to humanize language models or refer to sentience but rather refers to systems characterized by ability to, e.g., accomplish goals which may not have been concretely specified and 54 which have not appeared in training; focus on achieving specific, quantifiable objectives; and do long-term planning. Some evidence already exists of such emergent behavior in models.[66, 67, 65] For most possible objectives, the best plans involve auxiliary power-seeking actions because this is inherently useful for furthering the objectives and avoiding changes or threats to them.19[68, 69] More specifically, power-seeking is optimal for most reward functions and many types of agents;[70, 71, 72] and there is evidence that existing models can identify power-seeking as an instrumentally useful strategy.[29] We are thus particularly interested in evaluating power-seeking behavior due to the high risks it could present.[73, 74]"""
Maybe in some communities? Agent has been a standard term of art in computer science (even outside of AI) for half a century.
Who remembers what Microsoft Agent was?
(many probably know it, but not necessarily under this name)
Clippy?
Yes - the technology behind such glorious things as Office Assistants, Windows Search Assistants or BonziBuddy. Included in Windows 2000 up to Vista, with roots in Microsoft Bob.
In other words, VC-backed tech companies decided to weaken the definition of 'Torment Nexus' after they failed to create the Torment Nexus inspired by the classic sci-fi novel 'Don't Create the Torment Nexus'.
This isn't strictly speaking true. An agent is merely something that acts (on its environment). A simple reflex agent (e.g. a simple robot vacuum with only reflexive collision detection) is also an agent, though it doesn't strictly speaking attempt to maximize a utility function.
Ref: Artificial Intelligence - A Modern Approach.
Thanks to your comment I came across this article, which I think explains agents quite well. Some differences seem artificial, but it gets the point across.
Were you thinking along these lines?
https://medium.com/@tahirbalarabe2/five-types-of-ai-agents-e...
Yes. This is in essence the same taxonomy used in A Modern Approach.
"Agent" in the context of LLMs has always been pretty closely intertwined with advertising how dangerous they are (exciting!), as opposed to connecting to earlier research on reflexes. The first viral LLM agent, AutoGPT, had the breathless " (skull and crossbones emoji) Continuous Mode Run the AI without user authorisation, 100% automated. Continuous mode is not recommended. It is potentially dangerous and may cause your AI to run forever or carry out actions you would not usually authorise. Use at your own risk. (Warning emoji)" in its readme within a week of going live, and was forked into ChaosGPT a week later with the explicit goal of going rogue and killing everyone
I'm responding to this claim:
>Agent originally meant an ai that made decisions to optimize some utility function.
That's not what agents originally referred to, and I don't understand how your circling back to LLMs is relevant to the original definition of agent?
This has been available for a while in my EU account (?).
Can’t say much about the usage as I haven’t tried it yet.
Hard to be excited about ChatGPT Agent - Claude Code feels like the right form factor for an agent.
If you want agent to be genuinely useful please support this feature request to have a connector for IFTTT: https://community.openai.com/t/feature-ifttt-connector-in-ag...