Astro/Solid - Hacker News

$mfalcon 16 hours ago

You have to evaluate the LLM responses. https://aunhumano.com/index.php/2025/09/03/on-evaluating-age...
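A minimal sketch of evaluating each step's response rather than only the final output. It assumes the workflow trace is a list of steps with a name, input, and output, and that you write a cheap check per step name. `Step`, `first_failing_step`, and the two example checks are hypothetical illustrations, not part of any library. The earliest failing step is usually where debugging should start.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str    # step identifier in the workflow (hypothetical schema)
    input: str   # what the step received
    output: str  # what the LLM returned


# A check returns True when the step's output looks acceptable.
Check = Callable[[Step], bool]


def first_failing_step(trace: list[Step], checks: dict[str, Check]) -> Optional[Step]:
    """Return the earliest step whose check fails, or None if every checked step passes.

    Steps with no registered check are skipped rather than treated as failures.
    """
    for step in trace:
        check = checks.get(step.name)
        if check is not None and not check(step):
            return step
    return None


def is_valid_json(step: Step) -> bool:
    # Example check: an extraction step is expected to emit parseable JSON.
    try:
        json.loads(step.output)
        return True
    except json.JSONDecodeError:
        return False


checks: dict[str, Check] = {
    "extract": is_valid_json,
    # Example check: a summary must be non-empty and reasonably short.
    "summarize": lambda s: 0 < len(s.output) <= 500,
}

trace = [
    Step("extract", "raw doc", '{"title": "x"}'),
    Step("summarize", '{"title": "x"}', ""),  # empty summary: should be flagged
]

bad = first_failing_step(trace, checks)
print(bad.name if bad else "all checks passed")
```

Checks like these are deliberately crude; for steps where correctness is subjective, the same loop can call a grader model instead of a rule.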

$tucaz 2 days ago

Do what you're doing, but dump the contents of your tracing into an LLM agent (Cowork, Code, OpenCode, etc.) and ask it to take a first pass. It'll at least narrow things down for you. Use a smart model and it should help.
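One way to do the "dump the tracing into an agent" step is to render the trace as a single prompt you can paste in or pipe to whatever agent you use. This is a sketch under assumptions: the trace is a list of dicts with `name`, `input`, and `output` keys, and `trace_to_prompt` and its wording are hypothetical. Long fields are truncated so the whole trace fits in the model's context.

```python
def trace_to_prompt(steps: list[dict], max_chars: int = 2000) -> str:
    """Render a workflow trace as a review prompt, truncating long fields."""

    def clip(text: str) -> str:
        # Keep each field bounded so one huge step doesn't crowd out the rest.
        if len(text) <= max_chars:
            return text
        return text[:max_chars] + " ...[truncated]"

    lines = [
        "Below is a trace of a multi-step LLM workflow whose final output is wrong.",
        "Identify the earliest step whose output diverges from what it should be, and explain why.",
        "",
    ]
    for i, step in enumerate(steps, start=1):
        lines.append(f"## Step {i}: {step['name']}")
        lines.append("Input:")
        lines.append(clip(str(step["input"])))
        lines.append("Output:")
        lines.append(clip(str(step["output"])))
        lines.append("")
    return "\n".join(lines)


trace = [
    {"name": "extract", "input": "raw doc", "output": '{"title": "x"}'},
    {"name": "summarize", "input": '{"title": "x"}', "output": ""},
]
prompt = trace_to_prompt(trace)
print(prompt)
```

Asking for the *earliest* divergent step keeps the model from fixating on the final output, which is usually just where an upstream error surfaces.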


$terryjiang2020 2 days ago

Hmm, which model would be a smart one for this case? Or should I just try the latest from OpenAI/Gemini/Claude?


$tucaz a day ago

I love Claude Code, but it can be expensive. If you're on a budget, you can use K2.5 with OpenCode.


$topcmm 21 hours ago

Yeah, Claude's cost really adds up fast on multi-step traces. I haven't tried OpenCode yet, but I'll definitely give it a spin to save some API credits. Thanks!

$syumpx 2 days ago

Multi-step may be what's killing it. Simplify and let the LLM do the work.

$BlueHotDog2 3 days ago

Just releasing something in that direction: a Git-like tool for agents.

Ask HN: How do you debug multi-step AI workflows when the output is wrong?