I've known Erik for a while now — simply incredible founder. Doing this as a simple API proxy makes this practically effortless to integrate into existing systems, just a simple URL swap and you're good to go. Then, it's just a matter of watching the cache hit rate go up!
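For anyone curious what that swap actually looks like, it's roughly this; the proxy URL below is a placeholder, and I'm assuming Butter exposes an OpenAI-compatible Chat Completions endpoint:

```python
# Minimal sketch, assuming an OpenAI-compatible proxy; the base_url is a
# placeholder, not Butter's actual endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-proxy.example.com/v1",  # hypothetical proxy URL
    api_key="YOUR_API_KEY",
)

# Requests pass through the proxy unchanged; repeated trajectories can be
# served from its cache instead of triggering a fresh generation.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Click the Export button on the dashboard."}],
)
print(response.choices[0].message.content)
```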
Exciting to see a product like this launch! There are obviously a host of ‘memory’ solutions out there that try to integrate in fancy ways to cache knowledge / save tokens, but I think there’s a beauty in the simplicity of just having a proxy over the OpenAI endpoint.
Interested to see where this goes!
An interesting alternative product to offer is injecting prompt cache tokens into requests where they could be helpful: not bypassing generations, but at least low-hanging fruit for cost savings.
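To illustrate what I mean, here's a rough sketch of what that injection could look like on the proxy side, assuming Anthropic-style cache_control breakpoints (OpenAI's prompt caching is automatic, so explicit markers mostly matter for providers that expose them). This is my own illustration, not anything Butter does today:

```python
# Rough sketch: a proxy marking a long, stable system prompt as cacheable
# using an Anthropic-style cache_control breakpoint. Purely illustrative.
def inject_cache_breakpoint(request_body: dict, min_chars: int = 4000) -> dict:
    system = request_body.get("system")
    # Only worth tagging when the prefix is long enough to benefit from caching.
    if isinstance(system, str) and len(system) >= min_chars:
        request_body["system"] = [
            {
                "type": "text",
                "text": system,
                "cache_control": {"type": "ephemeral"},  # cacheable prefix marker
            }
        ]
    return request_body
```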
Are you able to walk through a specific use case or example case in detail? I'm not yet totally grokking what Butter is going to do exactly.
I've got a blog post on this from the launch of Muscle Mem, which should paint a better picture: https://erikdunteman.com/blog/muscle-mem
Computer-use agents (as an RPA alternative) are the easiest example to reach for: UIs change, but not often, so the "trajectory" of click and key-entry tool calls is mostly fixed over time and worth feeding back to the agent as a canned trajectory. I discuss the flaws of computer use and RPA in the blog above.
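To make the canned-trajectory idea concrete, here's a rough sketch of the shape it could take; the names and structure here are mine, purely illustrative, not Butter's internals:

```python
# Illustrative trajectory cache for a computer-use agent: replay recorded tool
# calls for a known task, and bail out to the live model if the UI has drifted.
# All names here are hypothetical.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ToolCall:
    name: str   # e.g. "click" or "type"
    args: dict  # e.g. {"selector": "#export-btn"} or {"text": "Q3 report"}

@dataclass
class TrajectoryCache:
    trajectories: dict[str, list[ToolCall]] = field(default_factory=dict)

    def record(self, task: str, calls: list[ToolCall]) -> None:
        self.trajectories[task] = calls

    def replay(
        self,
        task: str,
        still_valid: Callable[[ToolCall], bool],
        execute: Callable[[ToolCall], None],
    ) -> bool:
        """Replay cached tool calls; return False to fall back to a fresh agent run."""
        calls = self.trajectories.get(task)
        if not calls:
            return False
        for call in calls:
            if not still_valid(call):  # UI changed since the trajectory was recorded
                return False
            execute(call)
        return True
```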
A counterexample is coding agents: it's a deeply user-interactive workflow reading from a codebase that's evolving. So the set of things the model is inferencing on is always different, and trajectories are never repeated.
Hope this helps
Still not clear - the tool calls come from the model, so what is being cached by Muscle Memory?
Also:
After my time building computer-use agents, I'm convinced that the hybrid approach of Muscle Memory is the only viable way to offer 100% coverage on an RPA workload.

100% coverage of what? I guess it'd be great if you could clarify the value proposition; many folks will be even less patient than myself.
Best of luck!
This is awesome, Erik! Excited to see this launch. Definitely fixes some issues we had while building pure CopyCat
logged back in to HN to comment on this. looks really sick - i've been saying for a while that a surprising amount of LLM inference really comes down to repetition down a known path.
it's good to see others have seen this problem and are working to make things more efficient. I'm excited to see where this goes.
looks pretty cool! How would you integrate this into production agent stacks like langchain, autogpt, even closed loop robotics?
Thanks! For langchain you can repoint your base_url in the client. AutoGPT I'm not as familiar with. Closed-loop robotics using LLMs may be a stretch for now, especially since vision is a heavy component, but theoretically the patterns baked into on-device small language models, or into hosted LLMs at higher-level planning loops, could be emulated by a Butter cache if observed in high enough volume.
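Concretely, the langchain repointing is just something like this (the URL is a placeholder):

```python
# Minimal sketch: repoint LangChain's OpenAI chat client at the proxy.
# The base_url is a placeholder, not Butter's actual endpoint.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    base_url="https://your-proxy.example.com/v1",  # hypothetical proxy URL
    api_key="YOUR_API_KEY",
)

print(llm.invoke("Fill out the weekly expense report.").content)
```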
For AutoGPT, there is the option to set a llamafile endpoint, which follows the Chat Completions API. So, theoretically, you should be able to use that to point to Butter's LLM proxy.