I've known Erik for a while now — simply incredible founder. Doing this as a simple API proxy makes this practically effortless to integrate into existing systems, just a simple URL swap and you're good to go. Then, it's just a matter of watching the cache hit rate go up!
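For anyone curious what that swap actually looks like, it's roughly this; the proxy URL below is a placeholder, and I'm assuming Butter exposes an OpenAI-compatible Chat Completions endpoint:

```python
# Minimal sketch, assuming an OpenAI-compatible proxy; the base_url is a
# placeholder, not Butter's actual endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-proxy.example.com/v1",  # hypothetical proxy URL
    api_key="YOUR_API_KEY",
)

# Requests pass through the proxy unchanged; repeated trajectories can be
# served from its cache instead of triggering a fresh generation.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Click the Export button on the dashboard."}],
)
print(response.choices[0].message.content)
```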
Exciting to see a product like this launch! There are obviously a host of ‘memory’ solutions out there that try to integrate in fancy ways to cache knowledge / save tokens, but I think there’s a beauty in the simplicity of just having a proxy over the OpenAI endpoint.
Interested to see where this goes!
An interesting alternative product to offer is injecting prompt cache tokens into requests where they could be helpful: not bypassing generations, but at least low-hanging fruit for cost savings.
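To illustrate what I mean, here's a rough sketch of what that injection could look like on the proxy side, assuming Anthropic-style cache_control breakpoints (OpenAI's prompt caching is automatic, so explicit markers mostly matter for providers that expose them). This is my own illustration, not anything Butter does today:

```python
# Rough sketch: a proxy marking a long, stable system prompt as cacheable
# using an Anthropic-style cache_control breakpoint. Purely illustrative.
def inject_cache_breakpoint(request_body: dict, min_chars: int = 4000) -> dict:
    system = request_body.get("system")
    # Only worth tagging when the prefix is long enough to benefit from caching.
    if isinstance(system, str) and len(system) >= min_chars:
        request_body["system"] = [
            {
                "type": "text",
                "text": system,
                "cache_control": {"type": "ephemeral"},  # cacheable prefix marker
            }
        ]
    return request_body
```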
Are you able to walk through a specific use case or example case in detail? I'm not yet totally grokking what Butter is going to do exactly.
I've got a blog post on this from the launch of Muscle Mem, which should paint a better picture: https://erikdunteman.com/blog/muscle-mem
Computer-use agents (as an RPA alternative) are the easiest example to reach for: UIs change, but not often, so the "trajectory" of click and key-entry tool calls is mostly fixed over time and worth feeding back to the agent as a canned trajectory. I discuss the flaws of computer use and RPA in the blog above.
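To make the canned-trajectory idea concrete, here's a rough sketch of the shape it could take; the names and structure here are mine, purely illustrative, not Butter's internals:

```python
# Illustrative trajectory cache for a computer-use agent: replay recorded tool
# calls for a known task, and bail out to the live model if the UI has drifted.
# All names here are hypothetical.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ToolCall:
    name: str   # e.g. "click" or "type"
    args: dict  # e.g. {"selector": "#export-btn"} or {"text": "Q3 report"}

@dataclass
class TrajectoryCache:
    trajectories: dict[str, list[ToolCall]] = field(default_factory=dict)

    def record(self, task: str, calls: list[ToolCall]) -> None:
        self.trajectories[task] = calls

    def replay(
        self,
        task: str,
        still_valid: Callable[[ToolCall], bool],
        execute: Callable[[ToolCall], None],
    ) -> bool:
        """Replay cached tool calls; return False to fall back to a fresh agent run."""
        calls = self.trajectories.get(task)
        if not calls:
            return False
        for call in calls:
            if not still_valid(call):  # UI changed since the trajectory was recorded
                return False
            execute(call)
        return True
```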
A counterexample is coding agents: it's a deeply user-interactive workflow reading from a codebase that's evolving. So the set of things the model is inferencing on is always different, and trajectories are never repeated.
Hope this helps
Still not clear - the tool calls come from the model, so what is being cached by Muscle Memory?
Also:
After my time building computer-use agents, I'm convinced that the hybrid approach of Muscle Memory is the only viable way to offer 100% coverage on an RPA workload.

100% coverage of what? I guess it'd be great if you could clarify the value proposition; many folks will be even less patient than myself.
Best of luck!
This is awesome, Erik! Excited to see this launch. Definitely fixes some issues we had while building pure CopyCat
logged back in to HN to comment on this. looks really sick - i've been saying for a while that a surprising amount of LLM inference really comes down to repetition down a known path.
it's good to see others have seen this problem and are working to make things more efficient. I'm excited to see where this goes.
looks pretty cool! How would you integrate this into production agent stacks like langchain, autogpt, even closed loop robotics?
Thanks! For langchain you can repoint your base_url in the client. AutoGPT I'm not as familiar with. Closed-loop robotics using LLMs may be a stretch for now, especially since vision is a heavy component, but theoretically the patterns baked into on-device small language models, or into hosted LLMs at higher-level planning loops, could be emulated by a Butter cache if observed in high enough volume.
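Concretely, the langchain repointing is just something like this (the URL is a placeholder):

```python
# Minimal sketch: repoint LangChain's OpenAI chat client at the proxy.
# The base_url is a placeholder, not Butter's actual endpoint.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    base_url="https://your-proxy.example.com/v1",  # hypothetical proxy URL
    api_key="YOUR_API_KEY",
)

print(llm.invoke("Fill out the weekly expense report.").content)
```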
For AutoGPT, there is the option to set a llamafile endpoint, which follows the Chat Completions API. So, theoretically, you should be able to use that to point to Butter's LLM proxy.