I see tools like this keep popping up (don't mean that in a bad way! it's clearly exciting, and the README itself compares it to similar tools). However, for coordination strategies like this, aren't you always stuck with token-based pricing via some API key? That's the biggest thing that holds me personally back from getting into these frameworks. With a Claude Code Max plan, all my delegation and coordination has to be done within a session (between some agents) with persisted artifacts. Unless I'm missing something that has changed?
Perhaps it's all moot, since the usage you get from a subscription plan will eventually no longer be subsidized. I also have to wonder whether coordination layered on externally to a model can be persistently better than in-tool coordination. With an Anthropic feature like agent teams, I suspect it's tough to beat Anthropic's native coordination of Claude sessions, since they likely have better awareness of internal tools and standards; that makes plugging in something like this harder to justify, unless the goal is to plug it into an open-source model.
Genuinely curious how other people are thinking about this!!
Edit: I actually see that this tool claims that it can run within your existing Claude Code subscription, so now I'm extra interested.
If you invoke Claude Code with --input-format stream-json --output-format stream-json, you can use it headlessly. I built a personal UI / orchestration framework around it. Most features are available, though not all (e.g. there is no way to undo via this protocol, but you can still do it manually by terminating, editing the session file, and resuming). Other agentic software has similar interfaces (Codex uses JSON-RPC; Copilot CLI has ACP, which is also based on JSON-RPC).
Right, runs entirely within your CC subscription, no API key. Three layers: Hooks inject behavioral rules, a local FastAPI server exposes 55 tools via MCP (task wall, meetings, memory, etc.), CC's native TeamCreate handles multi-agent coordination.
On whether external coordination can beat native — I think they're different jobs. CC will probably always be better at low-level stuff (tool routing, context management, agent communication). But CC doesn't have opinions about your project workflow. It won't say "you haven't checked the task wall in 15 minutes" or "this task failed twice, here's what went wrong last time." That's what the OS layer does.
For example, the meeting system: 8 templates (brainstorm, decision, review, council, debate, etc.) each with structured rounds — round 1 everyone states positions independently, round 2 cross-examination, round 3 converge on a decision. The system auto-selects the template from topic keywords ("architecture review" -> council template, "what should we build next" -> brainstorm). Without this, multi-agent discussions devolve into everyone agreeing with each other in one round.
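The auto-selection can be as simple as keyword matching on the topic. A hypothetical sketch (template names match the ones above, but the keyword lists and function are illustrative, not the project's actual code):

```python
# Illustrative keyword -> template mapping; first match wins.
TEMPLATE_KEYWORDS = {
    "council":    ["architecture", "review"],
    "debate":     ["tradeoff", "versus", " vs "],
    "decision":   ["choose", "decide", "pick"],
    "brainstorm": ["what should", "ideas", "build next"],
}

def select_template(topic: str, default: str = "brainstorm") -> str:
    """Return the first meeting template whose keyword appears in the topic."""
    t = topic.lower()
    for template, keywords in TEMPLATE_KEYWORDS.items():
        if any(k in t for k in keywords):
            return template
    return default
```

A fallback default matters here: an unmatched topic should still convene some structured meeting rather than fail silently.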
Honest status though: the behavioral enforcement works ~90%+ of the time. Hard blocks are reliable (preventing untracked agents -> exit code 2, no bypass). Soft reminders get ignored ~5-10% when the Leader is busy. It's prompt injection into a probabilistic system — 100% isn't realistic.
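For context, the hard block is just a pre-tool-use hook that exits 2. A rough Python sketch: the exit-code convention follows Claude Code's hooks behavior, while the payload keys and REGISTERED_AGENTS are stand-ins for the framework's real registry:

```python
import json
import sys

# Stand-in for the real task-wall registry of tracked agents.
REGISTERED_AGENTS = {"leader", "qa", "researcher"}

def check(payload: dict) -> tuple[bool, str]:
    """Allow the tool call only if the target agent is registered."""
    agent = payload.get("tool_input", {}).get("agent_name", "")
    if agent and agent not in REGISTERED_AGENTS:
        return False, f"Blocked: agent '{agent}' is not tracked on the task wall."
    return True, ""

def run_hook(stdin_text: str) -> int:
    """Return the hook's exit code: 0 allows the call, 2 is a hard block."""
    allowed, reason = check(json.loads(stdin_text))
    if not allowed:
        # Exit code 2 makes Claude Code block the call and feed stderr
        # back to the model as the reason -- no bypass.
        print(reason, file=sys.stderr)
        return 2
    return 0
```

Soft reminders, by contrast, are just injected text, which is why they get ignored a few percent of the time.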
The design problem I'm chewing on: CC Teams enforces flat hierarchy. But asking the Leader to manage the task wall, chair meetings, dispatch agents, AND follow its own rules creates overload — things slip. I'm exploring a supervisor middle layer that just monitors rule compliance, including the Leader's. Like a process auditor, not a manager.
Further out: a persistent R&D department doing condition-triggered research (competitor releases, ecosystem changes — not a dumb timer), then auto-convening meetings to discuss findings. The carrot that keeps the system always moving. But without strict boundaries this becomes bloatware — agents generating busywork to justify existence.
When does "autonomous" become "wasteful"? How do you audit an AI that's supposed to audit itself?
disclaimer: I work on a different project in the space but got excited by your comment
DeepSteve (deepsteve.com) has a similar premise: it spawns Claude Code processes and attaches terminals to them in a browser UI, so you can automate coordination in ways a regular terminal can’t: spawning new agents from GitHub issues, coordinating tasks via inter-agent chat, modifying its own UI, terminals that fork themselves.
Re: native vs external orchestration, I think the external layer matters precisely because it doesn’t have to replicate traditional company hierarchies. I’m less interested in “AI org chart” setups like gstack (we don’t have to bring antiquated corporate hierarchies with us) and more in hackable, flat coordination where agents talk to each other via MCP and you decide the topology yourself.
I was intrigued and had a look at deepsteve.com, but I couldn't figure the website out. I'm guessing it won't give you any information about it until you install it?
Thanks for the feedback.
DeepSteve is a Node server that runs on your machine, so the website is designed to look like DeepSteve's UI. You actually access it at localhost:3000 in your browser, not via deepsteve.com.
But now I can see how that would be confusing.
You could use something like GLM 5, which is very capable. You get an API key, and you don't have to pay for tokens if you stay within generous limits. And if you exceed them, it's many times cheaper than frontier models.
Sort of feels like gastown enterprise edition
Does anyone have an example of input, output, and cost?
Interesting, what are the benefits and drawbacks you've found developing and using it yourself?
Nifty, looks like the enterprise edition of OpenClaw, kinda. Also, it looks token hungry!
But does it work? And well?
25 tasks dispatched today, specialized agents in parallel. QA agent caught a security bug and wrote regression tests unprompted. 467 tests, 55 MCP tools, 7 commits. It works for structured multi-agent project management.
Rough edges: Leader overload is real. Managing task wall + chairing structured meetings (8 templates, multi-round discussions) + dispatching agents + following its own rules — stuff slips. Considering a supervisor layer to monitor compliance.
It doesn't.
What did you try? What did it do?
You know the answer to this question haha.
I want to hear more feedback and ideas from anyone interested. Here's why I built this:
I've been using Claude Code daily for a while. Tried running multiple projects simultaneously — and quickly hit the wall where I was the bottleneck, not the AI. Every step needed me: correcting course, enforcing constraints, switching between projects to tell CC what to do next. "Go research this and write me a report." "Design a better architecture for this module." "Here's what happened last session, pick up where we left off." Over and over.
What I actually wanted: to hand off everything the AI can handle independently, and only get pulled in for decisions that affect project direction. Record those strategic questions, pause that thread, let me review when I have time. But don't stop working — pick up the next task, do the research, run the tests, organize your findings.
What pushed me over the edge was using Lobster (the browser automation agent). I had it register social media accounts, scrape topic data, analyze users, and publish content. It technically worked — but it took 2 hours, burned $12 in API costs, and I had to sit there the entire time giving feedback and corrections. The end result was fine, but the process was basically me babysitting an expensive intern.
That's when the economics clicked. I'm on the $200/month CC Max plan. Unless you're running high-intensity work non-stop, it's hard to burn through the allocation. The tokens are pre-paid. So the question flips: it's not "how do I minimize token usage" — it's "how do I maximize the value from tokens I'm already paying for?" If you can ensure the burned tokens produce real output, then the only questions are how fast can you burn and how much value comes out. From what I've seen, CC with sufficient permissions just does more per dollar than API-based agents. The answer was obvious: make the AI work more so I work less.
So I built this to be the lazy CEO's toolkit. I set the rules, design the structure, make the strategic calls. Everything else — task management, agent coordination, research, testing, meeting facilitation — goes to the AI. When it needs my input, I want it to come with a summary that's already been through multiple rounds of research and synthesized different perspectives. Not "what should I do next?" but "here's what we found, here are the tradeoffs, what's your call?"
It's not there yet. The Leader gets overloaded, soft rules get ignored sometimes, meetings need more refinement. But the direction is clear: reduce human interrupt-driven management to strategic decision-making only.
If you've hit similar friction with CC or other coding agents — what's the thing you most wish you could stop doing manually?