MCP or CLI? There's a massive debate happening in the AI community around the best way to expose tools to agents. Out of curiosity, I spent the weekend on an independent study and benchmarked the GitHub CLI, MCP, MCP with Tool Search, and MCP with Code Mode, all with real data and practical tasks.
The numbers are quite interesting. A few key findings:
1. GitHub MCP is 2–3x more expensive to use than the CLI. Probably not surprising. There's almost no practical reason to use their MCP except for its somewhat different security handling.
2. Tool Search saves upfront tokens but spends them on extra turns. Whether that trade-off pays off depends on task complexity. Tool Search also introduces a new failure mode due to imperfect search accuracy.
3. Code Mode is the cheapest way to use MCP, but still 2x more expensive than the CLI, and it's very slow. Code Mode also introduces a unique failure mode when the agent writes buggy code or handles errors poorly.
4. It's possible to push CLIs further toward a higher success rate at the lowest cost and latency with a principled design approach that treats agent ergonomics as a first-class concern. I detailed this in https://axi.md.
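The cost gap in finding 1 comes down to per-turn overhead: an MCP client carries the tool schemas in the prompt on every turn, while a CLI agent just emits a shell command. A back-of-envelope sketch (all token counts are hypothetical, chosen only for illustration and not taken from the benchmark):

```python
# Back-of-envelope token cost model (all numbers hypothetical, for illustration).
# An MCP client includes tool definitions in the prompt each turn, so it pays a
# fixed schema overhead per turn that a CLI agent avoids.

def total_prompt_tokens(turns: int, per_turn_payload: int, schema_overhead: int) -> int:
    """Sum of prompt tokens across a multi-turn task."""
    return turns * (per_turn_payload + schema_overhead)

# Hypothetical: a 3-turn run with ~2k tokens of context per turn, where MCP
# additionally carries ~5k tokens of tool definitions each turn.
cli = total_prompt_tokens(turns=3, per_turn_payload=2_000, schema_overhead=0)
mcp = total_prompt_tokens(turns=3, per_turn_payload=2_000, schema_overhead=5_000)
print(mcp / cli)  # 3.5x in this toy example
```

The exact ratio depends entirely on the assumed numbers; the point is only that the overhead scales with turn count, not with task size.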
The concurrency finding is the most striking part: MCP forcing sequential API calls because of schema retransmission between calls is a structural penalty that compounds exactly on the multi-step tasks where you'd most want MCP's composition benefits. The ci_failure_investigation example makes it concrete: 15 turns vs 3, and a 12× cost difference.
Have you looked at whether client-side schema caching could recover that? If the agent doesn't re-transmit tool definitions on every turn, the sequential vs parallel gap should narrow significantly.
Technically, MCP tools can be called in parallel as well, but the agent seems generally less likely to do so.
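Nothing in the protocol forbids fan-out; if the agent emits several tool calls in one turn, the client can dispatch them concurrently. A minimal sketch, where `call_tool` is a self-contained stub standing in for a real MCP client call rather than any actual SDK API:

```python
import asyncio

# `call_tool` is a hypothetical stand-in for an MCP client tool invocation;
# it sleeps to simulate a network round trip so the example is self-contained.

async def call_tool(name: str, delay: float = 0.05) -> str:
    await asyncio.sleep(delay)
    return f"{name}: ok"

async def sequential(names: list[str]) -> list[str]:
    # One round trip at a time: latency adds up linearly.
    return [await call_tool(n) for n in names]

async def parallel(names: list[str]) -> list[str]:
    # Fan out all calls at once: latency is bounded by the slowest call.
    return await asyncio.gather(*(call_tool(n) for n in names))

results = asyncio.run(parallel(["list_issues", "get_workflow_run", "list_prs"]))
print(results)
```

Whether the agent actually batches its tool calls this way is a model-behavior question, not a protocol one, which matches the observation above.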
In the evaluation I did account for prompt caching, so the multi-turn penalty is already minimized, yet it was still significant enough to make a difference.
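Why caching narrows but doesn't erase the gap: cached prompt reads are typically billed at a fraction of the uncached input rate rather than being free, so the per-turn overhead still accumulates with turn count. A toy model, assuming cached reads cost ~10% of uncached input (roughly what some providers charge; all other numbers are hypothetical):

```python
# Toy model of why prompt caching does not eliminate the multi-turn penalty.
# Assumption: cached prompt reads are billed at ~10% of the uncached input
# rate. All token counts are hypothetical, not benchmark data.

CACHE_READ_DISCOUNT = 0.1

def cost_units(turns: int, fresh_tokens_per_turn: int, cached_prefix_tokens: int) -> float:
    fresh = turns * fresh_tokens_per_turn                          # full price
    cached = turns * cached_prefix_tokens * CACHE_READ_DISCOUNT    # discounted
    return fresh + cached

# Hypothetical: a 3-turn run vs a 15-turn run over the same cached 5k-token prefix.
short = cost_units(turns=3, fresh_tokens_per_turn=2_000, cached_prefix_tokens=5_000)
long = cost_units(turns=15, fresh_tokens_per_turn=2_000, cached_prefix_tokens=5_000)
print(round(long / short, 1))  # the 5x turn count carries straight through
```

Even with the cached prefix nearly free, the fresh tokens generated each turn scale linearly with turn count, so a 15-turn trajectory stays several times more expensive than a 3-turn one.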
The benchmark harness, results, and the reference implementation of gh-axi are all open-sourced at https://github.com/kunchenguid/axi.