1. Interesting approach, but the pricing seems 1-2 orders of magnitude too expensive. Your Slack example contains 4 calls for a single action. Pricing shows $100 per 10k calls, so 1 cent per call. For an agent that does, say, 4 actions, with your examples showing 3-4 API calls per action, that's already 12 cents. Similar tools like composio.dev offer 200k calls for $29, around 70x cheaper (comparing the cheapest tiers). Even if subsequent actions need only 1 call each, 1 cent per single API call sounds wrong; at least for our use case it makes no economic sense to pay 5-10 cents on top of LLM costs for every user query. Apologies if I'm missing something!
2. Could this not be replicated by others just by hand-making a fuzzy search tool over the tools? I think that is the approach that will win, maybe even with RAG for, let's say, 10k+ tools in the future, but I'm not sure how much differentiation this offers in the long term; I've built this kind of search tool myself a couple of times already.
1. Hi! We noticed this for Strata as well, so we ONLY count the execute_action call in Strata. This means the fee is the same as with the traditional, flat approaches; for example, 4 actions cost 4 execute_action calls (4 cents), no matter how many navigation calls preceded them. In other words, the 3 earlier calls do not cost you anything!
2. I think the main drawback of the search method is that it is like giving a human lots of tools/APIs that they can ONLY access via a search interface. This feels weird and should be improved. Our step-by-step approach instead lets you see what categories/actions are available. We also provide a documentation search, so you get the best of both worlds.
1. Oh okay, great. Maybe clarify that on the pricing page, i.e., that an MCP server call means just execute? But it's still 10x more expensive, right?
2. From what I understand, it's just nested search, right? It is not anything fundamentally different: flat, embedding, or fuzzy/agentic nested search is a design choice, for sure. I'm just saying I'm not sure how defensible this is if all other MCP competitors, or even users themselves, put in a nested search tool.
1. Sure, we will clarify that on the pricing page, thank you! As you can see from our evaluation, we perform much better than official MCP servers, and I think people care more about getting things done correctly; otherwise, you waste all the model tokens for nothing. We also have enterprise deals where we can work out a customized pricing plan for your specific use case. Come talk to us if you are interested.
2. One VERY important distinction is that the model is doing the "search" here, not some RAG algorithm or vector database. Therefore, as models become smarter, this approach will become more accurate as well.
1. I was not talking about official MCP servers; those are often even free. I'm talking about the pricing of other devtools for aggregating tools/MCPs. I agree this is an obvious space to build in; I just worry about differentiation. It's a search space that is neither as big as web search nor as complex (order doesn't matter).
2. Yes, I see; this is what I meant by agentic search. Essentially it is a tiny subagent that takes a list of tools in and returns the relevant ones. Still implementable in 5 minutes. But I guess if the experience is very smooth, enterprises might pay?
1. Yes, I agree. To be honest, we are a young company (as you can tell, we are from YC X25), so we are still figuring out pricing. But thank you for the feedback and for sharing your thoughts.
2. Yes, the idea is not complex once you understand it. But there are some nuances we found along the way, and supporting more integrations is always important yet requires engineering effort. Thank you!
How do you folks think about the Manus finding on dynamic tool selection? https://manus.im/blog/Context-Engineering-for-AI-Agents-Less...
> A natural reaction is to design a dynamic action space—perhaps loading tools on demand using something RAG-like. We tried that in Manus too. But our experiments suggest a clear rule: unless absolutely necessary, avoid dynamically adding or removing tools mid-iteration. There are two main reasons for this:
> 1. In most LLMs, tool definitions live near the front of the context after serialization, typically before or after the system prompt. So any change will invalidate the KV-cache for all subsequent actions and observations.
> 2. When previous actions and observations still refer to tools that are no longer defined in the current context, the model gets confused. Without constrained decoding, this often leads to schema violations or hallucinated actions.
> To solve this while still improving action selection, Manus uses a context-aware state machine to manage tool availability. Rather than removing tools, it masks the token logits during decoding to prevent (or enforce) the selection of certain actions based on the current context.
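To make the quoted logit-masking idea concrete, here is a minimal sketch, assuming you have the raw next-token logits and a set of token ids the state machine currently allows (all names here are illustrative, not Manus's actual implementation):

```python
import math

def mask_logits(logits, allowed_token_ids):
    """Steer decoding by zeroing out every token the state machine forbids.

    logits: list[float], raw next-token scores from the model
    allowed_token_ids: set[int], ids permitted in the current state
    """
    return [
        score if i in allowed_token_ids else -math.inf
        for i, score in enumerate(logits)
    ]

# The model can still choose freely among allowed tools, but a
# hallucinated or stale tool name can never be sampled.
print(mask_logits([1.2, 0.3, -0.5, 2.1], {0, 3}))  # [1.2, -inf, -inf, 2.1]
```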
Their findings on KV-cache invalidation are spot on for a single-context approach.
Strata's architecture is philosophically different. Instead of loading a large toolset and masking it, we guide the LLM through a multi-step dialogue. Each step (e.g., choosing an app, then a category) is a separate, very small, and cheap LLM call.
So, we trade one massive prompt for a few tiny ones. This avoids the KV-cache issue because the context for each decision is minimal, and it prevents model confusion because the agent only ever sees the tools relevant to its current step. It's a different path to the same goal: making the agent smarter by not overwhelming it. Thanks for the great link!
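To make the shape of that multi-step dialogue concrete, here is a minimal sketch of the progressive flow; the function and catalog names are illustrative, not Strata's actual API:

```python
# Hypothetical sketch: each step is a small, cheap LLM call whose context
# contains only the options relevant to that step, never the full catalog.

def pick_one(llm, question, options):
    """One tiny decision: choose from a short list of options."""
    prompt = f"{question}\nOptions: {', '.join(options)}\nAnswer with one option."
    return llm(prompt).strip()

def select_action(llm, task, catalog):
    # Step 1: choose an app (e.g., "github", "slack").
    app = pick_one(llm, f"Which app handles: {task}?", list(catalog))
    # Step 2: choose a category inside that app (e.g., "issues").
    category = pick_one(llm, f"Which {app} category fits: {task}?",
                        list(catalog[app]))
    # Step 3: choose the concrete action (e.g., "create_issue").
    action = pick_one(llm, f"Which {app}/{category} action fits: {task}?",
                      catalog[app][category])
    return app, category, action
```

Each call sees at most a few dozen option names, so there is no large prefix to cache (or invalidate) in the first place.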
The attack surface for agents with MCP access grows exponentially with the number of tools. On the scale of thousands of tools, I think it's nearly impossible to understand potential interactions and risk. Do you have any innovative controls in place that would help a CISO get comfortable with a product like this in an enterprise context?
That's the critical question. The key is that Strata never exposes all tools to the agent at once. Our progressive guidance acts as a dynamic allowlist, so the agent only "sees" the specific tools relevant to its immediate task. This fundamentally reduces the blast radius at each step. We also provide comprehensive audit trails for every action, giving a CISO a centralized control plane to manage and monitor agent capabilities rather than an exponentially growing risk. If you are interested, come talk to us!
How is a “dynamic allowlist” useful if it can still access anything based on what the user prompts? Is there a way to impose a static allowlist too?
Yes there is a way to impose a static allowlist. As a very simple example, you can disable certain servers completely via the UI or the API.
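For intuition, a minimal sketch of how a static allowlist can sit in front of any dynamic selection; this is an illustration of the concept, not Klavis's actual API:

```python
# Hypothetical illustration: a static allowlist filters the catalog BEFORE
# any prompt-driven selection happens, so no user prompt can reach a
# server that was disabled up front.
STATIC_ALLOWLIST = {"github", "slack"}  # servers the org has approved

def visible_catalog(full_catalog):
    return {name: tools for name, tools in full_catalog.items()
            if name in STATIC_ALLOWLIST}

catalog = {"github": ["create_issue"], "jira": ["create_ticket"]}
print(visible_catalog(catalog))  # {'github': ['create_issue']}
```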
this. This will be the number one obstacle to adoption.
We kinda use https://github.com/googleapis/genai-toolbox, but only for databases. Looking forward to seeing whether Klavis provides a broader, more general solution.
Ideally, when we are writing agents, we need MCP to support auth and custom headers, because by design, when deploying for SaaS, we need to pass around client params to isolate client connections.
We do token optimisation and other smart stuff to save token money. Looking forward to trying this as well if it solves similar problems.
Thank you! Yes, we do provide auth and support other remote MCP servers via our API: https://docs.klavis.ai/api-reference/strata/create. It does indeed support custom headers. Feel free to give us a try or come talk to us!
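For flavor, a hypothetical sketch of what a create call with per-client headers might look like; the endpoint URL and field names below are assumptions for illustration, so check the linked docs for the real schema:

```python
import requests

API_KEY = "your-klavis-api-key"

resp = requests.post(
    "https://api.klavis.ai/strata/create",  # assumed endpoint URL
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "servers": ["github"],                          # assumed field name
        "customHeaders": {"X-Tenant-Id": "client-42"},  # isolate per client
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```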
The fact that people are giving credentials to all these MCP tools keeps amazing me.
Ten years ago, if you built a service that asked for permissions to everything imaginable, most people would have kept well clear. I guess the closest was Beeper, which wanted your social passwords, but that was heavily criticized and never very popular.
Now you slap an AI label on it and you can't keep people away.
MCP is like the "app store" for LLMs. LLMs can only do so much by themselves. They need connectivity to pull in context or take actions. Just like how your phone without apps is pretty limited in how useful it is.
Sure, teams could build their own connectors via function calling if they're running agents, but that only gets you so far. MCPs promise universal interoperability.
Some teams, like Block, are using MCP as a protocol but generally building their own servers.
But the vast majority are just sifting through the varying quality of published servers out there.
Those who are getting MCP to work are in the minority right now. Most just aren't doing it or aren't doing it well.
But there are plenty of companies racing into this space to make this work for enterprises / solve the problems you rightfully bring up.
As others have said here, the cat is out of the bag, and it is not going back in. MCP has enough buy-in from the community that it's likely to just get better vs. go away.
Source/Bias disclaimer: I pivoted my company to work on an MCP platform to smooth out those rough edges. We had been building integration technology for years. When a technology came along that promised "documentation + invocation" in-band over the protocol, I quickly saw that this could solve the pain of integration we had suffered for years. No more reading documentation and building integrations. The capability negotiation is built into the protocol.
We also provide an open-source version of Strata so that you can have full control. You can self-host it on your own infrastructure, so your credentials never have to touch our servers.
That's nice, kudos. But trusting you is only half of the problem. I don't trust the LLM either.
Yeah, I see what you mean. Many MCP clients have the ability to ask a human for confirmation before a tool call is executed. That way, you can check the tool call before it runs.
Is there any way for the LLM to bypass the request for human confirmation, or is it hard-coded into the deterministic MCP client code?
We do not build the MCP clients ourselves, but for many of them I believe the confirmation step is hard-coded into the deterministic client code.
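For anyone unfamiliar with the pattern, a minimal sketch of such a gate as it might appear in client code; this is illustrative, not any specific client's implementation:

```python
# Illustrative sketch of a hard-coded confirmation gate in an MCP client.
# The gate runs in deterministic client code, outside the model's control,
# so the LLM cannot talk its way past it.
def execute_tool_call(name, args, send_to_server):
    print(f"Agent wants to call {name} with {args}")
    answer = input("Allow this tool call? [y/N] ")
    if answer.strip().lower() != "y":
        return {"error": "tool call rejected by user"}
    return send_to_server(name, args)
```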
What do you propose they do? Because although something like Strata makes it easier, the reality is people are piling up MCP servers like they're free cupcakes. There's no getting the cat back in the box.
(I'm not in security so I genuinely don't know and am curious.)
We're keeping an unofficial allowlist at work: basically just major software companies. Third-party MCP servers at this point are basically just attack vectors. How do you even vet them continuously?
Honestly, vetting MCP servers seems like a YC company in and of itself.
We build our MCP servers ourselves, and many of them are open source. You can check out our GitHub repo.
Can services add to your supported list of MCP servers? Or do you write all the servers?
Strata supports connecting to custom external MCP servers via API: https://docs.klavis.ai/api-reference/strata/create. As for the servers on our website, most are created by us.
Looks really useful! Do you happen to have a gallery of apps using it? In particular, I'd like to see how desktop or mobile apps handle the OAuth flows.
Yes, we do! Just sign up on our website https://www.klavis.ai/home/mcp-servers and you will see all the apps and the auth options.
I think having the gallery viewable without an account would be valuable.
Oh yes, for sure. Check out https://www.klavis.ai/mcp-servers. You do not need to sign up, but you cannot see the auth flow there, for obvious reasons.
The lack of enterprise Microsoft services is really discouraging.
We are aware of this and working on it now. We are actually code complete on Microsoft Teams and Outlook, and we will launch them within the next week or so.
This is something I actually started working on and put down because it wasn't exciting enough, but it's a legit product that fills a niche. Congrats on the launch.
The biggest issue I found was getting agents to intelligently navigate the choose-your-own-adventure of searching for the right tool. It amazes me that they're so good at coding when they're so bad at tool use in general. I'm sure your MCP responses were a fun bit of prompt engineering.
Haha yeah we did optimize it a lot before the launch!
Actually, for us, the first prototype was pretty good! We were surprised by that too, because it took us only a day or so to build it (for a single integration, though). It then took another week to build a prototype supporting multiple integrations.
Is the goal to make a “universal MCP” that makes it easy to let MCP clients execute thousands of tools on a session by session basis? Or is it more focused on initial tool discovery and registration? If it’s the former, does the process add more latency between user taking action and tool getting executed?
Yes, it is the former. The value comes from the progressive guidance during a task, not just from the initial setup.
As for latency, we optimized for that. For example, Strata automatically uses a direct, flat approach for simple cases. And we use fewer tokens than official MCP servers, as shown in the benchmark.
Isn't Microsoft going to eat your lunch with their "virtual tools" offering for high tool counts?
We tested our approach with several thousand tools and it works pretty well. We also provide API access, so any developer can use this, not just users of Microsoft stacks or VS Code.
Heads up, your docs for “Getting Started > Multi-app integration > open source” point to a broken link for the open source code:
https://docs.klavis.ai/documentation/quickstart#open-source
Add a call to the Mintlify CLI command ‘mint broken-links’ to your CI and you should be set!
Thank you! I did not know this trick! I am fixing this now.
This looks very relevant and useful to what I'm working on at the moment. The LLM gets lost in all of the tools we provide for certain actions.
Glad it could be helpful to you! Curious what AI agents you are building and exactly which tools caused the failures.
In any case, feel free to give us a try!
How do you handle compliance questionnaires from companies that adhere to SOC2 guidelines? If I used Klavis how would I tell my clients which information I send to which external partners?
We are SOC2 compliant, so this will not be a problem. Come talk to us if you are interested in using this for your clients, and we can work out the details.
As an investor, I'm hesitant to invest in MCP infra startups.
Nice work! This definitely feels like a market gap, for those who've been deep enough to experience it.
Thank you!
How does this differ from something like nexusrouter?
As you can see from our examples, the main approach is not tool search. Instead, Strata guides your AI agent step by step, from server to categories to actions to action details, so that the model does not get overloaded. We actually have 1,000+ tools for some integrations (e.g., GitHub), and this approach works better than traditional methods.
Think of it as a file explorer rather than a search engine. But we do provide documentation search as well, so you get the best of both worlds.
Comparing against the official GitHub server isn't a good benchmark, because it's still a bloated mess of tools.
The eval benchmark also compared against Notion, and we are 13% higher than them as well!