As a former Senior SWE on Google Gemini's tool-use team, I saw firsthand how AI models struggle with tools. If you've built AI agents, you've likely hit the same walls: (1) agents struggle to pick the right API from hundreds of options; (2) tool descriptions and metadata consume massive token budgets; (3) most servers cap at 40-50 tools to avoid these problems, limiting what you can build.
Instead of flooding the AI with everything upfront, Strata works the way a human would. It guides the agent to discover the relevant categories, then lists the available actions within them, relying on the LLM's reasoning to drill down progressively to the exact tool needed. Here are some examples, followed by a quick code sketch:
GitHub query: "Find my stale pull requests in our main repo"
Strata: AI model identifies GitHub → Shows categories (Repos, Issues, PRs, Actions) → AI selects PRs → Shows PR-specific actions → AI selects list_pull_requests → Shows list_pull_requests details → Executes list_pull_requests with the right parameters.
Jira query: "Create a bug ticket in the 'MOBILE' project about the app crashing on startup."
Strata: AI identifies Jira → Shows categories (Projects, Issues, Sprints) → AI selects Issues → Shows actions (create_issue, get_issue) → AI selects create_issue → Shows create_issue details → Executes with correct parameters.
Slack query: "Post a message in the #announcements channel that bonus will be paid out next Friday."
Strata: AI identifies Slack → Shows categories (Channels, Messages, Users) → AI selects Messages → Shows actions (send_message, schedule_message) → AI selects send_message → Shows send_message details → Executes with correct parameters.
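To make the flow above concrete, here is a minimal, runnable sketch of the discovery loop from the agent's side. The catalog and function names are illustrative stand-ins, not Strata's actual API, and lm_pick() fakes the model call with a crude keyword match:

    # Illustrative catalog: app -> category -> action -> schema.
    CATALOG = {
        "github": {
            "PRs": {
                "list_pull_requests": {"params": ["repo", "state"]},
                "merge_pull_request": {"params": ["repo", "number"]},
            },
            "Issues": {
                "create_issue": {"params": ["repo", "title", "body"]},
            },
        },
    }

    def lm_pick(query: str, options: list[str]) -> str:
        # Stand-in for an LLM call: pick the option whose name words
        # appear in the query, else fall back to the first option.
        for opt in options:
            if any(tok in query.lower() for tok in opt.lower().split("_")):
                return opt
        return options[0]

    def run_query(app: str, query: str):
        categories = list(CATALOG[app])           # step 1: categories only
        category = lm_pick(query, categories)
        actions = list(CATALOG[app][category])    # step 2: actions in that category
        action = lm_pick(query, actions)
        schema = CATALOG[app][category][action]   # step 3: one action's details
        return action, schema                     # step 4: model fills params, then executes

    print(run_query("github", "find my stale pull requests"))
    # -> ('list_pull_requests', {'params': ['repo', 'state']})

The point is that the model never sees more than a handful of options at once, no matter how many tools the server holds.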
This progressive approach unlocks a huge advantage: depth. While most integrations offer a handful of high-level tools, Strata can expose hundreds of granular features for a single app like GitHub, Jira, etc. Your AI agent can finally access the deep, specific features that real workflows require, without getting lost in a sea of options.
Under the hood, Strata manages authentication tokens and includes a built-in search tool for the agent to dig into documentation if it gets stuck.
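For illustration only, that documentation fallback can be as simple as a keyword lookup over per-tool docs that the agent consults when the category walk doesn't surface what it needs. This is a hypothetical sketch with made-up docs, not Strata's actual search:

    # Made-up docs; real entries would come from the server's tool metadata.
    DOCS = {
        "list_pull_requests": "List PRs in a repo, filter by state (open/closed).",
        "create_issue": "Create an issue with a title and body.",
    }

    def search_docs(query: str) -> list[str]:
        # Rank tools by how many query words appear in their docs.
        words = set(query.lower().split())
        hits = [(sum(w in doc.lower() for w in words), name)
                for name, doc in DOCS.items()]
        return [name for score, name in sorted(hits, reverse=True) if score > 0]

    print(search_docs("filter open pull requests"))  # ['list_pull_requests']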
On the MCPMark benchmark (https://mcpmark.ai/leaderboard/mcp), Strata achieves a 15.2% higher pass@1 rate than the official GitHub server and a 13.4% higher pass@1 rate than the official Notion server. In human evals, it hits 83%+ accuracy on complex, real-world multi-app workflows.
Here is a quick demo of Strata navigating a complex workflow across multiple apps, automatically selecting the right tools at each step: https://www.youtube.com/watch?v=N00cY9Ov_fM.
You can connect any external MCP server to Strata, and we have an open-source version: https://github.com/Klavis-AI/klavis.
For team or production use with more features, visit our website: https://www.klavis.ai. Add Strata to Cursor, VS Code, or any MCP-compatible application with one click. You can also use our API to easily plug Strata into your AI application.
We look forward to your comments. Thanks for reading!
Think of it as a search engine vs. a file explorer. But we provide documentation search as well, so you get the best of both worlds.
The biggest issue I found was getting agents to intelligently navigate the choose-your-own-adventure of searching for the right tool. It amazes me that they're so good at coding when they're so bad at tool use in general. I'm sure your MCP responses were a fun bit of prompt engineering.
Actually for us, our first prototype was pretty good! We were also surprised by that, because it took us a day or so to build the prototype (for only one integration, though). Then it took us another week to build a second prototype for multiple integrations.
As for latency, we optimized for it. For example, Strata automatically uses a direct, flat approach for simple cases. And we use fewer tokens than the official MCP servers as well, as shown in the benchmark.
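As a rough sketch of that flat-vs-progressive switch (the threshold and names here are assumptions, not Strata's real heuristic):

    FLAT_TOOL_LIMIT = 15  # assumed cutoff; not Strata's actual number

    def tools_to_expose(catalog: dict[str, list[str]]) -> list[str]:
        # catalog maps category name -> tool names for one app.
        all_tools = [t for tools in catalog.values() for t in tools]
        if len(all_tools) <= FLAT_TOOL_LIMIT:
            return all_tools      # small app: skip the category hop, one less round trip
        return list(catalog)      # large app: expose categories first

    print(tools_to_expose({"Messages": ["send_message", "schedule_message"]}))
    # ['send_message', 'schedule_message']  (flat, since 2 <= 15)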
Ideally, when we write agents, we need MCP to support auth and custom headers, because when deploying for SaaS we need to pass client params around in order to isolate client connections.
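For what it's worth, the isolation pattern you describe looks roughly like the sketch below. Plain httpx is used just to illustrate the idea; a real MCP client would pass these headers when opening its connection, and the endpoint and header names are made up:

    import httpx

    def client_for_tenant(base_url: str, tenant_id: str, token: str) -> httpx.Client:
        # Every request from this client carries the tenant's own credentials,
        # so connections are never shared across customers.
        return httpx.Client(
            base_url=base_url,
            headers={
                "Authorization": f"Bearer {token}",  # per-tenant credential
                "X-Tenant-Id": tenant_id,            # hypothetical routing header
            },
        )

    acme = client_for_tenant("https://mcp.example.com", "acme", "acme-secret")
    globex = client_for_tenant("https://mcp.example.com", "globex", "globex-secret")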
We do token optimization and other smart tricks to save on token costs. Looking forward to trying this as well, if it solves similar problems.
2. Couldn't this be replicated by others just by hand-making a fuzzy search tool over the tools? I think this is the approach that will win, maybe even with RAG for, let's say, 10k+ tools in the future. But I'm not sure how much differentiation this offers in the long term; I've made this search tool myself a couple of times already.
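For reference, that handmade fuzzy search tool really is only a few lines with the standard library (the tool names below are made up for the demo):

    from difflib import get_close_matches

    TOOLS = [
        "list_pull_requests", "create_issue", "send_message",
        "schedule_message", "merge_pull_request",
    ]

    def search_tools(query: str, k: int = 3) -> list[str]:
        # Crude string-similarity ranking; cutoff=0.0 always returns the top k.
        return get_close_matches(query, TOOLS, n=k, cutoff=0.0)

    print(search_tools("list pull requests"))
    # ['list_pull_requests', ...]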
2. I think the main drawback of the search method is that it's like giving a human lots of tools/APIs but only letting them access those tools via a search interface. That feels weird and should be improved. With our approach, the step-by-step method lets you see which categories/actions are available. We also provide a documentation search, so you get the best of both worlds.
Ten years ago, if you built a service that asked for permissions to everything imaginable, most people would have kept well clear. I guess the closest was Beeper, which wanted your social passwords, but it was heavily criticized and never very popular.
Now you slap an AI label on it and you can't keep people away.
2. From what I understand, it's just nested search, right? It isn't anything different; whether you do flat, embedding, fuzzy, or agentic nested search is a choice, for sure. I'm just saying I'm not sure how defensible this is if all the other MCP competitors, or even users themselves, put in a nested search tool.
2. One VERY important distinction is that the model is doing the "search" here, not some RAG algorithm or vector database. Therefore, as models become smarter, this approach becomes more accurate as well.
2. Yes, I see; this is what I meant by agentic search. Essentially it's a tiny subagent: a list of tools goes in, and the relevant ones come out. Still implementable in 5 minutes. But I guess if the experience is very smooth, enterprises might pay?
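That tiny subagent is roughly the sketch below, where llm() is a placeholder for whatever chat-completion client you use (the prompt wording is my own guess, not any vendor's API):

    def llm(prompt: str) -> str:
        raise NotImplementedError  # plug in your model client here

    def filter_tools(query: str, tools: list[str], k: int = 5) -> list[str]:
        prompt = (
            f"User query: {query}\n"
            f"Available tools: {', '.join(tools)}\n"
            f"Return the {k} most relevant tool names, comma-separated."
        )
        names = [t.strip() for t in llm(prompt).split(",")]
        return [t for t in names if t in tools]  # drop hallucinated names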
https://docs.klavis.ai/documentation/quickstart#open-source
Add a call to the Mintlify CLI's 'mint broken-links' to your CI and you should be set!
2. Yes, the idea is not complex once you understand it. But there are nuances we found along the way, and supporting more integrations is always important but requires engineering effort. Thank you!
(I'm not in security so I genuinely don't know and am curious.)
> A natural reaction is to design a dynamic action space—perhaps loading tools on demand using something RAG-like. We tried that in Manus too. But our experiments suggest a clear rule: unless absolutely necessary, avoid dynamically adding or removing tools mid-iteration. There are two main reasons for this:
> 1. In most LLMs, tool definitions live near the front of the context after serialization, typically before or after the system prompt. So any change will invalidate the KV-cache for all subsequent actions and observations.
> 2. When previous actions and observations still refer to tools that are no longer defined in the current context, the model gets confused. Without constrained decoding, this often leads to schema violations or hallucinated actions.
> To solve this while still improving action selection, Manus uses a context-aware state machine to manage tool availability. Rather than removing tools, it masks the token logits during decoding to prevent (or enforce) the selection of certain actions based on the current context.
Strata's architecture is philosophically different. Instead of loading a large toolset and masking it, we guide the LLM through a multi-step dialogue. Each step (e.g., choosing an app, then a category) is a separate, very small, and cheap LLM call.
So, we trade one massive prompt for a few tiny ones. This avoids the KV-cache issue because the context for each decision is minimal, and it prevents model confusion because the agent only ever sees the tools relevant to its current step. It's a different path to the same goal: making the agent smarter by not overwhelming it. Thanks for the great link!
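A toy version of that trade, with a fake model call so it runs end to end (the tree contents and the picker are illustrative, not Strata's internals):

    TREE = {
        "slack": {
            "Messages": {
                "send_message": "params: channel, text",
                "schedule_message": "params: channel, text, post_at",
            },
            "Channels": {"list_channels": "params: none"},
        },
    }

    def llm_pick(query: str, options: list[str]) -> str:
        # Stand-in for a small, cheap model call over a handful of options.
        words = query.lower().split()
        return max(options, key=lambda o: sum(w in o.lower() for w in words))

    def navigate(query: str, node):
        # Each iteration is one tiny call whose context is only the keys at
        # the current level -- never the whole toolset, so there is no giant
        # prompt to cache and no stale tool definitions to confuse the model.
        while isinstance(node, dict):
            node = node[llm_pick(query, list(node))]
        return node  # leaf: the single action's parameter details

    print(navigate("post a message in announcements", TREE))
    # params: channel, text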
Honestly vetting MCP seems like a YC company in and of itself.
Sure, teams could build their own connectors via function calling if they're running agents, but that only gets you so far. MCPs promise universal interoperability.
Some teams, like Block, are using MCP as a protocol but generally building their own servers.
But the vast majority are just sifting through the varying quality of published servers out there.
Those who are getting MCP to work are in the minority right now. Most just aren't doing it or aren't doing it well.
But there are plenty of companies racing into this space to make this work for enterprises / solve the problems you rightfully bring up.
As others have said here, the cat is out of the bag, and it is not going back in. MCP has enough buy-in from the community that it's likely to just get better vs. go away.
Source/Bias disclaimer: I pivoted my company to work on an MCP platform to smooth out those rough edges. We had been building integration technology for years, and when a technology came along that promised "documentation + invocation" in-band over the protocol, I quickly saw that it could solve the integration pain we had long suffered. No more reading documentation and hand-building integrations: the capability negotiation is built into the protocol.
Edit: a comma.
I've sent a repost invite for the first submission (https://news.ycombinator.com/item?id=44608593) - hopefully it will get some discussion here.
What IS useful and offers value is having an agent that accesses 1 or 2 tools but always uses those tools accurately and correctly 99.9+% of the time.
I've seen tons of MCP companies offering thousands of wrapped HTTP APIs as MCP tools, which is not very easy to implement, but in reality it's totally fucking useless for enterprise use cases that need to work reliably, in a secure, repeatable fashion.
Any chump can rig an MCP client to 20 tools, but then watch your agent fail again and again and again.
Basically, this is a bad idea for a business and I'd personally suggest pivoting to something that focuses on ensuring a single agent works reliably, provides guardrails, evaluation, security etc. This is the real challenge to solve.
In your example, one agent accesses 1-2 tools. But what if an enterprise has 1,000 use cases and, by your logic, needs 1,000 AI agents? How do you let your system intelligently dispatch the correct AI agent to the correct use case? (Or will OpenAI create 1,000 custom GPTs and let the user choose among them?) Instead, you need a way to intelligently dispatch the user query to different sub-tools or sub-agents, depending on how you design it. The essence of Strata, I believe, is relying on the model's reasoning to minimize the context window while maximizing relevant info, and then solving the user query.
Security is a whole other topic, and I don't think it's related to this approach.
The idea of 20,000 MCP tools sounds good on paper, but the reality is that enterprises want an agent that just connects to Jira and ServiceNow and actually works. They don't care about 1,000 random, shitty MCP servers built by random people online that they can't trust.