
176 points | lennertjansen | 1 comment

Hey HN, we're Lennert and Rauf. We’re building Airweave (https://github.com/airweave-ai/airweave), an open-source tool that lets agents search and retrieve data from any app or database. Here’s a general intro: https://www.youtube.com/watch?v=EFI-7SYGQ48, and here’s a longer video that shows more real-world use cases, including how Airweave is used by Cursor (0:33) and Claude desktop (2:04): https://youtu.be/p2dl-39HwQo

A couple of months ago we were building agents that interacted with different apps, and we were frustrated by how they struggled with vague natural-language requests like "resolve that one Linear issue about missing auth configs", "if you get an email from an unsatisfied customer, reimburse their payment in Stripe", or "what were the returns for Q1 based on the financials sheet in gdrive?". The agent would either inefficiently chain together loads of function calls to find the data, or fail to find it at all and hallucinate.

We also noticed that despite the rise of MCP creating more desire for agents to interact with external resources, the majority of agent dev tooling focused on function calling and actions instead of search. We were annoyed by the lack of tooling that enabled agents to semantically search workspace or database contents, so we started building Airweave first as an internal solution. Then we decided to open-source it and pursue it full time after we got positive reactions from coworkers and other agent builders.

Airweave connects to productivity tools, databases, or document stores via their APIs and transforms their contents into searchable knowledge bases, accessible through a standardized interface for the agent. The search interface is exposed via REST or MCP. When using MCP, Airweave essentially builds a semantically searchable MCP server on top of the resource. The platform handles the entire data pipeline from connection and extraction to chunking, embedding, and serving. To ensure knowledge is current, it has automated sync capabilities, with configurable schedules and change detection through content hashing.
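To make the "change detection through content hashing" idea concrete, here is a minimal sketch of how a sync step could decide which items need re-chunking and re-embedding. This is an illustration under our own assumptions, not Airweave's actual code: the function names (`content_hash`, `detect_changes`), the choice of SHA-256, and the in-memory dicts standing in for a hash store are all hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a stable SHA-256 hex digest of an item's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_changes(
    previous: dict[str, str], current: dict[str, str]
) -> tuple[list[str], list[str], dict[str, str]]:
    """Compare hashes from the last sync against freshly fetched content.

    previous: item id -> hash recorded at the last sync (hypothetical store)
    current:  item id -> raw content fetched during this sync
    Returns (ids to re-index, ids to delete, hashes to store for next time).
    """
    new_hashes = {item_id: content_hash(body) for item_id, body in current.items()}
    # New or modified items: no stored hash, or the stored hash differs.
    to_upsert = sorted(i for i, h in new_hashes.items() if previous.get(i) != h)
    # Items that existed at the last sync but are gone from the source.
    to_delete = sorted(i for i in previous if i not in new_hashes)
    return to_upsert, to_delete, new_hashes


# Example sync: doc-1 is unchanged, doc-2 was edited, doc-3 is new.
stored = {"doc-1": content_hash("hello"), "doc-2": content_hash("old text")}
fetched = {"doc-1": "hello", "doc-2": "new text", "doc-3": "brand new"}
to_upsert, to_delete, stored = detect_changes(stored, fetched)
print(to_upsert, to_delete)  # ['doc-2', 'doc-3'] []
```

Only the ids in `to_upsert` would need to go through chunking and embedding again, which is the point of hashing: unchanged items are skipped on every scheduled sync.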

We built it with support for white-labeled multi-tenancy to provide OAuth2-based integration across multiple user accounts while maintaining privacy and security boundaries. We're also actively working on permission-awareness (i.e., RBAC on the data) for the platform.

We're happy to share what we've learned and would love to hear about your experiences. Looking forward to your comments!

valianter No.43965140
Is chat always the best interface for all of these apps? I feel like search is the natural first step, but chat-based search has been around for a while. An MCP-based version of Glean/Onyx/Moveworks/Dashworks sounds interesting, but I'm unsure how much better it makes the product. Curious to hear why your product is better.
replies(1): >>43965350 #
1. raufakdemir No.43965350
Co-founder here. The Airweave interface is agnostic to the downstream use case it's applied in. Most developers currently using it don't build for a chat interface at all. Instead, they fold it into their agents to give them access to user data. At first sight it looks quite similar to enterprise search, but it is really a building block for developers to set up integrations for their internal agent or agent product.