
161 points by segmenta

Hi HN! We’re Arjun, Ramnique, and Akhilesh, and we are building Rowboat (https://www.rowboatlabs.com/), an AI-assisted IDE for building and managing multi-agent systems. You start with a single agent, then scale up to teams of agents that work together, use MCP tools, and improve over time - all through a chat-based copilot.

Our repo is https://github.com/rowboatlabs/rowboat, docs are at https://docs.rowboatlabs.com/, and there’s a demo video here: https://youtu.be/YRTCw9UHRbU

It’s becoming clear that real-world agentic systems work best when multiple agents collaborate, rather than having one agent attempt to do everything. This isn’t too surprising - it’s a bit like how good code consists of multiple functions that each do one thing, rather than cramming everything into one function.

For example, a travel assistant works best when different agents handle specialized tasks: one agent finds the best flights, another optimizes hotel selections, and a third organizes the itinerary. This modular approach makes the system easier to manage, debug, and improve over time.
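To make the modular pattern concrete, here is a minimal sketch in plain Python (no particular framework — the agent names and the `call_llm` stub are hypothetical, not Rowboat's API):

```python
# Each "agent" owns one narrow task, and a coordinator routes between
# them deterministically. call_llm is a stand-in for a real model call.

def call_llm(instructions: str, query: str) -> str:
    # Stand-in for a real LLM call.
    return f"[{instructions}] handled: {query}"

AGENTS = {
    "flights": "Find the best flights for the given dates and budget.",
    "hotels": "Optimize hotel selection near the traveler's activities.",
    "itinerary": "Assemble flights and hotels into a day-by-day plan.",
}

def run_travel_assistant(query: str) -> dict:
    # The specialized agents run their own tasks; the itinerary agent
    # composes the results. Each piece can be debugged in isolation.
    results = {}
    for name in ("flights", "hotels"):
        results[name] = call_llm(AGENTS[name], query)
    results["itinerary"] = call_llm(AGENTS["itinerary"], " + ".join(results.values()))
    return results

plan = run_travel_assistant("Tokyo, 5 days in May")
```

Because each agent has one job, swapping out the hotel logic (or its underlying model) doesn't touch flight search — the same argument as for small, single-purpose functions.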

OpenAI’s Agents SDK provides a neat Python library to support this, but building reliable agentic systems requires constant iteration and tweaking - e.g. updating agent instructions (which can quickly get as complex as actual code), connecting tools, and testing the system while incorporating feedback. Rowboat is an AI IDE for doing all of this. Rowboat is to AI agents what Cursor is to code.

We’ve taken a code-like approach to agent instructions (prompts). There are special keywords to directly reference other agents, tools or prompts - which are highlighted in the UI. The copilot is the best way to create and edit these instructions - each change comes with a code-style diff.

You can give agents access to tools by integrating any MCP server or connecting your own functions through a webhook. You can instruct the agents on when to use specific tools via ‘@mentions’ in the agent instruction. To enable quick testing, we added a way to mock tool responses using LLM calls.
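The idea of mocking tool responses with an LLM can be sketched like this (function names are hypothetical, not Rowboat's actual API — the mock LLM is stubbed so the example runs offline):

```python
# During testing, route tool calls to an LLM asked to invent a plausible
# response, instead of hitting the real MCP server or webhook.

def mock_llm(prompt: str) -> str:
    # Stand-in for a small LLM asked to fabricate a realistic tool output.
    return '{"status": "ok", "note": "mocked"}'

def call_tool(name: str, args: dict, mock: bool = False) -> str:
    if mock:
        # Skip the real integration entirely while iterating on agents.
        return mock_llm(
            f"Pretend you are the tool '{name}' called with {args}. "
            "Reply with a realistic JSON response."
        )
    raise NotImplementedError("real MCP/webhook call goes here")

response = call_tool("search_flights", {"dest": "Tokyo"}, mock=True)
```

This lets you exercise agent instructions and handoffs before any real integration exists.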

Rowboat playground lets you test and debug the assistants as you build them. You can see agent transfers, tool invocations and tool responses in real-time. The copilot has the context of the chat, and can improve the agent instructions based on feedback. For example, you could say ‘The agent shouldn’t have done x here. Fix this’ and the copilot can go and make this fix.

You can integrate agentic systems built in Rowboat into your application via the HTTP API or the Python SDK (‘pip install rowboat’). For example, you can build user-facing chatbots, enterprise workflows and employee assistants using Rowboat.

We’ve been working with LLMs since GPT-1 launched in 2018. Most recently, we built Coinbase’s support chatbot after our last AI startup was acquired by them.

Rowboat is Apache 2.0 licensed, giving you full freedom to self-host, modify, or extend it however you like.

We’re excited to share Rowboat with everyone here. We’d love to hear your thoughts!

simonw No.43767628:
"It’s becoming clear that real-world agentic systems work best when multiple agents collaborate, rather than having one agent attempt to do everything."

I'll be honest: I don't buy that premise (yet). It's clearly a popular idea and I see a lot of excitement about it (see Google's A2A thing) but it feels to me like a pattern that, in many cases, will make LLM code even harder to get reliable results from.

I worry it's the AI-equivalent of microservices: useful in a small set of hyper complex systems, the vast majority of applications that adopt it would have been better off without.

If there are strong arguments counter to what I've said here I'd love to hear them!

danenania No.43768061:
A few concrete examples of multi-agent collaboration being useful in my project Plandex[1]:

- While it uses Sonnet 3.7 by default for creating the edit snippet when writing code, calls related to applying the snippet and validating the result (and falling back to a whole file write if needed) use o3-mini (soon to be o4-mini) which is 1/3 the cost, much faster, and actually more accurate and reliable than Sonnet for this particular narrow task.

- If Sonnet 3.7's context limit is exceeded in the planning stages, it can switch to a Gemini model for planning, then go back to Sonnet again for the implementation steps (since these only need the files relevant to each step).

- It eagerly summarizes the conversation after each response so that the summary can be used later if the conversation gets too long. This is only practical because much smaller models than the main planning/coding models are sufficient for a good summary. Otherwise it would be way too expensive.
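The cost/capability routing described in these examples can be sketched generically. The model names, context limits, and costs below are illustrative only, not Plandex's actual configuration:

```python
# Route each call to the cheapest model that is adequate for the task,
# with a fallback when the primary model's context limit is exceeded.

MODELS = {
    "sonnet": {"context": 200_000, "cost": 3.0},    # main coding model
    "o4-mini": {"context": 200_000, "cost": 1.0},   # cheap, narrow tasks
    "gemini": {"context": 1_000_000, "cost": 2.0},  # large-context fallback
}

TASK_MODEL = {
    "write_code": "sonnet",   # strongest model for generation
    "apply_edit": "o4-mini",  # narrow task: cheaper AND more reliable
    "validate": "o4-mini",
    "summarize": "o4-mini",   # a small model suffices for summaries
    "plan": "sonnet",
}

def pick_model(task: str, context_tokens: int) -> str:
    model = TASK_MODEL[task]
    if context_tokens > MODELS[model]["context"]:
        # e.g. planning over a huge repo: switch to the large-context
        # model; later steps can return to the default model.
        return "gemini"
    return model

choice = pick_model("plan", 500_000)
```

The routing itself is ordinary code — the "collaboration" is in matching each sub-task to the model best suited for it.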

It's definitely more complex, but I think in these cases at least, there's a real payoff for the trouble.

1 - https://github.com/plandex-ai/plandex

rchaves No.43771517:
Is this multi-agent collaboration, though, or just a workflow? All the examples you listed seem to have pretty deterministic control flow (write then validate, context exceeded, after each response, etc.)

When I think of multi-agent collaboration, I think of the control flow and handover being defined by the agents themselves. That's the thing I have yet to see examples of in production, and the premise I also don't buy yet.

danenania No.43798687:
You’re right that it’s a fuzzy line. That said, if you can make the contract/handoff between agents deterministic, you’ll always get better results by doing that, compared to letting the agents try to handle it through inference, since there will always be some error rate.

For this reason, I think that for at least the next couple years, even very advanced agent systems are likely to have a lot of deterministic control flow and glue in their guts. To me, that doesn’t make them “not multi-agent”. Rather, this is how you can build multi-agent systems that actually work in reality. But much of it comes down to semantics, admittedly.
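One way to picture "deterministic control flow and glue in the guts" — the agents here are stubs with hypothetical names; in practice each would be an LLM call:

```python
# The handoff between agents is ordinary code with a fixed contract
# (a dict with known keys), not an inference-time decision.

def research_agent(task: str) -> dict:
    # Stub for an LLM-backed research step.
    return {"task": task, "findings": ["fact A", "fact B"]}

def writing_agent(contract: dict) -> dict:
    # The contract's shape is enforced in code, so a bad handoff fails
    # loudly here instead of silently degrading the output.
    assert "findings" in contract, "handoff contract violated"
    return {"draft": f"Report on {contract['task']}: "
                     + "; ".join(contract["findings"])}

def pipeline(task: str) -> str:
    # Deterministic glue: research always hands off to writing, in code.
    return writing_agent(research_agent(task))["draft"]

report = pipeline("agent reliability")
```

Each agent still does open-ended inference inside its step; only the routing between steps is pinned down, which is where the error rate of inference-based handoffs would otherwise accumulate.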