
161 points by segmenta | 2 comments

Hi HN! We’re Arjun, Ramnique, and Akhilesh, and we are building Rowboat (https://www.rowboatlabs.com/), an AI-assisted IDE for building and managing multi-agent systems. You start with a single agent, then scale up to teams of agents that work together, use MCP tools, and improve over time - all through a chat-based copilot.

Our repo is https://github.com/rowboatlabs/rowboat, docs are at https://docs.rowboatlabs.com/, and there’s a demo video here: https://youtu.be/YRTCw9UHRbU

It’s becoming clear that real-world agentic systems work best when multiple agents collaborate, rather than having one agent attempt to do everything. This isn’t too surprising - it’s a bit like how good code consists of multiple functions that each do one thing, rather than cramming everything into one function.

For example, a travel assistant works best when different agents handle specialized tasks: one agent finds the best flights, another optimizes hotel selections, and a third organizes the itinerary. This modular approach makes the system easier to manage, debug, and improve over time.

OpenAI’s Agents SDK provides a neat Python library to support this, but building reliable agentic systems requires constant iteration and tweaking - e.g. updating agent instructions (which can quickly get as complex as actual code), connecting tools, and testing the system while incorporating feedback. Rowboat is an AI IDE for doing all of this. Rowboat is to AI agents what Cursor is to code.
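
For a sense of what this looks like at the SDK level, here is a minimal multi-agent sketch using OpenAI’s Agents SDK, following the travel example above (the agent names and instructions are our own illustration, not Rowboat output):

    # pip install openai-agents
    from agents import Agent, Runner

    flight_agent = Agent(
        name="Flight Finder",
        instructions="Find the best flights for the user's dates and budget.",
    )
    hotel_agent = Agent(
        name="Hotel Optimizer",
        instructions="Recommend hotels that balance price, location, and reviews.",
    )
    # The triage agent owns routing only; specialists own their narrow tasks.
    triage_agent = Agent(
        name="Travel Triage",
        instructions="Route each request to the specialist best suited to handle it.",
        handoffs=[flight_agent, hotel_agent],
    )

    result = Runner.run_sync(triage_agent, "Find me a flight and hotel in Tokyo for next week.")
    print(result.final_output)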

We’ve taken a code-like approach to agent instructions (prompts). There are special keywords to directly reference other agents, tools, or prompts, and these references are highlighted in the UI. The copilot is the best way to create and edit these instructions - each change comes with a code-style diff.

You can give agents access to tools by integrating any MCP server or connecting your own functions through a webhook. You can instruct the agents on when to use specific tools via ‘@mentions’ in the agent instruction. To enable quick testing, we added a way to mock tool responses using LLM calls.
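
The mocking technique itself is easy to picture. Here is a rough sketch of an LLM-backed mock for a tool call - our own illustration of the idea, not Rowboat’s internal implementation:

    import json
    from openai import OpenAI

    client = OpenAI()

    def mock_tool_response(tool_name: str, args: dict) -> str:
        """Ask an LLM to fabricate a plausible response for a tool call,
        so an agent flow can be tested before the real tool is wired up."""
        prompt = (
            f"You are simulating the tool '{tool_name}'. "
            f"Given the call arguments {json.dumps(args)}, reply with a "
            "realistic JSON payload that the real tool might return."
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # Example: fake a flight-search tool during testing.
    print(mock_tool_response("search_flights", {"from": "SFO", "to": "NRT"}))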

The Rowboat playground lets you test and debug your assistants as you build them. You can see agent transfers, tool invocations, and tool responses in real time. The copilot has the context of the chat and can improve the agent instructions based on feedback. For example, you could say ‘The agent shouldn’t have done x here. Fix this’, and the copilot will make the fix.

You can integrate agentic systems built in Rowboat into your application via the HTTP API or the Python SDK (‘pip install rowboat’). For example, you can build user-facing chatbots, enterprise workflows and employee assistants using Rowboat.
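
As a minimal sketch of what SDK usage might look like - the constructor parameters and method names below are our assumptions for illustration; consult https://docs.rowboatlabs.com/ for the actual client interface:

    # pip install rowboat
    # NOTE: parameter and method names here are assumptions; see
    # https://docs.rowboatlabs.com/ for the authoritative client API.
    from rowboat import Client

    client = Client(
        host="http://localhost:3000",   # a self-hosted Rowboat instance
        project_id="<PROJECT_ID>",
        api_key="<API_KEY>",
    )

    # Send a user message to the deployed multi-agent assistant and
    # print the assistant's reply.
    messages = [{"role": "user", "content": "Where is my order?"}]
    response = client.chat(messages=messages)
    print(response.messages[-1]["content"])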

We’ve been working with LLMs since GPT-1 launched in 2018. Most recently, we built Coinbase’s support chatbot after our last AI startup was acquired by them.

Rowboat is Apache 2.0 licensed, giving you full freedom to self-host, modify, or extend it however you like.

We’re excited to share Rowboat with everyone here. We’d love to hear your thoughts!

victorbjorklund:
"It’s becoming clear that real-world agentic systems work best when multiple agents collaborate, rather than having one agent attempt to do everything."

In a recent episode of Practical AI with the people behind All Hands:

"...when the Open Hands project started out, we were kind of on this bandwagon of trying to create a big agentic framework that you could use with and define lots of different agents. You could have your debugging agent, you could have your software architect agent, you could have your browsing agent and all of these things like this. And we actually implemented a framework where you could have one agent delegate to another agent and then that agent would go off and do this task and things like this.

One somewhat surprising thing is how ineffective this paradigm ended up being, from two perspectives. The first - and this is specifically for the case of software engineering; there might be other cases where this would be useful - is in terms of effectiveness: we found that having a single agent that just has all of the necessary context, has the ability to write code, and can use a web browser to gather information and execute code ends up being able to do a pretty large swath of tasks without a lot of specific tooling and structuring around the problems."

https://practicalai.fm/310

Not saying it is wrong. But I don't think it is something that is "clear" and can be taken for granted, so some benchmarks/reasoning on why would have been great.

segmenta:
Thanks for the pointer. We do agree that not all agentic systems should be multi-agent.

That said, in our experience with complex workflows - e.g. customer support for enterprises - both quality and maintainability stand to gain when the system is decomposed into smaller, scoped agents. We see a parallel of this in humans as well: when we call into customer support, we get routed to different human agents based on our query.

OpenAI says something similar in their recent guide on building agents [0]: "For many complex workflows, splitting up prompts and tools across multiple agents allows for improved performance and scalability. When your agents fail to follow complicated instructions or consistently select incorrect tools, you may need to further divide your system and introduce more distinct agents."

A relevant benchmark here might be the Instruction Following benchmark: https://scale.com/leaderboard/multichallenge. Even the best reasoning models top out at ~60% accuracy on this.

The options to improve accuracy, then, are to (a) fine-tune a model on a task-specific dataset, or (b) decompose the problem into smaller sub-problems (divide-and-conquer) - the latter being more practical and maintainable.
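
As a toy illustration of option (b), here is a sketch of a router that classifies each query and dispatches it to a narrowly scoped specialist - our own example using the OpenAI API; the category names and prompts are assumptions:

    from openai import OpenAI

    client = OpenAI()

    # Each specialist gets a short, narrowly scoped instruction instead
    # of one sprawling prompt - the divide-and-conquer point above.
    SPECIALISTS = {
        "billing": "You handle billing questions only. Be precise about amounts and dates.",
        "shipping": "You handle shipping and delivery questions only.",
        "returns": "You handle returns and refunds only.",
    }

    def route(query: str) -> str:
        """Classify the query into one specialist, mirroring how a support
        line routes callers to the right human agent."""
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": "Reply with exactly one word from: " + ", ".join(SPECIALISTS)},
                {"role": "user", "content": query},
            ],
        )
        label = resp.choices[0].message.content.strip().lower()
        return label if label in SPECIALISTS else "billing"  # fallback

    def answer(query: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": SPECIALISTS[route(query)]},
                {"role": "user", "content": query},
            ],
        )
        return resp.choices[0].message.content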

[0] https://cdn.openai.com/business-guides-and-resources/a-pract...

victorbjorklund:
Makes sense!