
100 points by lmeierhoefer | 8 comments

Hi HN, we're the cofounders of Augento (https://augento.ai/). We're building DeepSeek R1-like fine-tuning as a service. You connect your agent, tell us when it's right or wrong, and we deliver an LLM optimized for that agent. There's a demo video at https://www.youtube.com/watch?v=j5RQaTdRrKE, our docs are at https://docs.augento.ai/, and it's open for anyone to use at https://augento.ai.

Agents fail all the time, especially when you try to use them for something actually useful. The current fixes suck: prompting has intrinsic limits, and supervised fine-tuning requires big explicit datasets that are hard to collect.

Two months ago, the DeepSeek R1 paper outlined a way to post-train LLMs with (almost) pure reinforcement learning. We took up their research and built a fine-tuning platform around that.

You let us intercept your agent's data flow, and we deliver a fine-tuned open-source model that is trained on the agent's specific task. Instead of providing a big dataset of explicit fine-tuning samples, you provide a reward function that judges the model's outputs.
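
To make that concrete: a reward function just maps (prompt, completion) to a score, with higher meaning better. Here is a minimal illustrative sketch in Python for an agent that is supposed to emit JSON tool calls. The signature and the expected JSON keys are assumptions for the example, not our actual API:

    import json

    # Minimal illustrative sketch of a reward function (hypothetical signature,
    # not the actual Augento API). It scores a completion for a given prompt;
    # higher means "closer to what my agent needed".
    def reward(prompt: str, completion: str) -> float:
        try:
            call = json.loads(completion)  # the agent is supposed to emit a JSON tool call
        except json.JSONDecodeError:
            return 0.0                     # unparsable output earns nothing
        score = 0.5                        # valid JSON is already half the battle
        if isinstance(call, dict) and "tool" in call and "arguments" in call:
            score += 0.5                   # expected keys are present
        return score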

Here are examples of what this can be used for:

Coding Agent: We fine-tuned a coding agent that was constantly making syntax errors and failing to handle semantic edge cases properly. By providing a reward function that evaluated code against the compiler, the agent learned not to produce these errors. The fine-tuned model reduced critical bugs by 40% with just 20 training samples. (A sketch of such a compiler-based reward follows after these examples.)

MCP Tool Specialization: Imagine you have a custom set of internal tools using the MCP protocol, but your agent keeps selecting the wrong tool or passing incompatible parameters. You could fine-tune with a reward function that scores tool selection and parameter matching.

Browser Agent Navigation: If you're building a browser agent that struggles with complex web UIs or specific sites, you could fine-tune it to better understand UI elements and navigation patterns. With a reward function that scores successful task completion (like "find the best price for this product" or "complete this multi-step form"), you could train an agent that better identifies clickable elements, understands form validation errors, and navigates through complex SPAs without getting stuck.

VLA Robot Control: If you're using vision-language models to control robotic arms or other hardware, you could fine-tune for your specific actuator setup. With a reward function based on high-level task completion, you could train a Vision-Language-Action (VLA) model that translates natural language commands like "move the red block behind the blue cylinder" into actuator controls for your specific hardware.
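
For the coding agent example above, the compiler check can be tiny. A minimal sketch, assuming the agent emits Python source and we only reward it for parsing cleanly; a real reward would likely also run the test suite:

    # Illustrative compiler-based reward for a coding agent (sketch, not our exact setup).
    # Nothing is executed: compile() only checks that the source parses.
    def compiler_reward(prompt: str, completion: str) -> float:
        try:
            compile(completion, "<agent_output>", "exec")
            return 1.0     # syntactically valid code
        except SyntaxError:
            return 0.0     # the exact failure mode we want to train away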

As these examples show, the current paradigm is best suited for "verifiable domains", where it is possible to give an explicit function judging the model's outputs. Up next, though, we will also support an "alignment mode", where you don't provide a reward function but instead give high-level feedback on past failure runs of your agent. Just tag where things went wrong, and we'll handle the rest. This makes it even easier to improve your agents without writing formal reward functions.

Our platform is not itself open source, but it fine-tunes open-source language models. In other words, it is an alternative to OpenAI's reinforcement fine-tuning API, but with Qwen, Llama, DeepSeek, etc., and more customizability on the reward model. We charge for the training and for your later inference/interaction with the model ($0 monthly flat fee + training cost + inference cost).

The platform is self-serve and open to use at https://augento.ai/dashboard. We'll give you $20 in training credits, which should be enough to connect your agent and see some observable improvement on your use case.

We’d love to hear your thoughts and feedback!

1. codingwagie ◴[] No.43538135[source]
This is just dev ops wrapped around an open source fine tuning repo.
replies(2): >>43538223 #>>43538594 #
2. hannesfur ◴[] No.43538223[source]
In a sense, you are not wrong! But when we got started, we thought it would be way easier than it actually was. Procuring powerful GPUs alone is difficult, and so is collecting proper data. Of course, you can still do everything yourself; if you want to give it a try, I would recommend taking a look at torchtune (https://github.com/pytorch/torchtune).
replies(2): >>43538519 #>>43538862 #
3. codingwagie ◴[] No.43538519[source]
Should I just put a UI on top of:

https://aws.amazon.com/blogs/machine-learning/customize-deep...

And charge for it?

replies(1): >>43539255 #
4. qeternity ◴[] No.43538594[source]
It's convenience. And people pay for convenience all the time.
replies(1): >>43543694 #
5. noosphr ◴[] No.43538862[source]
People not in the field have no idea just how distorted the market is right now.

I was working at a startup doing end-to-end training for modified BERT architectures, and everything about it was designed to burn money as fast as possible. Buying a GPU is basically impossible right now; we ended up looking at sourcing franken cards _from_ China. Power and heat removal means you need a large factory's worth of power in the space of a small flat. And pre-training something that hasn't been pre-trained before means throwing out more than 80% of your pretraining runs because of the novel architecture.

Without hugely deep pockets, a contract from NVidia, and a datacenter right next to a nuclear power plant, you can't compete at the model level.

replies(1): >>43539410 #
6. hannesfur ◴[] No.43539255{3}[source]
Yes, you could do that. However, you would have created a different platform than Augento. Maybe we should make the distinction clearer though.

The blog article you are referring to uses a different fine-tuning method that many big platforms like Together AI (and even OpenAI themselves) already support: supervised fine-tuning (SFT). We do reinforcement learning with GRPO instead. SFT has the big caveat that it requires good prompt-completion datasets, which are rare or hard to curate for many use cases. With GRPO, you (the programmer) don't even need to know what the correct answer is, as long as you can decide whether a given answer is good. Checking an answer is easier than producing one, which is essentially the P vs. NP intuition at its heart.
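
To make the GRPO side concrete, the core trick is group-relative advantages: sample several completions per prompt, score each with the reward function, and normalize the rewards within the group; no value network or separately learned reward model is needed. A minimal sketch (illustrative only, not our training code):

    from statistics import mean, stdev

    # Group-relative advantages as in GRPO (illustrative sketch, not our training code):
    # the normalized within-group reward is the weight each completion gets in the
    # policy-gradient update.
    def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
        mu = mean(rewards)
        sigma = stdev(rewards) if len(rewards) > 1 else 0.0
        return [(r - mu) / (sigma + eps) for r in rewards]

    # e.g. four completions sampled for one prompt, scored by the user's reward function
    print(group_advantages([1.0, 0.0, 0.5, 1.0]))  # -> approximately [0.78, -1.31, -0.26, 0.78]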

7. hannesfur ◴[] No.43539410{3}[source]
You are right. If you want to (and can) pay out of your own pocket, RunPod (https://www.runpod.io) deserves a shoutout here. We rented GPUs from them (they actually have them, and they are cheaper and more available than Lambda Labs) until we convinced AWS to give us capacity blocks. But in general, GPU prices and scarcity are really extreme, and unlike mining, you can't really fall back on gaming or franken cards. I can count the GPUs we can do this on (even for relatively small models) on one hand.
8. lukasego ◴[] No.43543694[source]
People pay for convenience, that's true, and it's part of the equation here. Agreed! The approach is to make data capturing as convenient as possible: you just paste an API key and base URL into your existing code, and all your runs get gathered. Reinforcement learning is also hard to figure out, so one of the goals is to commoditize it, which is what you're alluding to. In its current iteration, the platform ships with the verifiable mode, where Augento takes away all the headache of GPU infrastructure, the GRPO implementation, training configurations, and dataset curation: you just select your gathered runs and start the training. But we'll go past that and expand Augento into a platform for alignment and self-learning. TL;DR: yes, indeed! We designed Augento with convenience in mind.