←back to thread

65 points nickpapciak | 2 comments | | HN request time: 0s | source

Hey HN! We’re Abhi, Venkat, Tom, and Nick and we are building Datafruit (https://datafruit.dev/), an AI DevOps agent. We’re like Devin for DevOps. You can ask Datafruit to check your cloud spend, look for loose security policies, make changes to your IaC, and it can reason across your deployment standards, design docs, and DevOps practices.

Demo video: https://www.youtube.com/watch?v=2FitSggI7tg.

Right now, we have two main methods to interact with Datafruit:

(1) automated infrastructure audits— agents periodically scan your environment to find cost optimization opportunities, detect infrastructure drift, and validate your infra against compliance requirements.

(2) chat interface (available as a web UI and through slack) — ask the agent questions for real-time insights, or assign tasks directly, such as investigating spend anomalies, reviewing security posture, or applying changes to IaC resources.

Working at FAANG and various high-growth startups, we realized that infra work requires an enormous amount of context, often more than traditional software engineering. The business decisions, codebase, and cloud itself are all extremely important in any task that has been assigned. To maximize the success of the agents, we do a fair amount of context engineering. Not hallucinating is super important!

One thing which has worked incredibly well for us is a multi-agent system where we have specialized sub-agents with access to specific tool calls and documentation for their specialty. Agents choose to “handoff” to each other when they feel like another agent would be more specialized for the task. However, all agents share the same context (https://cognition.ai/blog/dont-build-multi-agents). We’re pretty happy with this approach, and believe it could work in other disciplines which require high amounts of specialized expertise.

Infrastructure is probably the most mission-critical part of any software organization, and needs extremely heavy guardrails to keep it safe. Language models are not yet at the point where they can be trusted to make changes (we’ve talked to a couple of startups where the Claude Code + AWS CLI combo has taken their infra down). Right now, Datafruit receives read-only access to your infrastructure and can only make changes through pull requests to your IaC repositories. The agent also operates in a sandboxed virtual environment so that it could not write cloud CLI commands if it wanted to!

Where LLMs can add significant value is in reducing the constant operational inefficiencies that eat up cloud spend and delay deadlines—the small-but-urgent ops work. Once Datafruit indexes your environment, you can ask it to do things like:

  "Grant @User write access to analytics S3 bucket for 24 hours"
    -> Creates temporary IAM role, sends least-privilege credentials, auto-revokes tomorrow

  "Find where this secret is used so I can rotate it without downtime"
    -> Discovers all instances of your secret, including old cron-jobs you might not know about, so you can safely rotate your keys


  "Why did database costs spike yesterday?"
    -> Identifies expensive queries, shows optimization options, implements fixes

We charge a straightforward subscription model for a managed version, but we also offer a bring-your-own-cloud model. All of Datafruit can be deployed on Kubernetes using Helm charts for enterprise customers where data can’t leave your VPC. For the time being, we’re installing the product ourselves on customers' clouds. It doesn’t exist in a self-serve form yet. We’ll get there eventually, but in the meantime if you’re interested we’d love for you guys to email us at founders@datafruit.dev.

We would love to hear your thoughts! If you work with cloud infra, we are especially interested in learning about what kinds of work you do which you wish could be offloaded onto an agent.

Show context
cddotdotslash ◴[] No.45105695[source]
I can see the value, but to do the things you're describing, the AI needs to be given fairly highly-privileged credentials.

> Right now, Datafruit receives read-only access to your infrastructure

> "Grant @User write access to analytics S3 bucket for 24 hours" > -> Creates temporary IAM role, sends least-privilege credentials, auto-revokes tomorrow

These statements directly conflict with one another.

So it needs "iam:CreateRole," "iam:AttachPolicy," and other similar permissions. Those are not "read-only." And, they make it effectively admin in the account.

What safeguards are in place to make sure it doesn't delete other roles, or make production-impacting changes?

replies(1): >>45105912 #
nickpapciak ◴[] No.45105912[source]
Ahh. To clarify, changes like granting users access would be done by our agent modifying IaC, so you would still have to manually apply the changes. Every potentially destructive change being an IaC change helps allow the humans to always stay in the loop. This admittedly makes the agents a little more annoying to work with, but safer.
replies(1): >>45108530 #
Kwpolska ◴[] No.45108530[source]
So you’re modifying Terraform? How is your tool better than just using an AI-enabled IDE and asking it to apply the change?

How is the auto-revoke handled? Will it require human intervention to merge a PR/apply the Terraform configuration, or will it do it automatically?

replies(1): >>45110002 #
1. nickpapciak ◴[] No.45110002{3}[source]
Lots of people have asked us this! We try to do more than just being an AI-enabled IDE by giving the agent access to your infrastructure and observability tools. So you can query over your AWS, get information about metrics over the past few days, etc etc. We also plan to integrate with more DevOps tools as our customers ask for them. We also try to be less like an IDE, and more like an autonomous agent. We've noticed that DevOps engineers actually like being engineers, and enjoy some infrastructure tasks, while there are others that they would rather automate away. Not sure if you have experienced this sentiment?

Also, auto-revoke right now can be handled by creating a role in Terraform that can be assumed and expires after a certain time. But we’re exploring deeper integrations with identity providers like Okta to handle this better.

replies(1): >>45113006 #
2. OliverGuy ◴[] No.45113006[source]
I can put some AWS Creds in my terminal and Claude Code is perfectly happy writing AWS CLI commands (or whole python scripts if necessary) to work out what it needs to about my infrastructure.