120 points by gbxk | 3 comments

I've built this to make it easy to host your own infra for lightweight VMs at large scale.

Intended for executing AI-generated code, for CI/CD runners, or for off-chain AI dApps. Mainly to avoid the dangers and mess of Docker-in-Docker.

Super easy to use with the CLI / Python SDK, and friendly to AI engineers who usually don't want to mess too much with VM orchestration and networking.
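
For a taste, the SDK flow looks roughly like this (a minimal sketch; the package, class, and method names here are illustrative, not the actual API):

    # Hypothetical sketch of the SDK flow; the package, class, and
    # method names are illustrative assumptions, not the real API.
    from microvm_sandbox import Sandbox  # hypothetical import

    with Sandbox(image="alpine", cpus=0.5, memory_mb=512) as sb:
        result = sb.run("python3 untrusted_script.py")
        print(result.stdout)
    # The backing VM pod is torn down when the context exits.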

Defense-in-depth philosophy.

Would love to get feedback (and contributors: clear and exciting roadmap!). Thanks!

alexeldeib ◴[] No.45658355[source]
As someone in the space, this ticks a lot of boxes: Kubernetes-native, strong isolation, Python SDK (ideal for ML scenarios). devmapper is a nice out-of-the-box approach.

Glancing at the readme, is your business model technical support? Or what's your plan with this?

Anything interesting to share around startup time for large artifacts, scaling, passing through persistent storage (or GPUs) to these sandboxes?

Curious what things like 'Multi-node cluster capabilities for distributed workloads' mean exactly? inter-VM networking?

replies(1): >>45658623 #
1. gbxk ◴[] No.45658623[source]
No business model short-term. My goal is broad adoption, 100% open-source.

By multi-node I mean that so far I only support one k8s node, i.e. one machine, but I'm adding support for multiple soon. Still, on 20 CPUs I can run 50+ VM pods with fractional vCPU limits.
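
The fractional vCPUs are just k8s millicpu limits under the hood. A sketch with the official kubernetes Python client (the specific values are only an example):

    # Sketch: fractional vCPUs expressed as k8s millicpu limits,
    # using the official kubernetes Python client.
    from kubernetes import client

    resources = client.V1ResourceRequirements(
        requests={"cpu": "250m", "memory": "256Mi"},  # 0.25 vCPU reserved
        limits={"cpu": "500m", "memory": "512Mi"},    # capped at 0.5 vCPU
    )
    # At 250m requested per pod, 20 CPUs can schedule ~80 VM pods,
    # minus headroom for system components.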

For GPU passthrough: not possible today because I use Firecracker as the VMM. On the roadmap: add QEMU support, which would make GPU passthrough possible.

Inter-VM networking: already possible on a single node. 1 VM = 1 pod, and you can have multiple pods per node (have a look at utils/stress-test.sh). Right now I default to deny-all ingress for safety (since k8s allows inter-pod communication by default), but I can make ingress configurable.
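
The deny-all default is a standard k8s NetworkPolicy. A sketch with the official kubernetes Python client (the namespace and policy name are placeholders):

    # Sketch: a default deny-all-ingress NetworkPolicy like the one
    # described, built with the official kubernetes Python client.
    from kubernetes import client, config

    config.load_kube_config()
    policy = client.V1NetworkPolicy(
        metadata=client.V1ObjectMeta(name="deny-all-ingress"),
        spec=client.V1NetworkPolicySpec(
            pod_selector=client.V1LabelSelector(),  # empty selector = all pods
            policy_types=["Ingress"],  # no ingress rules listed -> deny all
        ),
    )
    client.NetworkingV1Api().create_namespaced_network_policy(
        namespace="sandboxes", body=policy  # placeholder namespace
    )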

Startup time: one second to a few seconds, depending on the base image (alpine, ubuntu, etc.) and whether you use a before_script (which I execute before the network lockdown).
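
To illustrate the before_script idea (a hypothetical config; the field names are guesses based on the description above, not the real schema):

    # Hypothetical sandbox config; the field names are assumptions.
    # before_script runs while the VM still has network access,
    # i.e. before the deny-all lockdown, so it can fetch dependencies.
    sandbox_config = {
        "image": "ubuntu",
        "before_script": "pip install -r requirements.txt",  # adds startup time
    }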

Large artifacts: you can configure the resources allocated to a VM pod in the sandbox config; under the hood it basically uses k8s resource limits.
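
So something like this (again, hypothetical field names) translating straight into the pod's requests/limits:

    # Hypothetical sandbox config fields (names are assumptions);
    # each maps onto the VM pod's k8s resource limits.
    sandbox_config = {
        "image": "ubuntu",
        "cpus": 2,          # -> limits: cpu "2"
        "memory_mb": 8192,  # -> limits: memory "8Gi"
        "disk_gb": 50,      # room for large artifacts
    }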

Let me know if you have any other questions! Happy to help.

replies(1): >>45659219 #
2. yjftsjthsd-h ◴[] No.45659219[source]
> No business model short-term. My goal is broad adoption, 100% open-source.

IMHO that's kind of a red flag. There's a happy path here where it's successful but stays low-maintenance enough that you just work on it in your spare time, or it takes off and gets community support, or you get sponsorships or such. But there's also an option where in a year or two it becomes your job and you decide to monetize by rug-pulling: announcing that actually paying the bills is more important than staying 100% open source. Not a dig at you; it's just something that's happened enough times that I get nervous when people don't have a plan, and therefore don't have a plan to avoid the outcome that creates problems for users.

replies(1): >>45659393 #
3. gbxk ◴[] No.45659393[source]
Sure, if one day it really takes off I could think about additionally offering a SaaS solution with paid enterprise features like SOC 2 compliance, RBAC, support for multiple clouds, etc. Why not. But I strongly believe that for it to be successful, it needs a strong open-source base. Then billing huge companies for compliance features or heavy usage makes sense, and that would fund development of the open-source part too.

I like the Docker model, for instance: free for companies under 250 employees or $10M/year in revenue.

In any case, it will always be open-source.

Those paid enterprise features wouldn't come from closed-sourcing anything: they would come from the compliance of a particular SaaS-hosted infra setup that anybody else could reproduce. Just like Hugging Face.