Since AI performs better the more context you give it, we think solving AI privacy will unlock more valuable AI applications, just as TLS on the Internet enabled e-commerce to flourish once people knew their credit card info wouldn't be stolen by someone sniffing packets.
We come from backgrounds in cryptography, security, and infrastructure. Jules did his PhD in trusted hardware and confidential computing at MIT and worked with NVIDIA and Microsoft Research on the same; Sacha did his PhD in privacy-preserving cryptography at MIT; Nate worked on privacy tech like Tor; and I (Tanya) was on Cloudflare's cryptography team. We were unsatisfied with band-aid techniques like PII redaction (which is actually undesirable in some cases, like AI personal assistants) or “pinky promise” security through legal contracts like DPAs. We wanted a real solution that replaces trust with provable security.
Running models locally or on-prem is an option, but can be expensive and inconvenient. Fully Homomorphic Encryption (FHE) is not practical for LLM inference for the foreseeable future. The next best option is using secure enclaves: a secure environment on the chip that no other software running on the host machine can access. This lets us perform LLM inference in the cloud while being able to prove that no one, not even Tinfoil or the cloud provider, can access the data. And because these security mechanisms are implemented in hardware, there is minimal performance overhead.
Even though we (Tinfoil) control the host machine, we do not have any visibility into the data processed inside the enclave. At a high level, a secure enclave is a set of cores that are reserved, isolated, and locked down to create a sectioned-off area. Everything that leaves the enclave is encrypted: memory and network traffic, but also peripheral (PCIe) traffic to other devices such as the GPU. This encryption uses secret keys that are generated inside the enclave during setup and never leave its boundaries. Additionally, a “hardware root of trust” baked into the chip lets clients check security claims and verify that all of these mechanisms are in place.
Until recently, secure enclaves were only available on CPUs. But NVIDIA recently added these hardware-based confidential computing capabilities to its latest GPUs, making it possible to run GPU-based workloads inside a secure enclave.
Here’s how it works in a nutshell:
1. We publish the code that runs inside the secure enclave to GitHub, and publish a hash of the compiled binary to a transparency log called Sigstore.
2. Before sending data to the enclave, the client fetches a signed attestation document from the enclave, which includes a hash of the running code signed by the CPU manufacturer. The client verifies that signature against the hardware manufacturer's keys to prove the hardware is genuine. It then fetches the expected hash of the source code from the transparency log (Sigstore) and checks that it matches the hash reported by the enclave. This gives the client verifiable proof that the enclave is running the exact code we claim (see the sketch after this list).
3. With the assurance that the enclave environment is what we expect, the client sends its data to the enclave; the data travels encrypted (TLS) and is only decrypted inside the enclave.
4. Processing happens entirely within this protected environment; even an attacker who controls the host machine can't access the data.

We believe making end-to-end verifiability a “first-class citizen” is key. Secure enclaves have traditionally been used to remove trust from the cloud provider, not necessarily from the application provider. This is evidenced by confidential VM technologies such as Azure Confidential VM allowing SSH access by the host into the confidential VM. Our goal is to provably remove trust both from ourselves (the application provider) and from the cloud provider.
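To make step 2 concrete, here is a minimal Python sketch of what a verifying client does. The helper names (fetch_attestation, verify_vendor_signature, fetch_sigstore_digest) are illustrative placeholders, not our actual client API; the real logic lives in our open-source verifier.

    # Minimal sketch of client-side enclave verification. The helpers are
    # hypothetical placeholders for the real attestation / Sigstore logic.

    def verify_enclave(enclave_host: str, repo: str) -> bool:
        # 1. Fetch the attestation document produced inside the enclave. It
        #    contains a measurement (hash) of the running code, signed by
        #    the hardware manufacturer.
        attestation = fetch_attestation(enclave_host)          # hypothetical

        # 2. Check the manufacturer's signature to prove the hardware is genuine.
        if not verify_vendor_signature(attestation):           # hypothetical
            raise RuntimeError("hardware attestation signature is invalid")

        # 3. Fetch the expected code hash from the Sigstore transparency log.
        expected = fetch_sigstore_digest(repo)                 # hypothetical

        # 4. Only trust the enclave if it runs exactly the published code.
        return attestation.code_measurement == expected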
We encourage you to be skeptical of our privacy claims. Verifiability is our answer. It’s not just us saying it’s private; the hardware and cryptography let you check. Here’s a guide that walks you through the verification process: https://docs.tinfoil.sh/verification/attestation-architectur....
People are using us for analyzing sensitive docs, building copilots for proprietary code, and processing user data in agentic AI applications without the privacy risks that previously blocked cloud AI adoption.
We’re excited to share Tinfoil with HN!
* Try the chat (https://tinfoil.sh/chat): It verifies attestation with an in-browser check. Free with limited messages; $20/month for unlimited messages and additional models.
* Use the API (https://tinfoil.sh/inference): OpenAI API compatible interface. $2 / 1M tokens. See the example right after this list.
* Take your existing Docker image and make it end-to-end confidential by deploying it on Tinfoil. Here's a demo of using Tinfoil to run a deepfake-detection service securely on people's private videos: https://www.youtube.com/watch?v=_8hLmqoutyk. Note: this feature is not currently self-serve.
* Reach out to us at contact@tinfoil.sh if you want to run a different model or want to deploy a custom application, or if you just want to learn more!
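Because the API is OpenAI compatible, pointing existing code at it is mostly a matter of swapping the base URL. Here is a rough sketch using the standard openai Python package; the base URL and model name below are placeholders, and a plain HTTPS call like this does not perform the client-side attestation check that our SDKs and chat UI do for you.

    from openai import OpenAI

    # Placeholder base URL, model name, and key; substitute your own values.
    client = OpenAI(
        base_url="https://inference.example.tinfoil.sh/v1",  # placeholder
        api_key="YOUR_TINFOIL_API_KEY",
    )

    response = client.chat.completions.create(
        model="llama3-3-70b",  # placeholder model name
        messages=[{"role": "user", "content": "Summarize this document..."}],
    )
    print(response.choices[0].message.content)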
Let us know what you think; we’d love to hear about your experiences and ideas in this space!
Do you run into rate limits or other issues with TLS cert issuance? One problem we had when doing this before is that each spinup of the enclave must generate a fresh public key, so it needs a fresh, publicly trusted TLS cert. Do you have a workaround for that, or do you just have the enclaves run for long enough that it doesn’t matter?
Also, if you're decoding TLS on the enclave, wouldn't that imply that you're parsing HTTP and JSON on the GPU itself? Very interesting if true.
This will let us fix the rate limit issue.
HTTP parsing and application logic happen on the CPU like normal. The GPU runs CUDA just like any other app, after its integrity is verified by the CPU. Data on the PCIe bus between the CPU and GPU is encrypted too.
[1] https://github.com/NVIDIA/nvtrust/blob/main/guest_tools/atte...
Does the CPU have the ability to see unencrypted data?
Yes.
>How do you load balance or front end all of this effectively?
We don't, at least not yet. That's why all our model endpoints have different subdomains. In the next couple of months, we're planning to generate a keypair inside the enclave using HPKE that will be used to encrypt the data, as I described in this comment: https://news.ycombinator.com/item?id=43996849
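To sketch what that would look like: the client encrypts each request to a public key that only exists inside the enclave, so any load balancer or front end in between only ever sees ciphertext. The snippet below is a simplified stand-in with the same shape as HPKE (ephemeral X25519 + HKDF + AES-GCM), not the exact RFC 9180 construction we'd ship; the info label and function name are illustrative.

    import os
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric.x25519 import (
        X25519PrivateKey, X25519PublicKey,
    )
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def seal_to_enclave(enclave_pub: X25519PublicKey, plaintext: bytes):
        """Encrypt a request so only the enclave's private key can open it."""
        # Ephemeral key per request; shared secret via X25519 key agreement.
        eph = X25519PrivateKey.generate()
        shared = eph.exchange(enclave_pub)

        # Derive an AEAD key from the shared secret, then encrypt.
        key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                   info=b"request-sealing").derive(shared)
        nonce = os.urandom(12)
        ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)

        # Send the ephemeral public key, nonce, and ciphertext; only code
        # running inside the enclave can derive the same key and decrypt.
        eph_pub = eph.public_key().public_bytes(
            serialization.Encoding.Raw, serialization.PublicFormat.Raw)
        return eph_pub, nonce, ciphertext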
When the enclave starts, the CPU does a few things (sketched below):
1. The CPU does a key exchange with the GPU (in confidential compute mode [1]) to derive a key to encrypt data over PCIe
2. The CPU verifies the integrity of the GPU against NVIDIA's root of trust [2]
[1] https://developer.nvidia.com/blog/confidential-computing-on-...
[2] https://github.com/tinfoilsh/cvmimage/blob/b65ced8796e8a8687...
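Roughly, that boot sequence looks like the sketch below. The helper names are hypothetical placeholders; the real implementation lives in the cvmimage repo and NVIDIA's nvtrust tooling linked above.

    # Illustrative boot-time flow inside the confidential VM. All helpers
    # are hypothetical placeholders, not the actual nvtrust/guest-agent API.

    def bring_up_confidential_gpu():
        # 1. Key exchange with the GPU (in confidential-compute mode) so
        #    that CPU<->GPU traffic over the PCIe bus is encrypted.
        session_key = exchange_pcie_session_key()              # hypothetical

        # 2. Verify the GPU's attestation report against NVIDIA's root of
        #    trust before sending it any model weights or user data.
        report = fetch_gpu_attestation_report()                # hypothetical
        if not verify_against_nvidia_root_of_trust(report):    # hypothetical
            raise RuntimeError("GPU attestation failed; refusing to serve")

        return session_key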
I think there is similarity to https://www.anjuna.io/ and https://www.opaque.co/ here. I've heard of these, never iExec.
I'm not entirely sure this is different than "security by contract", except the contracts get bigger and have more technology around them?
Want to connect some time next Tuesday or Wednesday? https://calendly.com/qbix/meeting
The real benefit of confidential computing is to extend that trust to the source code too (the inference server, OS, firmware).
Maybe one day we’ll have truly open hardware ;)
This approach relies too much on trust.
If you have data you are seriously sensitive about, it's better for you to run models locally on air-gapped instances.
If you think this is overkill, just see what happened to Coinbase recently. [0]
[0]: https://www.cnbc.com/2025/05/15/coinbase-says-hackers-bribed...
Can you talk about how this relates to / is different / is differentiated from what Apple claimed to do during their last WWDC? They called it "private cloud compute". (To be clear, after 11 months, this is still "announced", with no implementation anywhere, as far as I can see.)
Here is their blog post on Apple Security, dated June 10: https://security.apple.com/blog/private-cloud-compute/
EDIT: JUST found the tinfoil blog post on exactly this topic. https://tinfoil.sh/blog/2025-01-30-how-do-we-compare
If instead users must use your web-served client code each time, you could subtly alter that code over time or per user, in ways unlikely to be detected by casual users – who'd then again be required to trust you (Tinfoil), rather than the goal of only having to trust the design & chip manufacturer.
I tried taking a look at your documentation, but the site search is very slow and laggy in Firefox.
If you'd rather self-host, then the HazyResearch Lab at Stanford recently announced a FOSS e2ee implementation ("Minions") for Inference: https://hazyresearch.stanford.edu/blog/2025-05-12-security / https://github.com/HazyResearch/Minions
We're also doing more than pure inference, and trying to work with other companies who want to provide their users additional verifiability and confidentiality guarantees by running their entire private data processing pipeline on our platform.
> Maybe one day we'll have truly open hardware
At least the RoT/SE if nothing else: https://opentitan.org/
You aren't enterprise-ready, because to address those concerns you need to get the laundry list of compliance certs: SOC 2 Type 2, ISO 27k1/2 and 9k1, HIPAA, GDPR, CMMC, FedRAMP, NIST, etc.
I see you have to trust NVIDIA etc., so maybe there are such backdoors.
Why couldn't the enclave claim to be running an older hash?
Another (IMO more likely) scenario is someone finds a hardware vulnerability (or leaked signing keys) that lets them achieve a similar outcome.
The pricing page implies you're basically reselling access to confidential-wrapped AI instances.
Since you rightly open-sourced the code (AGPL), is there anything stopping the cloud vendors from running and selling access to their own instances of your server-side magic?
Is your secret sauce the tooling to spin up and manage instances and ease customer UX? Do you aim to attract an ecosystem of turnkey, confidential applications running on your platform?
Do you envision an exit strategy that sells said secret sauce and customers to a cloud provider or confidential computing middleware provider?
Ps. Congrats on the launch.
Sure, they can do that. Despite being open source, CC mode on GPUs is quite difficult to work with, especially when you start thinking about secrets management, observability, etc., so we’d actually like to work with smaller cloud providers who want to provide this as a service and become competitive with the big clouds.
>Is your secret sauce the tooling to spin up and manage instances and ease customer UX?
Pretty much. Confidential computing has been around a while, and we still don’t see widespread adoption of it, largely because of the difficulty. If we're successful, we absolutely expect there to be a healthy ecosystem of competitors, both cloud providers and startups.
>Do you envision an exit strategy that sells that secret sauce to a cloud provider or confidential computing middleware provider?
We’re not really trying to be a confidential computing provider, but rather a verifiably private layer for AI, which means we will try to make integration points as seamless as possible. For inference, that meant OpenAI API compatible client SDKs; we will eventually do the same for training/post-training, the MCP/OpenAI Agents SDK, etc. We want our integration points to be closely compatible with existing pipelines.
That said, a sufficiently resourced attacker wouldn’t need to inject a backdoor at all. If the attacker already possesses the keys (e.g. the attacker IS the hardware manufacturer, or they’ve coerced the manufacturer to hand the keys over), then they would just need to gain access to the host server (which we control) to get access to the hypervisor, then use their keys to read memory or launch a new enclave with a forged attestation. We're planning on writing a much more detailed blog post about "how to hack ourselves" in the future.
We actually plan to run an experiment at DEFCON, likely next year, where we give SSH access to a test machine running the enclave and have people try to exfiltrate data from inside the enclave while keeping the machine running.
[1] https://github.com/tinfoilsh/cvmimage
I have worked for many enterprise companies, e.g. banks, who are trialling AI, and none of them have any use for something like this. Because the entire foundation of the IT industry is based on trusting the privacy and security policies of Azure, AWS and GCP. And in the decades since they've been around, I have not heard of a single example of them breaking this.
The proposition here is to tell a company that they can trust Azure with their banking websites, identity services and data engineering workloads but not for their model services. It just doesn't make any sense. And instead I should trust a YC startup who statistically is going to be gone in a year and will likely have their own unique set of security and privacy issues.
Also you have the issue of smaller open source models, e.g. DeepSeek R1, lagging far behind the bigger ones, so you're giving me some unnecessary privacy attestation at the expense of a model that will give me far better accuracy and performance.
And you make this claim that the cloud provider can SSH into the VM but (a) nobody serious exposes SSH ports in Production and (b) there is no documented evidence of this ever happening.
We're simply trying to bring similar capabilities to other companies. Inference is just our first product.
>cloud provider can SSH into the VM
The point we were making was that CC was traditionally used to remove trust from cloud providers, but not the application provider. We are further removing trust from ourselves (as the application provider), and we can enable our customers (who could be other startups or neoclouds) to remove trust from themselves and prove that to their customers.
This is not the reason at all. Complexity and difficulty are inherent to large companies.
It's because it is a very low priority in an environment where for example there are tens of thousands of libraries in use, dozens of which will be in Production with active CVEs. And there are many examples of similar security and risk management issues that companies have to deal with.
Worrying about the integrity of the hardware or not trusting my cloud provider who has all my data in their S3 buckets anyway (which is encrypted using their keys) is not high on my list of concerns. And if it were I would be simply running on-premise anyway.
No. The only way is to not use cloud computing at all and go on-premise.
Which is what companies around the world do today for security or privacy critical workloads.
There are a multitude of components between my app and your service. You have secured one of them, arguably the least important. But you can't provide any guarantees over, say, your API server that my requests are going through. Or your networking stack, which someone, e.g. a government, could MITM.
Host a machine on the internet. Allow competitors to sign up to receive root ssh credentials. Offer a $10K prize if they are able to determine plaintext inputs and outputs over a given time period (say one month).
A bit of a strawman, but a competition like this might help build confidence.
But making it a public competition is a fantastic idea.
Not to mention GCP and Azure both have confidential GPU offerings. How do you compete against them, as well as some startups mentioned in other comments like Edgeless Systems and Opaque Systems?
[1] https://github.com/tinfoilsh/tinfoil-python
[2] https://github.com/tinfoilsh/verifier
As a former CTO of the world's largest bank and cloud architect at the world's largest hedge fund, I can say this is exactly the opposite of my experience with both regulated finance enterprises and the CSPs vying to serve them.
> The entire foundation of the IT industry is based on trusting the privacy and security policies of Azure, AWS and GCP. And in the decades since they've been around not heard of a single example of them breaking this.
On the contrary, many global banks design for the assumption the "CSP is hostile". What happened to Coinbase's customers the past few months shows why your vendor's insider threat is your threat and your customers' threat.
Granted, this annoys CSPs who wish regulators would just let banks "adopt" the CSP's controls and call it a day.
Unfortunately for CSP sales teams — certainly this could change with recent regulator policy changes — the regulator wins. Until very recently, only one CSP offered controls sufficient to assure your own data privacy beyond a CSP's pinky-swears. AWS Nitro Enclaves can provide a key component in that assurance, using deployment models such as tinfoil.
This point of view may be based on a lack of information about how global finance handles security and privacy critical workloads in high-end cloud.
Global banks and the CSPs that serve them have by and large solved this problem by the late 2010s - early 2020s.
While much of the work is not published, you can look for presentations at AWS reInvent from e.g. Goldman Sachs or others willing to share about it, talking about cryptographic methods, enclaves, formal reasoning over not just code but things like reachability, and so on, to see the edges of what's being done in this space.
For example, if you do tools or RAG, you probably ought to have an abuse@ address as well, even though only 4 people will think to email it.
For context:
My wife does leadership coaching and recently used vanilla GPT-4o via ChatGPT to summarize a transcript of an hour-long conversation.
Then, last weekend we thought... "Hey, let's test local LLMs for more privacy control. The open source models must be pretty good in 2025."
So I installed Ollama + Open WebUI plus the models on a 128GB MacBook Pro.
I am genuinely dumbfounded by the results we got today comparing ChatGPT/GPT-4o vs. Llama4, Llama3.3, Llama3.2, DeepSeekR1 and Gemma.
In short: Compared to our reference GPT-4o output, none (as in NONE, zero, zilch, nil) of the above-mentioned open source models were able to create even a basic summary based on the exact same prompt + text.
The open source summaries were offensively bad. It felt like reading the most bland, generic and idiotic SEO slop I've read since I last used Google. None of the obvious topics were part of the summary. Just blah. I tested this with 5 models to boot!
I'm not an OpenAI fan per se, but if this is truly the open-source SOTA, then we shouldn't even mention Llama4 or the others in the same breath as the newer OpenAI models.
What do you think?
Please shoot me an email at tanya@tinfoil.sh, would love to work through your use cases.