Ollama lets you deploy LLMs locally on laptops and edge servers; Cactus lets you deploy them on phones. Running directly on the phone makes it possible to build AI apps and agents capable of phone use without compromising privacy, supports real-time inference with no network round-trip, and we have already seen personalised RAG pipelines for individual users, among other things.
Apple and Google have both moved into local AI models recently with the launch of Apple's Foundation Models framework and Google AI Edge respectively. However, both are platform-specific and only support each company's own models. In contrast, Cactus:
- Is available in Flutter, React-Native & Kotlin Multiplatform for cross-platform developers, since most apps are built with these today.
- Supports any GGUF model you can find on Huggingface: Qwen, Gemma, Llama, DeepSeek, Phi, Mistral, SmolLM, SmolVLM, InternVLM, Jan Nano, etc.
- Accommodates everything from FP32 down to 2-bit quantized models, for better efficiency and less strain on the device.
- Offers MCP tool-calls to make models truly helpful on the phone (set reminders, search the gallery, reply to messages) and more.
- Falls back to big cloud models for complex, constrained, or large-context tasks, ensuring robustness and high availability (a rough sketch of this pattern is just below).
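To make the fallback idea concrete, here is a minimal sketch of the local-first / cloud-fallback flow. The `LocalModel` interface, function names, and token estimate are illustrative, not the actual Cactus API, and the cloud endpoint is assumed to speak an OpenAI-style chat-completions protocol:

```typescript
// Sketch only: LocalModel and completeWithFallback are illustrative names,
// not the real Cactus API.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface LocalModel {
  contextLength: number;
  complete(messages: ChatMessage[]): Promise<string>;
}

// Crude token estimate (~4 chars per token); real code would use the model's tokenizer.
const estimateTokens = (messages: ChatMessage[]): number =>
  messages.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0);

async function completeWithFallback(
  local: LocalModel,
  messages: ChatMessage[],
  cloudUrl: string,
  apiKey: string,
): Promise<string> {
  // Prefer on-device inference whenever the prompt fits the local context window.
  if (estimateTokens(messages) < local.contextLength) {
    try {
      return await local.complete(messages);
    } catch (err) {
      console.warn("local inference failed, falling back to cloud:", err);
    }
  }
  // Otherwise (or on failure) route to a hosted model; assumes an
  // OpenAI-style chat endpoint and response shape.
  const res = await fetch(cloudUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({ model: "some-cloud-model", messages }),
  });
  if (!res.ok) throw new Error(`cloud fallback failed: HTTP ${res.status}`);
  const data = await res.json();
  return data.choices?.[0]?.message?.content ?? "";
}
```

The routing rule itself can be anything that fits the app: prompt length, task complexity, battery state, or simply whether local inference throws.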
It's completely open source. Would love to have more people try it out and tell us how to make it great!
The core distinction is in the ecosystem: Google AI Edge runs tflite models, whereas Cactus is built for GGUF. This is a critical difference for developers who want to use the latest open-source models.
One major consequence of this is model availability. New open-source models are released in GGUF format almost immediately, while finding or reliably converting them to tflite is often a pain. With Cactus, you can run a new GGUF model the day it drops on Huggingface.
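To make that concrete: a GGUF model is a single file you can pull straight off the Hub via its resolve URL, no conversion step involved. The repo and filename below are placeholders:

```typescript
import { createWriteStream } from "node:fs";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Placeholder repo and filename; any GGUF on the Hub follows the same URL shape.
const repo = "some-org/some-model-GGUF";
const file = "model.Q4_K_M.gguf";
const url = `https://huggingface.co/${repo}/resolve/main/${file}`;

async function downloadGguf(dest: string): Promise<void> {
  const res = await fetch(url);
  if (!res.ok || !res.body) throw new Error(`download failed: HTTP ${res.status}`);
  // Stream the (often multi-GB) file to disk instead of buffering it in memory.
  await pipeline(Readable.fromWeb(res.body as any), createWriteStream(dest));
}

downloadGguf(file).then(() => console.log(`saved ${file}`));
```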
Quantization level also plays a role. GGUF has mature support for quantization well below 8-bit, which is effectively essential for mobile, whereas sub-8-bit support in TFLite is still highly experimental and not broadly applicable.
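The arithmetic is why: weight memory is roughly parameters × bits-per-weight ÷ 8, ignoring KV cache and runtime overhead. For a 4B-parameter model:

```typescript
// Approximate weight memory for a 4B-parameter model at different precisions
// (weights only; KV cache and runtime overhead come on top).
const params = 4e9;
const gib = (bitsPerWeight: number): number => (params * bitsPerWeight) / 8 / 2 ** 30;

for (const [label, bits] of [["FP32", 32], ["FP16", 16], ["Q8", 8], ["Q4", 4], ["Q2", 2]] as const) {
  console.log(`${label}: ~${gib(bits).toFixed(1)} GiB`);
}
// FP32: ~14.9 GiB, FP16: ~7.5 GiB, Q8: ~3.7 GiB, Q4: ~1.9 GiB, Q2: ~0.9 GiB
```

Only at around 4-bit and below does a model of that size leave reasonable headroom on a typical 6-12 GB phone once the OS and other apps take their share.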
Finally, Cactus excels at CPU inference. tflite is great, but its peak performance often depends on specific hardware accelerators (GPUs, DSPs). The GGUF ecosystem is built for strong performance on plain CPUs, which gives a more consistent baseline across the wide variety of devices that app developers have to support.
I have not looked at OP's work yet, but if it makes the task easier, I would opt for it over Google's MediaPipe API.
GGUF is more suitable for the latest open-source models, I agree there. Q2/Q4 quantization will probably be critical as well, if we don't see a jump in RAM. But then again, I wonder when/if MediaPipe will support GGUF as well.
PS: I see you're in the latest YC batch? (Below you mentioned BF.) Good luck and have fun!
> Why lie?
Whoa—that's way too aggressive for this forum and definitely against the site guidelines. Could you please review them (https://news.ycombinator.com/newsguidelines.html) and take the spirit of this site more to heart? We'd appreciate it. You can always make your substantive points while doing that.
Note this one: "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."