
217 points HenryNdubuaku | 3 comments

Hey HN, Henry and Roman here - we've been building a cross-platform framework for deploying LLMs, VLMs, embedding models, and TTS models locally on smartphones.

Ollama enables deploying LLMs locally on laptops and edge servers; Cactus enables deploying them on phones. Deploying directly on phones makes it possible to build AI apps and agents capable of phone use without compromising privacy, supports real-time inference with no network latency, and enables things like personalised RAG pipelines for users.

Apple and Google both recently moved into local AI models with the launches of Apple's Foundation Models framework and Google AI Edge respectively. However, both are platform-specific and only support each company's own models. To this end, Cactus:

- Is available in Flutter, React Native, and Kotlin Multiplatform, since most cross-platform apps are built with these today.

- Supports any GGUF model you can find on Hugging Face: Qwen, Gemma, Llama, DeepSeek, Phi, Mistral, SmolLM, SmolVLM, InternVLM, Jan Nano, etc.

- Handles models from FP32 down to 2-bit quantization, for better efficiency and less device strain.

- Supports MCP tool-calls to make models truly helpful (setting reminders, searching the gallery, replying to messages) and more.

- Falls back to big cloud models for complex, constrained, or large-context tasks, ensuring robustness and high availability.
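The local-first, cloud-fallback behavior in the last bullet can be sketched roughly like this (a TypeScript sketch; `runLocal`, `runCloud`, and the token threshold are illustrative stand-ins, not Cactus's actual API):

```typescript
type Route = "local" | "cloud";

interface Task {
  prompt: string;
  contextTokens: number; // estimated context window the task needs
}

// Assumed on-device limit; real thresholds depend on model and phone.
const MAX_LOCAL_CONTEXT = 4096;

function chooseRoute(task: Task): Route {
  return task.contextTokens <= MAX_LOCAL_CONTEXT ? "local" : "cloud";
}

async function complete(
  task: Task,
  runLocal: (p: string) => Promise<string>,
  runCloud: (p: string) => Promise<string>,
): Promise<string> {
  if (chooseRoute(task) === "local") {
    try {
      return await runLocal(task.prompt);
    } catch {
      // Local inference failed (OOM, unsupported op): fall back to cloud.
      return runCloud(task.prompt);
    }
  }
  return runCloud(task.prompt);
}
```

The routing predicate is deliberately simple here; a real implementation would also weigh battery level, thermal state, and whether the task needs tools only available on one side.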

It's completely open source. Would love to have more people try it out and tell us how to make it great!

Repo: https://github.com/cactus-compute/cactus

refulgentis ◴[] No.44526188[source]
[flagged]
replies(2): >>44526266 #>>44527553 #
HenryNdubuaku ◴[] No.44527553[source]
Thanks for the comment, but:

1) The commit history goes back to April.

2) LlaMa.cpp licence is included in the Repo where necessary like Ollama, until it is deprecated.

3) Flutter isolates behave like servers, and Cactus's code takes advantage of that.

replies(1): >>44527661 #
refulgentis ◴[] No.44527661[source]
[flagged]
replies(1): >>44528044 #
HenryNdubuaku ◴[] No.44528044[source]
We are following Ollama's design, but not verbatim, since mobile apps are sandboxed.

Phones are resource-constrained, and we saw significant battery overhead with in-process HTTP listeners, so we stuck with simple stateful isolates in Flutter; for React Native we are exploring a standalone server app that others can talk to.
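The stateful-worker idea can be illustrated with a minimal dispatcher that loads the model once and keeps it warm across requests, rather than binding an HTTP listener (a TypeScript sketch; `ModelWorker` and its members are hypothetical, not Cactus code):

```typescript
type InferRequest = { id: number; prompt: string };
type InferResponse = { id: number; text: string };

class ModelWorker {
  private loaded = false;

  // Load once; subsequent requests reuse the warm state, so there is
  // no per-request socket or server loop draining the battery.
  private ensureLoaded(): void {
    if (!this.loaded) {
      // real code would map model weights into memory here
      this.loaded = true;
    }
  }

  handle(req: InferRequest): InferResponse {
    this.ensureLoaded();
    // placeholder for real inference
    return { id: req.id, text: `echo: ${req.prompt}` };
  }
}
```

The key property is that state lives with the worker across calls; an HTTP listener adds an always-on accept loop and serialization overhead without buying anything inside a single sandboxed app.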

For model sharing with the current setup:

iOS - We are working towards writing the model into an App Group container; it's tricky, but we are working around it.

Android - We are working towards prompting the user once for a SAF directory (e.g., /Download/llm_models), saving the model there, then publishing a ContentProvider URI for zero-copy reads.

We are already writing more mobile-friendly kernels and tensors. GGML/GGUF is widely supported, so porting it was an easy way to get started and collect feedback, but we will move away from it completely in under 2 months.

Anything else you would like to know?

replies(1): >>44528069 #
refulgentis ◴[] No.44528069[source]
How does writing a model into an App Group container enable your framework to enable an app to enable a local LLM server that 3rd party apps can make calls to on iOS?[^1]

How does writing a model into a shared directory on Android enable a local LLM server that 3rd party apps can make calls to?[^2]

How does writing your own kernels get you off GGUF in 2 months? GGUF is a storage format. You use kernels to do things with the numbers you get from it.
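For readers following along: GGUF really is just a container, a fixed little-endian header followed by metadata key/values and tensor blobs. A minimal header parse makes the point that it is independent of any inference kernels (a TypeScript sketch, unrelated to either project's code):

```typescript
interface GgufHeader {
  version: number;
  tensorCount: bigint;
  metadataKvCount: bigint;
}

function parseGgufHeader(buf: ArrayBuffer): GgufHeader {
  const view = new DataView(buf);
  // First 4 bytes are the magic string "GGUF".
  const magic = String.fromCharCode(
    view.getUint8(0), view.getUint8(1), view.getUint8(2), view.getUint8(3),
  );
  if (magic !== "GGUF") throw new Error("not a GGUF file");
  return {
    version: view.getUint32(4, true),          // uint32, little-endian
    tensorCount: view.getBigUint64(8, true),    // uint64
    metadataKvCount: view.getBigUint64(16, true), // uint64
  };
}
```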

I thought GGUF was an advantage? Now it's something you're basically done using?

I don't think you should continue this conversation. As easy as it is to get your work out there, it's just as easy to build a record of stretching the truth over and over again.

Best of luck, and I mean it. Just, memento mori: be honest and humble along the way. This is something you will look back on in a year and grimace.

[^1] App group containers only work between apps signed from the same Apple developer account. Additionally, that is shared storage, not a way to provide APIs to other apps.

[^2] SAF = Storage Access Framework, that is shared storage, not a way to provide APIs to other apps.

replies(1): >>44528640 #
1. HenryNdubuaku ◴[] No.44528640[source]
[flagged]
replies(2): >>44528858 #>>44529813 #
2. refulgentis ◴[] No.44528858[source]
[flagged]
3. jeffhuys ◴[] No.44529813[source]
The best way to go about this is realizing that there are more people reading this thread who will draw their own conclusions.

Not staying professional, not just answering the questions, and bailing with "aight im outta here" when it gets a little bit harder is not a good look; it seems like you can't defend your own project.

Just FYI.