(github.com)

217 points HenryNdubuaku | 2 comments | 10 Jul 25 19:20 UTC

Hey HN, Henry and Roman here - we've been building a cross-platform framework for deploying LLMs, VLMs, Embedding Models and TTS models locally on smartphones.

Ollama enables deploying LLMs locally on laptops and edge servers; Cactus enables deploying them on phones. Deploying directly on phones makes it possible to build AI apps and agents capable of phone use without breaking privacy, and supports real-time inference without network latency. We have also seen personalised RAG pipelines for users, and more.

Apple and Google actively went into local AI models recently with the launch of Apple Foundation Frameworks and Google AI Edge respectively. However, both are platform-specific and only support specific models from the company. To this end, Cactus:

- Is available in Flutter, React Native & Kotlin Multiplatform for cross-platform developers, since most apps are built with these today.

- Supports any GGUF model you can find on Hugging Face: Qwen, Gemma, Llama, DeepSeek, Phi, Mistral, SmolLM, SmolVLM, InternVLM, Jan Nano etc.

- Accommodates models from FP32 down to 2-bit quantization, for better efficiency and less device strain.

- Has MCP tool calls to make models performant and truly helpful (set reminders, search the gallery, reply to messages) and more.

- Falls back to big cloud models for complex, constrained or large-context tasks, ensuring robustness and high availability.
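To give a feel for why the quantization range above matters on a phone, here is a rough back-of-the-envelope sketch. It is not Cactus code: the function name and the 1-billion-parameter example are made up for illustration, and it only counts raw weight storage (parameters times bits per weight). Real GGUF files are somewhat larger because quantized blocks also store scales and metadata.

```python
def approx_weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Rough weight storage in bytes: parameters x bits per weight / 8.

    Illustrative helper only (not part of any library). It ignores
    per-block scales, metadata, the KV cache and activations.
    """
    return num_params * bits_per_weight / 8


if __name__ == "__main__":
    num_params = 1_000_000_000  # hypothetical 1B-parameter model
    for label, bits in [("FP32", 32), ("FP16", 16), ("8-bit", 8), ("4-bit", 4), ("2-bit", 2)]:
        gb = approx_weight_bytes(num_params, bits) / 1e9  # decimal gigabytes
        print(f"{label:>6}: ~{gb:.2f} GB of weights")
```

Under these assumptions the same model goes from about 4 GB of weights at FP32 to about 0.25 GB at 2 bits, which is the difference between not fitting and fitting comfortably in a typical phone's memory.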

It's completely open source. Would love to have more people try it out and tell us how to make it great!

Repo: https://github.com/cactus-compute/cactus

1. nunobrito ◴[11 Jul 25 10:38 UTC] No.44530591[source]▶

>>44524544 (OP) #

I've installed the Android version from https://play.google.com/store/apps/details?id=com.rshemetsub...

It is fantastic. Compared to another program I had installed a year ago, the speed of processing and answering is really good and accurate. Was able to ask mathematical questions, basic translation between different languages and even trivia about movies released almost 30 years ago.

Things to improve: 1) sometimes the question would get stuck on the last phrase and keep repeating it without end. 2) The chat does not scroll the window to follow the answer and we have to scroll manually.

In any case, excellent start. It is without doubt the fastest offline LLM that I've seen working on this phone.

replies(1): >>44534199 #

2. rshemet ◴[11 Jul 25 16:36 UTC] No.44534199[source]▶

>>44530591 (TP) #

Thank you! Very kind feedback, and we'll add it to our to-dos.

re: "question would get stuck on the last phrase and keep repeating it without end." - that's a limitation of the model, I'm afraid. Smaller models tend to do that sometimes.
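For readers wondering what can be done about such loops in general: one widely used sampling-time mitigation is a repetition penalty, which lowers the scores of tokens that were already generated before the next token is picked. The sketch below is a generic, minimal illustration and not Cactus's code; the function name and the default `penalty` value are assumptions, and whether Cactus exposes such a setting is not stated in this thread. It reduces repetition but does not remove the underlying small-model limitation.

```python
from typing import List, Sequence


def apply_repetition_penalty(
    logits: Sequence[float],
    generated_ids: Sequence[int],
    penalty: float = 1.1,  # assumed default; values > 1 discourage repeats
) -> List[float]:
    """Return a copy of `logits` with already-generated tokens penalised.

    Positive logits are divided by `penalty` and negative logits are
    multiplied by it, so a seen token always becomes less likely.
    """
    out = list(logits)
    for tok in set(generated_ids):  # penalise each seen token once
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out


if __name__ == "__main__":
    # Toy vocabulary of 3 tokens; tokens 0 and 1 were already generated.
    print(apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1, 0], penalty=2.0))
```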


Show HN: Cactus – Ollama for Smartphones