217 points by HenryNdubuaku | 2 comments

Hey HN, Henry and Roman here - we've been building a cross-platform framework for deploying LLMs, VLMs, embedding models and TTS models locally on smartphones.

Ollama enables deploying LLMs locally on laptops and edge servers; Cactus enables deploying them on phones. Deploying directly on phones makes it possible to build AI apps and agents capable of phone use without breaking privacy, supports real-time inference with no network latency, and enables things like personalised on-device RAG pipelines for users, and more.

Apple and Google have both recently moved into local AI models with the launch of the Foundation Models framework and Google AI Edge respectively. However, both are platform-specific and only support each company's own models. To this end, Cactus:

- Is available in Flutter, React Native & Kotlin Multiplatform for cross-platform developers, since most apps are built with these today.

- Supports any GGUF model you can find on Hugging Face: Qwen, Gemma, Llama, DeepSeek, Phi, Mistral, SmolLM, SmolVLM, InternVLM, Jan Nano, etc. (see the usage sketch after this list).

- Accommodates anything from FP32 down to 2-bit quantized models, for better efficiency and less device strain (roughly, a 1B-parameter model shrinks from ~4 GB at FP32 to ~300 MB at ~2.5 bits per weight).

- Supports MCP tool calls to make models truly helpful (setting reminders, searching the gallery, replying to messages) and more.

- Falls back to big cloud models for complex, constrained or large-context tasks, ensuring robustness and high availability.
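To make the workflow concrete, here is a minimal sketch of what driving Cactus from Kotlin Multiplatform could look like. Everything below (`CactusLM`, `download`, `load`, `complete`, and the model URL) is an illustrative assumption, not the real SDK surface - check the repo for the actual bindings.

```kotlin
import kotlinx.coroutines.runBlocking

// Hypothetical interface only -- stands in for whatever the Cactus SDK exposes.
interface CactusLM {
    suspend fun download(modelUrl: String)        // fetch a GGUF file, e.g. from Hugging Face
    suspend fun load(contextSize: Int)            // load weights with a bounded context window
    suspend fun complete(prompt: String): String  // run inference fully on-device
}

fun demo(lm: CactusLM) = runBlocking {
    // Any GGUF quant should work; smaller quants strain the device less.
    lm.download("https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-q4_k_m.gguf")
    lm.load(contextSize = 2048)

    // No network round trip: the prompt never leaves the phone.
    println(lm.complete("Summarise my unread messages in one sentence."))
}
```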

It's completely open source. Would love to have more people try it out and tell us how to make it great!

Repo: https://github.com/cactus-compute/cactus

1. neurostimulant (No.44530017)
This is great!

It would be great if the local LLM had access to local tools you could enable/disable as needed (e.g. via customizable profiles) - a rough sketch of the idea follows below. Simple tools such as URL fetching, file access, messaging, calendar, etc. would be very useful, though I'm not sure the input token limit is large enough to allow this. Even better if it could somehow do web search, but I understand that would be hard to do for free.
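The profile idea could be modeled as a per-profile whitelist of tool handlers. A rough, self-contained Kotlin sketch - all names here are invented for illustration and nothing is Cactus-specific:

```kotlin
// The model emits a structured call; the app checks it against the active
// profile's enabled tools and executes it.
data class ToolCall(val name: String, val args: Map<String, String>)

val allTools: Map<String, (Map<String, String>) -> String> = mapOf(
    "fetch_url" to { args -> "fetched ${args["url"]}" },
    "calendar_add" to { args -> "event '${args["title"]}' added" },
    "file_read" to { args -> "contents of ${args["path"]}" },
)

// A "profile" is just the subset of tools the user has switched on.
val workProfile = setOf("calendar_add", "fetch_url")

fun dispatch(call: ToolCall, enabled: Set<String>): String =
    if (call.name in enabled) allTools.getValue(call.name)(call.args)
    else "tool '${call.name}' is disabled in this profile"

fun main() {
    val call = ToolCall("calendar_add", mapOf("title" to "dentist"))
    println(dispatch(call, workProfile))  // -> event 'dentist' added
}
```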

Also, how cool would it be if you could expose an OpenAI-compatible API that can be accessed from other devices on your local network? Imagine turning your old phones into local LLM servers. That would be very cool.
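For illustration, the smallest version of that idea is an HTTP endpoint that mimics the `/v1/chat/completions` shape OpenAI clients expect. The sketch below uses the JDK's built-in `HttpServer` (not available on Android, where you'd reach for something like NanoHTTPD or Ktor instead), and `runLocalModel` is a placeholder for whatever on-device inference call you actually have:

```kotlin
import com.sun.net.httpserver.HttpServer
import java.net.InetSocketAddress

// Placeholder: wire this up to the local model in a real build.
fun runLocalModel(prompt: String): String = "local model reply to: $prompt"

fun main() {
    val server = HttpServer.create(InetSocketAddress(8080), 0)
    server.createContext("/v1/chat/completions") { exchange ->
        // A real server would parse the JSON body's `messages` array and
        // escape the reply; here we just hand the raw body to the model.
        val body = exchange.requestBody.readBytes().decodeToString()
        val reply = runLocalModel(body)
        val json = """{"choices":[{"message":{"role":"assistant","content":"$reply"}}]}"""
        val bytes = json.toByteArray()
        exchange.responseHeaders.add("Content-Type", "application/json")
        exchange.sendResponseHeaders(200, bytes.size.toLong())
        exchange.responseBody.use { it.write(bytes) }
    }
    server.start()
    println("Serving OpenAI-style completions on http://0.0.0.0:8080/v1/chat/completions")
}
```

Any client that lets you override the API base URL could then point at the phone's LAN address instead of api.openai.com.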

By the way, I can't figure out how to clear previous chat data. Is it hidden somewhere?

2. rshemet (No.44534168)
No, good observation - it's not hidden; we don't have a "clear conversation" button.

To your previous point: Cactus fully supports tool calling (for models that have been instruction-tuned for it, e.g. Qwen 1.7B).

for "turning your old phones into local llm servers", Cactus is likely not the best tool. We'd recommend something like actual Ollama or Exo