Which small model is good for fine tuning to various enterprise data sets? Our business units are wanting to run small models in browser and on mobile devices, without dealing with RAG and cloud resources.
Small models are bad at knowing things. Trying to train knowledge in to small models is probably not the way you want to go. You could try building an offline embedded RAG system that is deployable as wasm. Some folks have been experiencing success with this.
We do use WebLLM and a hosted Weaviate database, but there are complaints about speed (both retrieval and time to first token as the context will get big). The Gemma 3n "nesting doll" approach sounds like it could be useful .. but haven't found anyone specifically doing it to add domain specific knowledge.
Typically retrieval is the fast part in my experience. Have you considered cheaper retrieval methods? Bm25 does pretty well on its own. And you can augment your dataset by precomputing relevant queries for each doc.