(huggingface.co)

365 points kashifr | 3 comments | 08 Jul 25 16:13 UTC

_1 ◴[08 Jul 25 17:09 UTC] No.44501951[source]▶

Which small model is good for fine-tuning on various enterprise data sets? Our business units want to run small models in the browser and on mobile devices, without dealing with RAG and cloud resources.

replies(5): >>44502175 #>>44502283 #>>44502496 #>>44502868 #>>44508851 #

1. gardnr ◴[08 Jul 25 17:43 UTC] No.44502283[source]▶

>>44501951 #

Small models are bad at knowing things. Trying to train knowledge into small models is probably not the way you want to go. You could try building an offline embedded RAG system that is deployable as WASM. Some folks have had success with this.

replies(1): >>44502398 #

2. _1 ◴[08 Jul 25 17:56 UTC] No.44502398[source]▶

>>44502283 (TP) #

We do use WebLLM and a hosted Weaviate database, but there are complaints about speed (both retrieval and time to first token, since the context gets big). The Gemma 3n "nesting doll" approach sounds like it could be useful, but I haven't found anyone using it specifically to add domain-specific knowledge.

replies(1): >>44502867 #

3. janalsncm ◴[08 Jul 25 18:50 UTC] No.44502867[source]▶

>>44502398 #

In my experience, retrieval is typically the fast part. Have you considered cheaper retrieval methods? BM25 does pretty well on its own, and you can augment your dataset by precomputing relevant queries for each doc.
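
For illustration, here is a minimal sketch of what BM25 with precomputed queries could look like. It assumes a simple lowercase alphanumeric tokenizer, the commonly used defaults k1=1.5 and b=0.75, and the "+1 inside the log" IDF variant. The `BM25Index` class, the sample docs, and the hand-written expansion queries are all hypothetical; in practice the queries would probably be generated offline, and a maintained library may be a better choice than rolling your own.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens are good enough for a sketch.
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Hypothetical in-memory BM25 index over a small list of documents."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        self.term_freqs = [Counter(tokenize(d)) for d in docs]
        self.doc_len = [sum(tf.values()) for tf in self.term_freqs]
        self.avg_len = sum(self.doc_len) / len(docs)
        # Document frequency: number of docs containing each term.
        df = Counter()
        for tf in self.term_freqs:
            df.update(tf.keys())
        n = len(docs)
        self.idf = {
            term: math.log(1 + (n - freq + 0.5) / (freq + 0.5))
            for term, freq in df.items()
        }

    def score(self, query, i):
        tf = self.term_freqs[i]
        # Length normalization: longer-than-average docs are penalized.
        norm = self.k1 * (1 - self.b + self.b * self.doc_len[i] / self.avg_len)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query, top_k=3):
        scored = [(self.score(query, i), i) for i in range(len(self.term_freqs))]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Return (doc_index, score) pairs, dropping docs with no term overlap.
        return [(i, s) for s, i in scored[:top_k] if s > 0]


# Hypothetical enterprise snippets.
docs = [
    "Expense reports are due on the fifth business day of each month.",
    "To reset a password, open the account portal and choose Forgot Password.",
    "VPN access requires a hardware token issued by IT.",
]

# Hypothetical precomputed queries, one per doc, appended to the indexed text
# so that user phrasing that never appears in the doc can still match.
expansions = [
    "when do I submit expenses",
    "how do I change my login credentials",
    "how to connect remotely",
]

plain_index = BM25Index(docs)
expanded_index = BM25Index([d + " " + q for d, q in zip(docs, expansions)])
```

With this setup, `expanded_index.search("login credentials")` can surface the password doc even though neither word appears in its original text, while `plain_index` has nothing to match on. Everything here is plain arithmetic over term counts, which is part of why this kind of retrieval tends to be cheap.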

Smollm3: Smol, multilingual, long-context reasoner LLM