
347 points kashifr | 1 comment
_1 ◴[] No.44501951[source]
Which small model is good for fine-tuning on various enterprise data sets? Our business units want to run small models in the browser and on mobile devices, without dealing with RAG and cloud resources.
replies(5): >>44502175 #>>44502283 #>>44502496 #>>44502868 #>>44508851 #
mhitza ◴[] No.44502175[source]
You really need to try them all out yourself and make sure you have proper benchmarks.
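
To give a concrete idea, a minimal benchmark harness along those lines, assuming the Hugging Face transformers stack, can be as small as the sketch below; the candidate model names, the held-out questions, and the substring-match scoring are placeholders you'd replace with your own.

    # Rough sketch of a per-model eval loop: same held-out questions for
    # every candidate, plus a scoring rule that matches your use case.
    from transformers import pipeline

    candidates = [
        "HuggingFaceTB/SmolLM3-3B",         # placeholder candidates
        "Qwen/Qwen2.5-1.5B-Instruct",
    ]
    eval_set = [                            # questions with known answers from your own data
        {"question": "Which article regulates X?", "answer": "Article 12"},
    ]

    for name in candidates:
        generate = pipeline("text-generation", model=name, device_map="auto")
        hits = 0
        for item in eval_set:
            out = generate(item["question"], max_new_tokens=64, do_sample=False)
            hits += item["answer"].lower() in out[0]["generated_text"].lower()
        print(f"{name}: {hits}/{len(eval_set)} answers contained the expected string")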

While machine learning is not my field, I did try to fine-tune Mistral 7B (following their official guide and toolset), and the results were not satisfying. I had a few very specific questions from the dataset that, no matter how much I fine-tuned and tweaked the process, the model was never able to answer with the correct information.
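
For reference, a bare-bones LoRA fine-tune with the Hugging Face peft stack (not necessarily what the official Mistral guide does) looks roughly like the sketch below; the dataset path, target modules, and hyperparameters are illustrative only.

    # Minimal LoRA fine-tuning sketch (transformers + peft); everything
    # below the model name is illustrative, not a tuned recipe.
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_name = "mistralai/Mistral-7B-v0.1"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

    # Train low-rank adapters instead of all 7B parameters.
    model = get_peft_model(model, LoraConfig(
        r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM"))

    # Plain-text corpus, one passage per line ("pretraining-style" data).
    dataset = load_dataset("text", data_files="legislation.txt")["train"]
    dataset = dataset.map(
        lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
        remove_columns=["text"])

    Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=3,
                               per_device_train_batch_size=1,
                               gradient_accumulation_steps=8,
                               learning_rate=2e-4, logging_steps=10),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    ).train()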

A mix of vector search + keyword search is still better at building the right question context than expecting the model to learn all the information through fine-tuning.
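
Concretely, that hybrid setup can be as simple as the sketch below, using rank_bm25 for the keyword side and sentence-transformers for the vector side; the chunks, embedding model, and mixing weight are arbitrary placeholders.

    # Hybrid retrieval sketch: blend BM25 keyword scores with embedding
    # similarity, then put the top chunks into the prompt instead of
    # expecting the model to have memorized the documents.
    from rank_bm25 import BM25Okapi
    from sentence_transformers import SentenceTransformer, util

    chunks = ["Article 12 says ...", "Article 13 says ..."]  # your document chunks

    bm25 = BM25Okapi([c.lower().split() for c in chunks])
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    chunk_vecs = embedder.encode(chunks, convert_to_tensor=True)

    def retrieve(query, k=3, alpha=0.5):
        # alpha is an arbitrary keyword/vector mixing weight
        kw = bm25.get_scores(query.lower().split())
        kw = kw / (kw.max() or 1.0)          # normalise keyword scores
        vec = util.cos_sim(embedder.encode(query, convert_to_tensor=True),
                           chunk_vecs)[0].cpu().numpy()
        blended = alpha * kw + (1 - alpha) * vec
        return [chunks[i] for i in blended.argsort()[::-1][:k]]

    context = "\n".join(retrieve("Which article regulates X?"))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."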

I used the pretraining-style dataset approach (raw text, no question/answer pairs). Maybe building synthetic questions and answers around the dataset yields better results, but I didn't have time to experiment with that approach.
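
If anyone wants to try that route, the usual pattern is sketched below: have an instruct model write question/answer pairs over each chunk, then fine-tune on those pairs instead of the raw text. The generator model, prompt, and chunking are placeholders.

    # Sketch of synthetic Q&A generation: prompt an instruct model for
    # question/answer pairs per chunk, keep the lines that parse as JSON,
    # and fine-tune on the resulting pairs rather than on the raw text.
    import json
    from transformers import pipeline

    generate = pipeline("text-generation",
                        model="mistralai/Mistral-7B-Instruct-v0.2",  # any capable instruct model
                        device_map="auto")

    chunks = ["Article 12 says ...", "Article 13 says ..."]  # your document chunks

    pairs = []
    for chunk in chunks:
        prompt = ("Write three question/answer pairs, one JSON object per line "
                  'like {"question": "...", "answer": "..."}, about this text:\n'
                  f"{chunk}\n")
        out = generate(prompt, max_new_tokens=256, do_sample=False)[0]["generated_text"]
        for line in out[len(prompt):].splitlines():
            try:
                pairs.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # model output is noisy; keep only lines that parse

    with open("synthetic_qa.jsonl", "w") as f:
        for p in pairs:
            f.write(json.dumps(p) + "\n")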

replies(2): >>44503664 #>>44505274 #
ivape ◴[] No.44503664[source]
How much data did you use to fine-tune?
replies(1): >>44503815 #
mhitza ◴[] No.44503815[source]
Kilobytes to megabytes of data. I was trying to fine-tune it on some specific legislation that I expected to be able to ask about afterwards.