
361 points by mseri | 1 comment
dangoodmanUT ◴[] No.46005065[source]
What are some of the real-world applications of small models like this? Is it only on-device inference?

In most cases, I'm only seeing models like Sonnet being just barely sufficient for the workloads I've done historically. Would love to know where others are finding uses for smaller models (gpt-oss-120B and below, especially smaller models like this).

Maybe some really lightweight borderline-NLP classification tasks?
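(For concreteness, a lightweight classification setup of that kind might look roughly like the sketch below, using the Hugging Face transformers zero-shot pipeline; the model name and labels are illustrative placeholders, not anything from the thread.)

    # Minimal sketch: zero-shot text classification with a small local model.
    # Model and labels are illustrative assumptions, not from the thread.
    from transformers import pipeline

    classifier = pipeline(
        "zero-shot-classification",
        model="facebook/bart-large-mnli",  # any small NLI-style model works
    )

    result = classifier(
        "The package arrived two weeks late and the box was crushed.",
        candidate_labels=["shipping complaint", "product defect", "praise"],
    )
    print(result["labels"][0])  # highest-scoring label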

replies(3): >>46005122 #>>46005251 #>>46009108 #
fnbr ◴[] No.46005251[source]
(I’m a researcher on the post-training team at Ai2.)

7B models are mostly useful for local use on consumer GPUs. 32B could be used for a lot of applications. There are a lot of companies using fine-tuned Qwen 3 models that might want to switch to Olmo now that we have released a 32B base model.
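(Loading that base model for further fine-tuning or inference could look roughly like this sketch; the allenai/Olmo-3-1125-32B repo id is an assumption inferred from the quantized checkpoint linked further down and may differ.)

    # Sketch: loading the Olmo 3 32B base model with Hugging Face transformers.
    # The repo id below is an assumption, not confirmed in the thread.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "allenai/Olmo-3-1125-32B"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # weights released in BF16
        device_map="auto",
    )

    inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(out[0], skip_special_tokens=True))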

replies(2): >>46005571 #>>46010965 #
kurthr ◴[] No.46010965[source]
Are there quantized (e.g. 4-bit) models available yet? I assume the training was done in BF16, but it seems like most inference models are distributed in FP8 until they're quantized.

edit: ahh, I see it on Hugging Face: https://huggingface.co/mlx-community/Olmo-3-1125-32B-4bit
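(For anyone wanting to try that checkpoint, a minimal sketch with the mlx-lm package, which runs on Apple silicon; the prompt is just an example.)

    # Sketch: running the 4-bit MLX quantization of Olmo 3 32B.
    # Requires Apple silicon and: pip install mlx-lm
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Olmo-3-1125-32B-4bit")
    text = generate(
        model,
        tokenizer,
        prompt="Briefly explain what a 4-bit quantized model is.",
        max_tokens=128,
    )
    print(text)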