
361 points mseri | 2 comments
dangoodmanUT ◴[] No.46005065[source]
What are some of the real-world applications of small models like this? Is it only on-device inference?

In most cases, I'm only seeing models like Sonnet being just barely sufficient for the workloads I've done historically. Would love to know where others are finding use for smaller models (like gpt-oss-120B and below, especially smaller models like this).

Maybe some really lightweight borderline-NLP classification tasks?
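
For concreteness, here's the kind of lightweight task I mean, as a sketch (assuming the Hugging Face zero-shot-classification pipeline; the model choice, text, and labels are just placeholders):

    # Sketch of a lightweight classification task a small model could handle.
    # Model choice and candidate labels are illustrative placeholders.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification",
                          model="facebook/bart-large-mnli")  # ~400M params

    result = classifier("I was double-charged for my subscription last month",
                        candidate_labels=["billing", "technical support",
                                          "account management"])
    print(result["labels"][0], result["scores"][0])  # top label + its score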

replies(3): >>46005122 #>>46005251 #>>46009108 #
1. schopra909 ◴[] No.46005122[source]
I think you nailed it.

For us it’s classifiers that we train for very specific domains.

You’d think it’d be better to just finetune a smaller non-LLM model, but empirically we find the LLM finetunes (like 7B) perform better.
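
Roughly, the shape of such a fine-tune looks like the sketch below. This is not our exact setup, just an illustration using Hugging Face transformers + peft; the base model, label count, data file, and hyperparameters are all placeholder assumptions:

    # Sketch: fine-tuning a ~7B LLM as a domain classifier with LoRA adapters.
    # Base model, labels, data file, and hyperparameters are placeholders.
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    base = "mistralai/Mistral-7B-v0.1"  # placeholder base model
    tokenizer = AutoTokenizer.from_pretrained(base)
    tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack one

    model = AutoModelForSequenceClassification.from_pretrained(
        base, num_labels=3)  # e.g., three domain-specific classes
    model.config.pad_token_id = tokenizer.pad_token_id

    # LoRA freezes the 7B base and trains only small adapter matrices,
    # which is what makes fine-tuning at this scale affordable.
    model = get_peft_model(model, LoraConfig(
        task_type="SEQ_CLS", r=16, lora_alpha=32, lora_dropout=0.05))

    ds = load_dataset("csv", data_files="domain_data.csv")  # columns: text,label
    ds = ds.map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
                batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="clf-7b",
                               per_device_train_batch_size=4,
                               num_train_epochs=2, learning_rate=1e-4),
        train_dataset=ds["train"],
        tokenizer=tokenizer,  # enables padded batching via the default collator
    )
    trainer.train()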

replies(1): >>46005801 #
2. moffkalast ◴[] No.46005801[source]
I think it's no surprise that any model that has a more general understanding of text performs better than some tiny ad-hoc classifier that blindly learns a couple of patterns and has no clue what it's looking at. It's going to fail in much weirder ways that make no sense, like old cnn-based vision models.