I mean anything in the 0.5B-3B range that's available on Ollama (for example). Have you built any cool tooling that uses these models as part of your workflow?
Microsoft published a paper on their FLAME model (60M parameters) for Excel formula repair/completion, which outperformed much larger models (>100B parameters).
But I feel like we're coming full circle. These small models aren't generalists, and thus aren't really LLMs, at least in terms of objective. Recently there's been a rise of "specialized" models that provide a lot of value, but that's not how LLMs were sold to us.
But that's the thing: I don't need my ML model to be able to write me a sonnet about the history of beets, especially if I want to run it at home for specific tasks, like being a programming assistant.
I'm fine with and prefer specialist models in most cases.
Specialized models still work much better for most stuff. Really, what we need is an LLM that understands the input and then hands it off to a specialized model that actually produces good results.
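Roughly what I have in mind is a small dispatcher, sketched below. This assumes a local Ollama server on its default port; the model names, route labels, and the ask/route helpers are just illustrative placeholders, not a real stack.

    # Sketch of the "LLM as dispatcher" idea against a local Ollama server.
    # Model names and route labels are illustrative, not recommendations.
    import requests

    OLLAMA_URL = "http://localhost:11434/api/generate"

    def ask(model: str, prompt: str) -> str:
        # Single non-streaming completion from a local Ollama model.
        resp = requests.post(
            OLLAMA_URL,
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["response"]

    def route(user_input: str) -> str:
        # Use a small generalist model only to classify intent, then hand off.
        label = ask(
            "qwen2.5:1.5b",
            "Classify this request as exactly one of: sql, code, other.\n"
            f"Request: {user_input}\nLabel:",
        ).strip().lower()
        if "sql" in label:
            return ask("sqlcoder", user_input)      # SQL specialist
        if "code" in label:
            return ask("codellama:7b", user_input)  # code specialist
        return ask("qwen2.5:1.5b", user_input)      # fall back to the generalist

    print(route("Return the ten most recent orders per customer"))

The small model only has to get the classification right; the actual answer comes from whichever specialist sits behind the route.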
I think playing word games about what really counts as an LLM is a losing battle. It has become a marketing term, mostly. It’s better to take a functionalist point of view: “what can this thing do?”
I would love a model that knows SQL really well so I don't need to remember all the small details of the language. Beyond that, I don't see why the transformer architecture can't be applied to any problem that needs to predict sequences.
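For the SQL case, something as simple as the sketch below would cover most of what I forget: hand the model the schema plus a plain-English question and take the query it returns. This again assumes a local Ollama server; the schema and model name are made up for illustration.

    # Toy natural-language-to-SQL helper; schema and model name are placeholders.
    import requests

    SCHEMA = """
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, total NUMERIC, created_at TIMESTAMP);
    CREATE TABLE customers (id INTEGER, name TEXT, country TEXT);
    """

    def nl_to_sql(question: str, model: str = "sqlcoder") -> str:
        prompt = (
            "Given this schema:\n" + SCHEMA +
            "\nWrite a single SQL query, with no explanation, for:\n" + question
        )
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["response"].strip()

    print(nl_to_sql("Average order total per country in 2024"))

The point isn't that this particular setup is the answer, just that "remember the small details for me" is exactly the kind of narrow sequence-prediction task a small specialist model should be good at.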