Ask HN: Is anyone doing anything cool with tiny language models?

684 points prettyblocks | 1 comments | 21 Jan 25 19:39 UTC

I mean anything in the 0.5B-3B range that's available on Ollama (for example). Have you built any cool tooling that uses these models as part of your workflow?


jbentley1 ◴[22 Jan 25 14:08 UTC] No.42792994[source]▶

>>42784365 (OP) #

Tiny language models can do a lot if they are fine-tuned for a specific task, but IMO a few things are holding them back:

1. Getting the speed gains is hard unless you are able to pay for dedicated GPUs. Some services offer LoRA as serverless but you don't get the same performance for various technical reasons.

2. Lack of talent to actually do the finetuning. Regular engineers can do a lot of LLM implementation, but when it comes to actually performing training it is a scarcer skillset. Most small to medium orgs don't have people who can do it well.

3. Distribution. Sharing finetunes is hard. HuggingFace exists, but discoverability is an issue. It is flooded with random models with no documentation, and it isn't easy to find a good one for your task. Plus, a good finetune also needs its prompt, and possibly parsing code, to work the way it is intended, and that bundling hasn't been worked out well.
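To make the bundling point in item 3 concrete, here is a minimal sketch of shipping a finetune's prompt and parsing code together with the model reference. Everything in it is an assumption for illustration: `FinetuneBundle`, `parse_json_label`, the model id, and the prompt text are hypothetical names, not an existing HuggingFace or Ollama convention.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class FinetuneBundle:
    """Hypothetical container: a model reference plus the prompt and parser it was tuned with."""
    model_id: str
    prompt_template: str
    parse_output: Callable[[str], Any]

    def build_prompt(self, **fields: str) -> str:
        # Fill the template with the caller's fields (here just `text`).
        return self.prompt_template.format(**fields)


def parse_json_label(raw: str) -> str:
    # Assumes the finetune was trained to answer with a JSON object
    # that has a "label" key; raises if the output is not valid JSON.
    return json.loads(raw.strip())["label"]


bundle = FinetuneBundle(
    model_id="example-org/sentiment-tiny",  # hypothetical model id
    prompt_template="Classify the sentiment of this review as JSON.\nReview: {text}\nAnswer:",
    parse_output=parse_json_label,
)

prompt = bundle.build_prompt(text="Great battery life")
label = bundle.parse_output(' {"label": "positive"}\n')  # stand-in for real model output
```

The point of keeping all three pieces in one object is that whoever downloads the model cannot accidentally pair it with a different prompt or parser than the one it was trained against.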

replies(1): >>42793782 #

grisaitis ◴[22 Jan 25 15:25 UTC] No.42793782[source]▶

>>42792994 #

when you say fine-tuning skills or talent are scarce, do you have specific skills in mind? perhaps engineering for training models (eg making model parallelism work)? or the more ML type skills of designing experiments, choosing which methods to use, figuring out datasets for training, hyperparam tuning/evaluation, etc?

replies(1): >>42793823 #

jbentley1 ◴[22 Jan 25 15:29 UTC] No.42793823[source]▶

>>42793782 #

The technical parts, like understanding the hyperparameters, are specialized and less commonly known, but I don't think that is the main problem. Most people don't understand how to build a good dataset or how to evaluate their finetune after training. Some parts of this are solid rules, like always using a separate validation set, but the task-dependent parts are harder to teach. It's a different problem every time.
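As a rough sketch of the "separate validation set" rule, assuming the dataset is a list of (prompt, expected answer) pairs and that exact match is an acceptable metric for the task (often it is not, which is the task-dependent part). The function names and the `predict` callable are hypothetical stand-ins for whatever runs the finetuned model.

```python
import random


def split_train_validation(examples, validation_fraction=0.1, seed=0):
    """Shuffle a copy of the examples and hold out a validation slice."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(examples)
    # A fixed seed keeps the split reproducible across training runs.
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def exact_match_accuracy(predict, validation_set):
    """Fraction of held-out prompts where the prediction equals the expected text."""
    correct = sum(
        1
        for prompt, expected in validation_set
        if predict(prompt).strip() == expected.strip()
    )
    return correct / len(validation_set)
```

Train only on the first returned list and report `exact_match_accuracy` only on the second, so the score reflects examples the model never saw during finetuning.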

replies(1): >>42802504 #

menaerus ◴[23 Jan 25 09:59 UTC] No.42802504[source]▶

>>42793823 #

Finetuning, as I understand it, is mostly laborious, boring, and exhausting work that does not appeal to many engineers. It can be done by people who have some skill in Python or a similar language and some background in statistics.

OTOH, building the infra for LLMs involves much more, and it's really hard to find engineers who can be both researchers and developers at the same time. By "researchers" I mean that they must be able to read through numerous academic and industry papers, comprehend the tiniest details, and turn them into a product through code. I think that's a much harder and scarcer skill to find.

That said, I am not downplaying the fine-tuning skill; it's a humongous effort, but I don't think it's necessarily a skillset problem.
