684 points | prettyblocks | 5 comments

I mean anything in the 0.5B-3B range that's available on Ollama (for example). Have you built any cool tooling that uses these models as part of your workflow?
1. JLCarveth ◴[] No.42787549[source]
I used a small (3B, I think) model plus tesseract.js to perform OCR on an image of a nutrition facts table and output structured JSON (rough sketch of the pipeline below).
replies(3): >>42789249 #>>42789735 #>>42790829 #
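
A minimal sketch of the kind of pipeline JLCarveth describes, assuming Ollama is running locally with a small instruct model pulled (the model name, prompt, and JSON keys below are placeholders; the original comment doesn't specify them):

```typescript
// OCR a nutrition label with tesseract.js, then ask a small local model
// (via Ollama's /api/generate endpoint) to emit structured JSON.
import Tesseract from "tesseract.js";

async function extractNutritionFacts(imagePath: string) {
  // 1. OCR the label image into raw text.
  const { data: { text } } = await Tesseract.recognize(imagePath, "eng");

  // 2. Ask the model to turn the noisy OCR text into structured JSON.
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.2:3b",   // placeholder: any 0.5B-3B instruct model on Ollama
      format: "json",          // constrain output to valid JSON
      stream: false,
      prompt:
        "Extract the nutrition facts from this OCR text as JSON with keys " +
        "such as calories, protein_g, fat_g, carbs_g, sodium_mg:\n\n" + text,
    }),
  });

  const body = await res.json();
  return JSON.parse(body.response); // Ollama returns the model output in `response`
}

extractNutritionFacts("./label.png").then(console.log).catch(console.error);
```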
2. deivid ◴[] No.42789249[source]
What was the model? What kind of performance did you get out of it?

Could you share a link to your project, if it is public?

replies(1): >>42792274 #
3. tigrank ◴[] No.42789735[source]
Is all of that running server-side, or on the client?
4. ian_zcy ◴[] No.42790829[source]
What are you feeding into the model? An image (like product packaging) or an image of a structured table? I've found that models generally perform well with structured tables, but fail a lot on photos.
5. JLCarveth ◴[] No.42792274[source]
https://github.com/JLCarveth/nutrition-llama

I've had good speed / reliability with TheBloke/rocket-3B-GGUF on Hugging Face, the Q2_K model. I'm sure there are better models out there now, though.

It takes ~8-10 seconds to process an image on my M2 MacBook, so not quite quick enough to run on phones yet, but the accuracy of the output has been quite good.
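
A rough sketch of how one might time that inference step, assuming the Q2_K GGUF is served locally through llama.cpp's OpenAI-compatible server (e.g. `llama-server -m rocket-3b.Q2_K.gguf --port 8080`); the endpoint, port, and model name here are assumptions, not taken from the linked repo:

```typescript
// Time a single extraction against a locally served rocket-3B (Q2_K).
// OCR time is not included in this measurement.
async function timeExtraction(ocrText: string): Promise<string> {
  const start = performance.now();

  const res = await fetch("http://localhost:8080/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "rocket-3b",   // placeholder name; llama-server serves whatever GGUF it loaded
      messages: [
        { role: "system", content: "Return the nutrition facts as JSON only." },
        { role: "user", content: ocrText },
      ],
      temperature: 0,
    }),
  });

  const body = await res.json();
  const elapsed = (performance.now() - start) / 1000;
  console.log(`LLM step took ${elapsed.toFixed(1)}s`);
  return body.choices[0].message.content;
}
```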