684 points | prettyblocks | 5 comments

I mean anything in the 0.5B-3B range that's available on Ollama (for example). Have you built any cool tooling that uses these models as part of your workflow?
1. JLCarveth ◴[] No.42787549[source]
I used a small (3B, I think) model plus tesseract.js to perform OCR on an image of a nutrition facts table and output structured JSON (rough sketch of the pipeline below).
replies(3): >>42789249 #>>42789735 #>>42790829 #
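
A minimal sketch of the kind of pipeline JLCarveth describes, assuming Ollama is running locally with a small instruct model pulled (the model name, prompt, and JSON keys below are placeholders; the original comment doesn't specify them):

```typescript
// OCR a nutrition label with tesseract.js, then ask a small local model
// (via Ollama's /api/generate endpoint) to emit structured JSON.
import Tesseract from "tesseract.js";

async function extractNutritionFacts(imagePath: string) {
  // 1. OCR the label image into raw text.
  const { data: { text } } = await Tesseract.recognize(imagePath, "eng");

  // 2. Ask the model to turn the noisy OCR text into structured JSON.
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.2:3b",   // placeholder: any 0.5B-3B instruct model on Ollama
      format: "json",          // constrain output to valid JSON
      stream: false,
      prompt:
        "Extract the nutrition facts from this OCR text as JSON with keys " +
        "such as calories, protein_g, fat_g, carbs_g, sodium_mg:\n\n" + text,
    }),
  });

  const body = await res.json();
  return JSON.parse(body.response); // Ollama returns the model output in `response`
}

extractNutritionFacts("./label.png").then(console.log).catch(console.error);
```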
2. deivid ◴[] No.42789249[source]
What was the model? What kind of performance did you get out of it?

Could you share a link to your project, if it is public?

replies(1): >>42792274 #
3. tigrank ◴[] No.42789735[source]
Is all of that running server-side, or on the client?
4. ian_zcy ◴[] No.42790829[source]
What are you feeding into the model? An image (like product packaging) or an image of a structured table? I've found that models generally perform well with structured tables, but fail a lot on photos.
5. JLCarveth ◴[] No.42792274[source]
https://github.com/JLCarveth/nutrition-llama

I've had good speed / reliability with TheBloke/rocket-3B-GGUF on Hugging Face, the Q2_K model. I'm sure there are better models out there now, though.

It takes ~8-10 seconds to process an image on my M2 MacBook, so not quite quick enough to run on phones yet, but the accuracy of the output has been quite good.
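
A rough sketch of how one might time that inference step, assuming the Q2_K GGUF is served locally through llama.cpp's OpenAI-compatible server (e.g. `llama-server -m rocket-3b.Q2_K.gguf --port 8080`); the endpoint, port, and model name here are assumptions, not taken from the linked repo:

```typescript
// Time a single extraction against a locally served rocket-3B (Q2_K).
// OCR time is not included in this measurement.
async function timeExtraction(ocrText: string): Promise<string> {
  const start = performance.now();

  const res = await fetch("http://localhost:8080/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "rocket-3b",   // placeholder name; llama-server serves whatever GGUF it loaded
      messages: [
        { role: "system", content: "Return the nutrition facts as JSON only." },
        { role: "user", content: ocrText },
      ],
      temperature: 0,
    }),
  });

  const body = await res.json();
  const elapsed = (performance.now() - start) / 1000;
  console.log(`LLM step took ${elapsed.toFixed(1)}s`);
  return body.choices[0].message.content;
}
```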