
208 points by themanmaran | 14 comments

Last week was big for open source LLMs. We got:

- Qwen 2.5 VL (72b and 32b)

- Gemma-3 (27b)

- DeepSeek-v3-0324

And a couple weeks ago we got the new mistral-ocr model. We updated our OCR benchmark to include the new models.

We evaluated 1,000 documents for JSON extraction accuracy. Major takeaways:

- Qwen 2.5 VL (72b and 32b) are by far the most impressive. Both landed right around 75% accuracy (equivalent to GPT-4o’s performance). Qwen 72b was only 0.4% above 32b, which is within the margin of error.

- Both Qwen models beat mistral-ocr (72.2%), which is specifically trained for OCR.

- Gemma-3 (27B) scored only 42.9%. That's particularly surprising given that its architecture is based on Gemini 2.0, which still tops the accuracy chart.

The dataset and benchmark runner are fully open source. You can check out the code and reproduction steps here:

- https://getomni.ai/blog/benchmarking-open-source-models-for-...

- https://github.com/getomni-ai/benchmark

- https://huggingface.co/datasets/getomni-ai/ocr-benchmark
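
To make the metric concrete: "JSON extraction accuracy" typically means some form of field-level agreement between the model's extracted JSON and a ground-truth JSON for each document. Below is a rough sketch of one way such a per-document score can be computed; the field names and exact matching rule are illustrative assumptions, not the scoring code from the benchmark repo linked above.

    # Rough illustration of a field-level "JSON extraction accuracy" score:
    # flatten prediction and ground truth into dotted key paths, then count how
    # many ground-truth fields the model reproduced exactly. Generic sketch only,
    # not the getomni-ai/benchmark implementation.
    from typing import Any, Dict

    def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
        """Flatten nested dicts/lists into {"a.b[0].c": value} pairs."""
        flat: Dict[str, Any] = {}
        if isinstance(obj, dict):
            for key, value in obj.items():
                flat.update(flatten(value, f"{prefix}.{key}" if prefix else key))
        elif isinstance(obj, list):
            for i, value in enumerate(obj):
                flat.update(flatten(value, f"{prefix}[{i}]"))
        else:
            flat[prefix] = obj
        return flat

    def json_accuracy(predicted: dict, ground_truth: dict) -> float:
        """Fraction of ground-truth fields whose values match exactly."""
        truth, pred = flatten(ground_truth), flatten(predicted)
        if not truth:
            return 1.0
        return sum(pred.get(path) == value for path, value in truth.items()) / len(truth)

    # Hypothetical example: one wrong field out of three -> ~0.67
    truth = {"invoice": {"number": "INV-42", "total": 118.0, "currency": "EUR"}}
    pred = {"invoice": {"number": "INV-42", "total": 118.0, "currency": "USD"}}
    print(round(json_accuracy(pred, truth), 2))

Averaging a per-document score like this over the 1,000 test documents is the kind of aggregation that produces a headline number like 75%.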

1. azinman2 No.43550807
News update: OCR company touts new benchmark that shows its own products are the most performant.
replies(3): >>43550853 #>>43551091 #>>43551815 #
2. jauntywundrkind No.43550853
I searched for any link between OmniAI and Alibaba's Qwen, but I couldn't find one. Do you know something I don't?

All of these models are open source (I think?). They could presumably build their work on any of these options; it behooves them to pick well, and to establish some authority along the way.

replies(1): >>43551175 #
3. johnisgood No.43551091
Someone should try to reproduce it and post the results here. I can't; my PC is about 15 years old. :(

(It is not a joke.)

replies(1): >>43551213 #
4. rustc No.43551175
The model with the best accuracy in the linked benchmark is "OmniAI" (OP's company), which looks like a paid model, not open source [1].

[1]: https://getomni.ai/pricing

5. rustc No.43551213
Reproducing the whole benchmark would be expensive; OmniAI starts at $250/month.
replies(2): >>43551436 #>>43551449 #
6. themanmaran No.43551436{3}
Generally, running the whole benchmark costs ~$200, since all the providers cost money. But if anyone wants to specifically benchmark Omni, just drop us a note and we'll make the credits available.
7. johnisgood No.43551449{3}
So not all of them are local and open source? Ugh.
replies(1): >>43551661 #
8. qingcharles No.43551661{4}
I don't see why you couldn't run any of those locally if you bought the right hardware.
replies(2): >>43552290 #>>43553147 #
9. kapitalx No.43551815
To be fair, they didn't include themselves at all in the graph.
replies(1): >>43552117 #
10. azinman2 No.43552117
They did. It's in the #1 spot.

Update: looks like they removed themselves from the graph since I saw it earlier today!

replies(1): >>43552236 #
11. axpy906 No.43552236{3}
Yup, they did.

The beauty of version control: https://github.com/getomni-ai/benchmark/commit/0544e2a439423...

12. johnisgood No.43552290{5}
I haven't checked myself, so I'm not sure; others might be able to provide the answer, though.

If they (all of the mentioned ones) are open source and can be run locally, then most likely yes.

From what I remember, they are all local and open source, so the answer is yes, if I am correct.
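
For anyone who wants to try this, here is a minimal sketch of loading one of the open-weight models (Qwen 2.5 VL) locally with Hugging Face transformers. It assumes a transformers release recent enough to ship the Qwen2.5-VL classes, the qwen-vl-utils helper package, and a GPU with enough memory for the 32B checkpoint; the class and repo names follow the Qwen model cards, and the image path and prompt are placeholders.

    # Minimal local-inference sketch for Qwen 2.5 VL (assumes transformers >= 4.49,
    # qwen-vl-utils, and a GPU that can hold the 32B weights).
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
    from qwen_vl_utils import process_vision_info

    model_id = "Qwen/Qwen2.5-VL-32B-Instruct"
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    # Ask the model to extract a document image (placeholder path) as JSON.
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": "file:///path/to/page.png"},
            {"type": "text", "text": "Extract this document's fields as JSON."},
        ],
    }]

    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    images, videos = process_vision_info(messages)
    inputs = processor(text=[text], images=images, videos=videos,
                       padding=True, return_tensors="pt").to(model.device)

    output = model.generate(**inputs, max_new_tokens=1024)
    trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output)]
    print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])

This applies to the open-weight Qwen and Gemma models; mistral-ocr, as noted in the next comment, is closed source.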

13. ipsum2 No.43553147{5}
Mistral OCR is closed source.
replies(1): >>43554713 #
14. johnisgood No.43554713{6}
Thanks!