Show HN: Qwen-2.5-32B is now the best open source OCR model

(github.com)

211 points themanmaran | 1 comments | 01 Apr 25 17:00 UTC | HN request time: 1.068s | source

Last week was big for open source LLMs. We got:

- Qwen 2.5 VL (72b and 32b)

- Gemma-3 (27b)

- DeepSeek-v3-0324

And a couple weeks ago we got the new mistral-ocr model. We updated our OCR benchmark to include the new models.

We evaluated 1,000 documents for JSON extraction accuracy. Major takeaways:

- Qwen 2.5 VL (72b and 32b) are by far the most impressive. Both landed right around 75% accuracy (equivalent to GPT-4o’s performance). Qwen 72b was only 0.4% above 32b. Within the margin of error.

- Both Qwen models passed mistral-ocr (72.2%), which is specifically trained for OCR.

- Gemma-3 (27B) only scored 42.9%. Particularly surprising given that it's architecture is based on Gemini 2.0 which still tops the accuracy chart.

The data set and benchmark runner is fully open source. You can check out the code and reproduction steps here:

- https://getomni.ai/blog/benchmarking-open-source-models-for-...

- https://github.com/getomni-ai/benchmark

- https://huggingface.co/datasets/getomni-ai/ocr-benchmark

Show context

daemonologist ◴[01 Apr 25 20:24 UTC] No.43550948[source]▶

>>43549072 (OP) #

You mention that you measured cost and latency in addition to accuracy - would you be willing to share those results as well? (I understand that for these open models they would vary between providers, but it would be useful to have an approximate baseline.)

replies(1): >>43551259 #

themanmaran ◴[01 Apr 25 20:57 UTC] No.43551259[source]▶

>>43550948 #

Yes, I'll add that to the writeup! You're right, initially excluded it because it was really dependent on the providers, so lots of variance. Especially with the Qwen models.

High level results were:

- Qwen 32b => $0.33/1000 pages => 53s/page

- Qwen 72b => $0.71/1000 pages => 51s/page

- Llama 90b => $8.50/1000 pages => 44s/page

- Llama 11b => $0.21/1000 pages => 08s/page

- Gemma 27b => $0.25/1000 pages => 22s/page

- Mistral => $1.00/1000 pages => 03s/page

replies(2): >>43551589 #>>43551686 #

dylan604 ◴[01 Apr 25 21:37 UTC] No.43551589[source]▶

>>43551259 #

One of these things is not like the others. $8.50/1000?? Any chance that's a typo? Otherwise, for someone that has no experience with LLM pricing models, why is Llama 90b so expensive?

replies(2): >>43551806 #>>43552161 #

int_19h ◴[01 Apr 25 22:57 UTC] No.43552161[source]▶

>>43551589 #

It's not uncommon when using brokers to see outliers like this. What happens basically is that some models are very popular and have many different providers, and are priced "close to the metal" since the routing will normally pick the cheapest option with the specified requirements (like context size). But then other models - typically more specialized ones - are only hosted by a single provider, and said provider can then price it much higher than raw compute cost.

E.g. if you look at https://openrouter.ai/models?order=pricing-high-to-low, you'll see that there are some 7B and 8B models that are more expensive than Claude Sonnet 3.7.

replies(1): >>43556385 #

1. nickpsecurity ◴[02 Apr 25 13:20 UTC] No.43556385[source]▶

>>43552161 #

I'll add that some, big-name suppliers with big models might be running at or near a loss on purpose to draw in customers. That behavior is often encouraged by funders who gave them over $100 million to capture the market.

Their theory is they can raise prices once their competitors go out of business. The companies open-sourcing pretrained models are countering that. So, we see a mix of huge models underpriced by scheming companies and open-source models priced for inference with free market principles.

↑