DeepSeek OCR

(github.com)
990 points | pierre | 5 comments
breadislove ◴[] No.45643006[source]
For everyone wondering how good this and other benchmarks are:

- the OmniAI benchmark is bad

- Instead check OmniDocBench[1] out

- Mistral OCR is far, far behind most open-source OCR models and even further behind Gemini

- End to End OCR is still extremely tricky

- composed pipelines work better (layout detection -> reading order -> OCR every element)

- complex table parsing is still extremely difficult

[1]: https://github.com/opendatalab/OmniDocBench
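
To make the "composed pipeline" point concrete, here is a minimal sketch of layout detection -> reading order -> OCR per element. Everything named here (`Region`, `reading_order`, `run_pipeline`, the `detect_layout` and `ocr_region` callables) is hypothetical and only for illustration; it is not the API of any model mentioned in this thread, and the row-based ordering is a deliberately naive assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Region:
    """A detected layout element (hypothetical type for this sketch)."""
    kind: str  # e.g. "title", "paragraph", "table"
    x: int     # left edge, in pixels
    y: int     # top edge, in pixels
    w: int
    h: int


def reading_order(regions: List[Region], row_tolerance: int = 10) -> List[Region]:
    """Naive ordering: top to bottom, then left to right within a row.

    A region joins the current row when its top edge is within
    row_tolerance pixels of the first region in that row.
    """
    rows: List[List[Region]] = []
    for region in sorted(regions, key=lambda r: r.y):
        if rows and region.y - rows[-1][0].y <= row_tolerance:
            rows[-1].append(region)
        else:
            rows.append([region])
    return [r for row in rows for r in sorted(row, key=lambda r: r.x)]


def run_pipeline(
    page: Any,
    detect_layout: Callable[[Any], List[Region]],
    ocr_region: Callable[[Any, Region], str],
) -> List[Tuple[str, str]]:
    """Run the three stages; both models are passed in as plain callables."""
    regions = detect_layout(page)        # stage 1: layout detection
    ordered = reading_order(regions)     # stage 2: reading order
    return [(r.kind, ocr_region(page, r)) for r in ordered]  # stage 3: OCR


if __name__ == "__main__":
    # Stand-ins for real models, so the sketch runs without dependencies.
    fake_regions = [
        Region("paragraph", 300, 102, 200, 50),
        Region("title", 0, 0, 500, 40),
        Region("paragraph", 20, 100, 200, 50),
    ]
    result = run_pipeline(
        page="page-1",
        detect_layout=lambda page: fake_regions,
        ocr_region=lambda page, r: f"<text of {r.kind} at x={r.x}>",
    )
    for kind, text in result:
        print(kind, text)
```

The point of the split is that each stage can be swapped or evaluated on its own. Real multi-column pages and complex tables need a much smarter ordering step than the row grouping assumed here.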

replies(2): >>45643626 #>>45647948 #
hakunin ◴[] No.45643626[source]
Wish someone benchmarked Apple Vision Framework against these others. It's built into most Apple devices, but people don't know you can actually harness it to do fast, good-quality OCR for you (and go a few extra steps to produce searchable PDFs, which is my typical use case). I'm very curious where it would fall in the benchmarks.
replies(3): >>45643785 #>>45643798 #>>45645485 #
CaptainOfCoit ◴[] No.45643798[source]
Yeah, if it was cross-platform maybe more people would be curious about it, but only being able to run on ~10% of the hardware people have doesn't make it very attractive to even begin spending time on Apple-exclusive stuff.
replies(2): >>45644313 #>>45644771 #
1. hakunin ◴[] No.45644771[source]
10% of hardware is an insanely vast amount, no?
replies(1): >>45645352 #
2. CaptainOfCoit ◴[] No.45645352[source]
Well, it's 90% less than what everyone else uses, so even if the total number is big, relatively it has a small user-base.
replies(1): >>45645529 #
3. hakunin ◴[] No.45645529[source]
I don’t think 10% of anything would be considered relatively small even if we talk about 10 items: literally there’s only 10 items and this 1 has the rare quality of being among 10. Let alone billions of devices. Unless you want to reduce it to tautology, and instead of answering “why it’s not benchmarked” just go for “10 is smaller than 90, so I’m right”.

My point is, I don’t think any comparative benchmark would ever exclude something based on “oh it’s just 10%, who cares.” I think the issue is more that Apple Vision Framework is not well known as an OCR option, but maybe it’s starting to change.

And another part of the irony is that Apple’s framework probably gets way more real world usage in practice than most of the tools in that benchmark.

replies(1): >>45645708 #
4. CaptainOfCoit ◴[] No.45645708{3}[source]
The initial wish was that more people cared about Apple Vision Framework. I'm merely claiming that since most people don't actually have Apple hardware, they're avoiding Apple technology, as it commonly only runs on Apple hardware.

So I'm not saying it should be excluded because it can only be used by relatively few people, but I was trying to communicate that I kind of get why not so many people care about it and why it gets forgotten, since most people wouldn't be able to run it even if they wanted to.

In contrast, something like DeepSeek OCR could be deployed on any of the three major OSes (assuming there are implementations of the architecture available), so of course it gets a lot more attention and will be included in way more benchmarks.

replies(1): >>45646063 #
5. hakunin ◴[] No.45646063{4}[source]
I get what you're saying, I'm just disagreeing with your thought process. By that logic benchmarks would also not include the LLMs that they did, since most people wouldn't be able to run those either (it takes expensive hardware). In fact, more people would probably be able to run Vision framework than those LLMs, for cheaper (Vision is even on iPhones). I'm more inclined to agree if you say "maybe people just don't like Apple". :)