
DeepSeek OCR

(github.com)
990 points pierre | 6 comments
1. farseer ◴[] No.45640648[source]
How good is this compared to most commercial OCR software?
replies(1): >>45640663 #
2. ozim ◴[] No.45640663[source]
Any vision model is better than commercial OCR software.
replies(3): >>45640798 #>>45640878 #>>45644338 #
3. Etheryte ◴[] No.45640878[source]
I'm not really sure that's an accurate summary of the state of the art; [0] is a better overview. In short: SOTA multi-modal LLMs are the best option for handwriting, nearly anything is good at printed text, and for printed media, specialty models from hyperscalers are slightly better than multi-modal LLMs.

[0] https://research.aimultiple.com/ocr-accuracy/

replies(1): >>45641053 #
4. ozim ◴[] No.45641053{3}[source]
I see it confirms what I wrote: the state of the art is “not using Tesseract anymore”, and I think a bunch of commercial solutions are stuck with Tesseract.
replies(1): >>45641452 #
5. ares623 ◴[] No.45641452{4}[source]
I assume Tesseract has the advantage of being able to give a confidence score?
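For illustration, a minimal sketch of how such a score could be used, assuming Tesseract's TSV output (e.g. from `tesseract image.png out tsv`) with its `conf` and `text` columns. The function name, the sample rows, and the threshold of 60 are made up for the example, not taken from any real run.

```python
import csv
import io

# Hypothetical sample in the shape of Tesseract's TSV output; values are invented.
SAMPLE_TSV = "\n".join([
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\tleft\ttop\twidth\theight\tconf\ttext",
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t",
    "5\t1\t1\t1\t1\t1\t36\t92\t60\t24\t96.5\tInvoice",
    "5\t1\t1\t1\t1\t2\t102\t92\t80\t24\t41.2\tNurnber",
])


def low_confidence_words(tsv_text, threshold=60.0):
    """Return (word, confidence) pairs whose confidence is below `threshold`."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    flagged = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf_raw = row.get("conf")
        if conf_raw is None or not text:
            continue  # malformed row or a row without a recognized word
        conf = float(conf_raw)
        if conf < 0:
            continue  # -1 marks structural rows (page, block, line), not words
        if conf < threshold:
            flagged.append((text, conf))
    return flagged


if __name__ == "__main__":
    for word, conf in low_confidence_words(SAMPLE_TSV):
        print(f"review: {word!r} (conf={conf})")
```

Words flagged this way could be routed to manual review, which is harder to do with a model that only returns text.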
6. dragonwriter ◴[] No.45644338[source]
Since “commercial OCR software” includes VLM-based commercial offerings, that's clearly not correct.