(github.com)

990 points pierre | 5 comments | 20 Oct 25 06:26 UTC

farseer ◴[20 Oct 25 06:44 UTC] No.45640648

>>45640594 (OP)

How good is this compared to most commercial OCR software?

1. ozim ◴[20 Oct 25 06:47 UTC] No.45640663

>>45640648

Any vision model is better than commercial OCR software.

2. Etheryte ◴[20 Oct 25 07:31 UTC] No.45640878

>>45640663 (TP)

I'm not really sure that's an accurate summary of the state of the art; [0] is a better overview. In short: SOTA multimodal LLMs are the best option for handwriting, nearly anything does well on printed text, and for printed media, specialty models from the hyperscalers are slightly better than multimodal LLMs.

[0] https://research.aimultiple.com/ocr-accuracy/

3. ozim ◴[20 Oct 25 08:00 UTC] No.45641053

>>45640878

I see it confirms what I wrote: the state of the art is "not using Tesseract anymore", and I think a bunch of commercial solutions are stuck with Tesseract.

4. ares623 ◴[20 Oct 25 08:50 UTC] No.45641452

>>45641053

I assume Tesseract has the advantage of being able to give a confidence score?
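
For reference, Tesseract does expose per-word confidences: its TSV output (`tesseract input.png out tsv`, or `pytesseract.image_to_data` from Python) has a `conf` column on a 0 to 100 scale, with -1 on the page/block/paragraph/line rows. A minimal sketch, assuming that standard 12-column TSV layout, that keeps only words above a cutoff. The function name `confident_words` and the default threshold of 60 are hypothetical choices for illustration, not Tesseract defaults:

```python
import csv
import io

def confident_words(tsv_text: str, min_conf: float = 60.0) -> list[tuple[str, float]]:
    """Return (word, confidence) pairs from Tesseract TSV output.

    Assumes the usual header: level, page_num, block_num, par_num,
    line_num, word_num, left, top, width, height, conf, text.
    min_conf is a hypothetical cutoff chosen for this example.
    """
    # QUOTE_NONE so quote characters in recognized text are kept literally.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    results = []
    for row in reader:
        conf = float(row["conf"])
        text = (row.get("text") or "").strip()
        # Structural rows (page, block, paragraph, line) have conf == -1 and no text.
        if conf < 0 or not text:
            continue
        if conf >= min_conf:
            results.append((text, conf))
    return results

# Hand-written sample in Tesseract's TSV shape (not real OCR output).
SAMPLE = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t100\t50\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t10\t10\t40\t20\t96.5\tHello\n"
    "5\t1\t1\t1\t1\t2\t55\t10\t40\t20\t31.2\tw0rld\n"
)

if __name__ == "__main__":
    print(confident_words(SAMPLE))
```

Low-confidence words like the `w0rld` row can then be routed to manual review instead of being silently accepted.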

5. dragonwriter ◴[20 Oct 25 14:31 UTC] No.45644338

>>45640663 (TP)

Since “commercial OCR software” includes VLM-based commercial offerings, that's clearly not correct.

DeepSeek OCR