(github.com)

990 points pierre | 2 comments | 20 Oct 25 06:26 UTC


farseer ◴[20 Oct 25 06:44 UTC] No.45640648[source]▶

>>45640594 (OP) #

How good is this compared to most commercial OCR software?

replies(1): >>45640663 #

ozim ◴[20 Oct 25 06:47 UTC] No.45640663[source]▶

>>45640648 #

Any vision model is better than commercial OCR software.

replies(3): >>45640798 #>>45640878 #>>45644338 #

Etheryte ◴[20 Oct 25 07:31 UTC] No.45640878[source]▶

>>45640663 #

I'm not really sure that's an accurate summary of the state of the art; [0] is a better overview. In short, SOTA multi-modal LLMs are the best option for handwriting, nearly anything is good at printed text, and for printed media, specialty models from hyperscalers are slightly better than multi-modal LLMs.

[0] https://research.aimultiple.com/ocr-accuracy/

replies(1): >>45641053 #

1. ozim ◴[20 Oct 25 08:00 UTC] No.45641053[source]▶

>>45640878 #

I see it confirms what I wrote: the state of the art is “not using Tesseract anymore”, and I think a bunch of commercial solutions are stuck with Tesseract.

replies(1): >>45641452 #

2. ares623 ◴[20 Oct 25 08:50 UTC] No.45641452[source]▶

>>45641053 (TP) #

I assume Tesseract has the advantage of being able to give a confidence score?
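
For what it's worth, Tesseract does report a per-word confidence (0 to 100, with -1 for layout rows that hold no word), and the pytesseract wrapper exposes it through `image_to_data`. A rough sketch of how that could be used to flag doubtful words is below. The helper names and the threshold of 60 are made up for illustration, not taken from any library or benchmark.

```python
from typing import Dict, List, Tuple


def words_with_confidence(data: Dict[str, list]) -> List[Tuple[str, float]]:
    """Pair each recognized word with its confidence (0-100).

    `data` is assumed to have the shape returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT),
    i.e. parallel lists under the "text" and "conf" keys.
    """
    pairs = []
    for text, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # depending on the version, conf may be str, int or float
        # Tesseract reports -1 for block/paragraph/line rows that carry no word.
        if conf < 0 or not str(text).strip():
            continue
        pairs.append((text, conf))
    return pairs


def flag_low_confidence(data: Dict[str, list], threshold: float = 60.0) -> List[str]:
    """Return words whose confidence is below `threshold` (an arbitrary cutoff)."""
    return [word for word, conf in words_with_confidence(data) if conf < threshold]


def ocr_image(path: str) -> Dict[str, list]:
    """Run Tesseract on an image file and return its word-level table."""
    # Imported lazily so the helpers above can be tried without Tesseract installed.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_data(
        Image.open(path), output_type=pytesseract.Output.DICT
    )
```

Something like `flag_low_confidence(ocr_image("scan.png"))` would then list the words worth a manual check (the file name is a placeholder). Whether these scores are well calibrated is a separate question.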


DeepSeek OCR