DeepSeek OCR

(github.com)

990 points by pierre | 5 comments | 20 Oct 25 06:26 UTC

pietz [20 Oct 25 08:49 UTC] No.45641449

My impression is that OCR is basically solved at this point.

The OmniAI benchmark that's also referenced here hasn't been updated with new models since February 2025. I assume that's because general-purpose LLMs have gotten better at OCR than their own OCR product.

I've been able to solve a broad range of OCR tasks by simply sending each page as an image to Gemini 2.5 Flash Lite and asking it nicely to extract the content as Markdown, with some additional formatting instructions. That will cost you around $0.20 per 1000 pages in batch mode, and the results have been great.
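A minimal sketch of the page-by-page workflow described above, assuming the pages have already been rendered to PNG bytes (e.g. with a PDF rasterizer, not shown) and that Gemini's Batch API takes a JSONL file where each line is `{"key": ..., "request": {...}}` with the image sent as `inline_data`. That layout is my understanding of the batch input format, so check the current docs. The prompt text, key scheme, and function names are hypothetical, and submitting the job against gemini-2.5-flash-lite is not shown. The cost helper simply applies the ~$0.20 per 1000 pages quoted above.

```python
import base64
import json
from typing import Iterable

# Hypothetical prompt: the comment only mentions "some additional formatting
# instructions", so adapt this to your own documents.
PROMPT = (
    "Extract all content from this page as Markdown. "
    "Keep headings, lists and tables; output only the Markdown."
)


def batch_line(key: str, png_bytes: bytes, prompt: str = PROMPT) -> str:
    """One JSONL line: a single page image (base64 inline data) plus the prompt.

    Assumes the Gemini Batch API's {"key": ..., "request": ...} line format;
    verify against the current docs before relying on it.
    """
    request = {
        "contents": [{
            "parts": [
                {"inline_data": {
                    "mime_type": "image/png",
                    "data": base64.b64encode(png_bytes).decode("ascii"),
                }},
                {"text": prompt},
            ]
        }]
    }
    return json.dumps({"key": key, "request": request})


def batch_jsonl(pages: Iterable[bytes], prompt: str = PROMPT) -> str:
    """Whole batch file: one request per page, keyed by zero-padded page index."""
    return "".join(
        batch_line(f"page-{i:05d}", png, prompt) + "\n"
        for i, png in enumerate(pages)
    )


def estimated_cost_usd(pages: int, usd_per_1000_pages: float = 0.20) -> float:
    """Back-of-envelope cost at the ~$0.20 per 1000 pages quoted in the comment."""
    return pages / 1000 * usd_per_1000_pages


if __name__ == "__main__":
    # Placeholder bytes stand in for real rendered page images.
    fake_pages = [b"page one", b"page two"]
    payload = batch_jsonl(fake_pages)
    print(payload.count("\n"), "requests")
    print(f"~${estimated_cost_usd(50_000):.2f} for 50,000 pages")
```

One request per page keeps failures isolated: a page that comes back garbled can be re-queued by its key without redoing the whole document. At the quoted rate, 50,000 pages works out to roughly $10.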

I'd be interested to hear where OCR still struggles today.


1. peter-m80 [20 Oct 25 08:58 UTC] No.45641536

>>45641449

No way it's solved. Try to OCR a magazine with creative layouts. Not possible. I have a collection of vintage computer magazines, and from time to time I try to OCR them with the state-of-the-art tools. All of them require a lot of human intervention.


2. jmkni [20 Oct 25 08:59 UTC] No.45641544

>>45641536

Do you have an example of a particularly tricky one?


3. ekianjo [20 Oct 25 09:06 UTC] No.45641617

>>45641544

Just try old ads; you'll see how hard it gets.

4. pietz [20 Oct 25 09:31 UTC] No.45641838

>>45641536

Could you provide an example that fails? I'm interested in this.

5. constantinum [20 Oct 25 14:31 UTC] No.45644342

>>45641536

I use LLMWhisperer[1] for OCR'ing old magazine ads. It preserves the layout and context. Example: https://postimg.cc/ts3vT7kG

[1] https://pg.llmwhisperer.unstract.com/
