DeepSeek OCR

(github.com)

990 points pierre | 1 comments | 20 Oct 25 06:26 UTC

pietz ◴[20 Oct 25 08:49 UTC] No.45641449[source]▶

My impression is that OCR is basically solved at this point.

The OmniAI benchmark that's also referenced here hasn't been updated with new models since February 2025. I assume that's because general-purpose LLMs have gotten better at OCR than OmniAI's own OCR product.

I've been able to solve a broad range of OCR tasks by simply sending each page as an image to Gemini 2.5 Flash Lite and asking it nicely to extract the content as Markdown, with some additional formatting instructions. That costs around $0.20 per 1000 pages in batch mode, and the results have been great.
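For anyone who wants to try the same approach, here is a minimal sketch of the per-page loop. It is provider-neutral on purpose: `call_model` is a hypothetical callable you would implement against whatever vision LLM API you use, and the request dict layout, the prompt wording, and the function names are all assumptions made for illustration, not any vendor's actual schema. The cost helper just scales the roughly $0.20 per 1000 pages figure quoted above.

```python
import base64
from typing import Callable, Iterable, List

# Assumed prompt wording; adjust the formatting instructions to your documents.
PROMPT = (
    "Extract the full content of this page as Markdown. "
    "Preserve headings, lists, and tables. Do not add commentary."
)


def page_to_request(image_bytes: bytes, prompt: str = PROMPT) -> dict:
    """Build a provider-neutral request for one page image (hypothetical layout)."""
    return {
        "prompt": prompt,
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
        "mime_type": "image/png",  # assumption: pages were rendered as PNG
    }


def ocr_pages(
    pages: Iterable[bytes],
    call_model: Callable[[dict], str],
) -> List[str]:
    """Send each page image to the model and collect one Markdown string per page.

    `call_model` is a placeholder for your own API wrapper.
    """
    return [call_model(page_to_request(page)) for page in pages]


def estimated_cost_usd(num_pages: int, usd_per_1000_pages: float = 0.20) -> float:
    """Scale the quoted batch price linearly with the page count."""
    return num_pages * usd_per_1000_pages / 1000
```

Keeping the model call behind a plain callable means the loop can be tried with a stub before any real API is wired in.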

I'd be interested to hear where OCR still struggles today.

carschno ◴[20 Oct 25 08:52 UTC] No.45641479[source]▶

>>45641449 #

Technically not OCR, but HTR (hand-written text/transcript recognition) is still difficult. LLMs have increased accuracy, but their mistakes are very hard to identify because they just 'hallucinate' text they cannot digitize.

sramam ◴[20 Oct 25 09:01 UTC] No.45641563[source]▶

>>45641479 #

Interesting - have you tried sending the image and 'hallucinated' text together to a review LLM to fix mistakes?

I don't have a use case where 100s or 1000s of hand-written notes have to be transcribed. I have only done this with whiteboard discussion snapshots, and it has worked really well.
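Roughly what I mean by a review pass, as a sketch: `call_reviewer` is a hypothetical wrapper around a second model call that takes the image plus a prompt and returns text, and the prompt wording and names here are assumptions. Note that the `changed` flag only tells you whether the reviewer edited the draft; an unchanged draft is not evidence that the transcript is correct.

```python
from typing import Callable, NamedTuple

# Assumed prompt wording for the second pass.
REVIEW_PROMPT = (
    "Compare the draft transcript below against the attached image. "
    "Return the corrected transcript only, with no commentary.\n\n"
    "Draft transcript:\n"
)


class Review(NamedTuple):
    text: str      # transcript returned by the reviewer
    changed: bool  # True if the reviewer altered the draft


def review_transcript(
    image_bytes: bytes,
    draft: str,
    call_reviewer: Callable[[bytes, str], str],
) -> Review:
    """Send the image and the draft transcript together to a second model.

    `call_reviewer` is a placeholder for your own API wrapper.
    """
    revised = call_reviewer(image_bytes, REVIEW_PROMPT + draft).strip()
    return Review(text=revised, changed=revised != draft.strip())
```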

1. lazide ◴[20 Oct 25 10:45 UTC] No.45642404[source]▶

>>45641563 #

Often, the review LLM will also say everything is okay even when the text is made up.