
DeepSeek OCR

(github.com)
990 points | pierre | 4 comments
pietz No.45641449
My impression is that OCR is basically solved at this point.

The OmniAI benchmark that's also referenced here hasn't been updated with new models since February 2025. I assume that's because general-purpose LLMs have gotten better at OCR than OmniAI's own OCR product.

I've been able to solve a broad range of OCR tasks by simply sending each page as an image to Gemini 2.5 Flash Lite and asking it nicely to extract the content as Markdown, with some additional formatting instructions. That costs around $0.20 per 1,000 pages in batch mode, and the results have been great.

I'd be interested to hear where OCR still struggles today.
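
For anyone who wants to try the per-page approach described above, here is a minimal sketch, not the setup actually used. `send_page` is a hypothetical callable that you would write yourself around whichever model client you use; `PROMPT` and the function names are made up for illustration. The cost helper only restates the "$0.20 per 1,000 pages" figure as arithmetic.

```python
from typing import Callable, Iterable

# Hypothetical prompt; the actual formatting instructions are not given in the thread.
PROMPT = (
    "Extract the content of this page as Markdown. "
    "Keep headings, lists and tables. Do not add commentary."
)


def ocr_pages(
    pages: Iterable[bytes],
    send_page: Callable[[bytes, str], str],
) -> list[str]:
    """Send each page image to the model and collect one Markdown string per page.

    `send_page(image_bytes, prompt)` is an assumption: a thin wrapper around
    your model client that returns the model's text response.
    """
    return [send_page(image, PROMPT) for image in pages]


def estimated_cost_usd(num_pages: int, usd_per_1000_pages: float = 0.20) -> float:
    """Scale the quoted batch-mode price linearly with the page count."""
    return num_pages * usd_per_1000_pages / 1000
```

With the quoted price, 5,000 pages would come to roughly one dollar; actual cost depends on image size and output length.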

raincole No.45641533
If you can accept that the machine just makes up what it doesn't recognize instead of saying "I don't know," then yes, it's solved.

(I'm not being snarky. It's acceptable in some cases.)

jakewins No.45641608
But wasn't this very much the case with existing OCR software as well? In fairness, I guess LLMs will end up making up plausible-looking text instead of text riddled with errors, which makes the mistakes much harder to catch.
1. wahnfrieden No.45643820
Existing OCR doesn't skip over entire (legible) paragraphs or hallucinate entire sentences.
2. Davidzheng No.45643920
That rarely happens to me when I use LLMs to transcribe PDFs.
3. criddell No.45644305
I usually run the image(s) through more than one converter, then compare the results. They all have problems, but the parts they agree on are usually correct.
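
A rough sketch of what such a comparison could look like, assuming you already have the text output of two converters as strings. It uses `difflib.SequenceMatcher` from the Python standard library on word lists; the function names and the `min_words` threshold are made up for illustration, and this is not necessarily how the comparison above is done.

```python
from difflib import SequenceMatcher


def agreed_spans(text_a: str, text_b: str, min_words: int = 3) -> list[str]:
    """Return runs of at least `min_words` words that both outputs share, in order."""
    words_a, words_b = text_a.split(), text_b.split()
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    spans = []
    for block in matcher.get_matching_blocks():
        # The final block is a zero-length sentinel; the size check skips it.
        if block.size >= min_words:
            spans.append(" ".join(words_a[block.a:block.a + block.size]))
    return spans


def disputed_spans(text_a: str, text_b: str) -> list[tuple[str, str]]:
    """Return (version_a, version_b) pairs for every place the outputs differ."""
    words_a, words_b = text_a.split(), text_b.split()
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            pairs.append((" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return pairs
```

The disputed pairs are the places worth checking by hand. Agreement is only a heuristic: two converters can make the same mistake on the same smudge.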
4. KoolKat23 No.45645395
This must be some older/smaller model.