DeepSeek OCR (github.com)

990 points by pierre | 3 comments
yoran ◴[] No.45640836[source]
How does an LLM approach to OCR compare to, say, Azure AI Document Intelligence (https://learn.microsoft.com/en-us/azure/ai-services/document...) or Google's Vision API (https://cloud.google.com/vision?hl=en)?
replies(7): >>45640943 #>>45640992 #>>45642214 #>>45643557 #>>45644126 #>>45647313 #>>45667751 #
numpad0 ◴[] No.45642214[source]
Classical OCR probably still makes undesirable su6stıtutìons in CJK, since there are far too many similar characters, including some absurd pairs that are only distinguishable under a microscope or by comparing their binary representations. LLMs are better constrained to valid sequences of characters, so they should be more accurate.

Or at least that kind of problem would motivate them to re-implement OCR with an LLM.
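
A toy illustration of the "valid sequences" point (not DeepSeek OCR's actual pipeline; the corpus and candidates below are made up): rescore OCR candidates with a character bigram model so that garbled strings lose to well-formed ones.

    # Toy sketch, not a real OCR pipeline: score candidate transcriptions
    # with character bigram counts from a tiny made-up corpus and keep the
    # most plausible one.
    from collections import Counter

    CORPUS = "hello world hello there hello ernest hemingway"
    COUNTS = Counter(zip(CORPUS, CORPUS[1:]))

    def score(candidate):
        # Sum of bigram frequencies; unseen bigrams contribute nothing,
        # so garbled sequences like "he11o" score lower.
        return sum(COUNTS.get(pair, 0) for pair in zip(candidate, candidate[1:]))

    candidates = ["he11o", "hello", "heIIo"]
    print(max(candidates, key=score))  # -> "hello"

An LLM-based OCR system does something far richer than this, but the principle is the same: the decoder assigns mass only to plausible character sequences.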

replies(1): >>45644008 #
1. fluoridation ◴[] No.45644008[source]
Huh... Would it work to have some kind of error-checking model that corrects common OCR errors? That seems like it should be relatively easy.
replies(1): >>45646514 #
2. colonCapitalDee ◴[] No.45646514[source]
It's harder than it first seems. The root problem is that for text like "hallo", correcting to "hello" may be fixing an error or introducing one. In general, the more aggressive your error correction, the more errors you inadvertently introduce. You can try to make a judgement based on context ("hallo, how are you?"), which certainly helps, but it's only a mitigation. Light error correction is common and effective, but you can't push it to a full solution. The only way to fully solve this problem is to look at the entire document at once so you have maximum context available, and this is what non-traditional OCR attempts to do.
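
A minimal sketch of that context-gated idea (toy data and word counts, not any production OCR system): only replace a word when the candidate fits the neighbouring words clearly better, otherwise keep the raw OCR output so you don't introduce new errors.

    # Sketch with made-up data: word-bigram counts from a tiny corpus decide
    # whether "hallo" should become "hello" in this particular context.
    from collections import Counter

    CORPUS = ("hello how are you . hello how is it . "
              "hello there . hallo is a german greeting").split()
    BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))

    def context_score(prev_word, word, next_word):
        return BIGRAMS.get((prev_word, word), 0) + BIGRAMS.get((word, next_word), 0)

    def maybe_correct(prev_word, word, next_word, candidate, margin=1):
        # Correct only when the candidate beats the original by a clear margin.
        if context_score(prev_word, candidate, next_word) > \
                context_score(prev_word, word, next_word) + margin:
            return candidate
        return word

    # "hallo, how are you?" -> context favours "hello"
    print(maybe_correct(".", "hallo", "how", "hello"))  # -> "hello"
    # "hallo is a german greeting" -> no strong evidence, keep "hallo"
    print(maybe_correct(".", "hallo", "is", "hello"))   # -> "hallo"

The margin is the knob the comment above is describing: turn it down and you fix more errors but also introduce more.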
replies(1): >>45646597 #
3. fluoridation ◴[] No.45646597[source]
Okay, but there are far more common errors that should be easy to fix: "He11o", "Emest Herningway", incorrect diacritics like the other person mentioned, etc.
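
A toy rule-based pass for this "easy" class of errors (the confusion table is illustrative, not exhaustive): digits embedded in otherwise alphabetic words are mapped back to letters. Confusions like "rn" read as "m" ("Herningway") would additionally need a dictionary check, which this sketch omits.

    import re

    # Illustrative confusion table: digits that appear inside an otherwise
    # alphabetic word are almost certainly misread letters.
    DIGIT_TO_LETTER = {"0": "o", "1": "l", "5": "s"}

    def fix_word(word):
        # Only touch words that mix letters and digits; pure numbers are left alone.
        if any(c.isalpha() for c in word) and any(c.isdigit() for c in word):
            return "".join(DIGIT_TO_LETTER.get(c, c) for c in word)
        return word

    def fix_text(text):
        return re.sub(r"\w+", lambda m: fix_word(m.group(0)), text)

    print(fix_text("He11o from r00m 42"))  # -> "Hello from room 42"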