How does an LLM approach to OCR compare to, say, Azure AI Document Intelligence (https://learn.microsoft.com/en-us/azure/ai-services/document...) or Google's Vision API (https://cloud.google.com/vision?hl=en)?
Classical OCR probably still makes undesirable su6stıtutìons in CJK, because there are far too many similar-looking characters, including some absurd ones that can only be told apart under a microscope or by looking at their binary representations. LLMs are better constrained to valid sequences of characters, so they should be more accurate.
Or at least, that is the kind of thing that would motivate people to re-implement OCR with LLMs.
Huh... Would it work to have some kind of error checking model that corrected common OCR errors? That seems like it should be relatively easy.
It's harder than it first seems. The root problem is that for text like "hallo", correcting to "hello" may be fixing an error or introducing one. In general, the more aggressive your error correction, the more errors you inadvertently introduce. You can try to make a judgement based on context ("hallo, how are you?"), which certainly helps, but it's only a mitigation. Light error correction is common and effective, but you can't push it to a full solution. The only way to fully solve this problem is to look at the entire document at once so you have maximum context available, and this is what non-traditional OCR attempts to do.
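To make the trade-off concrete, here is a minimal sketch of a dictionary-based post-corrector built on Python's standard `difflib`. The vocabulary, the `correct` helper, the sample misread "hcllo", and the cutoff values are all made up for illustration and are not taken from any real OCR pipeline. The point is that a misread "hcllo" and a genuinely intended "hallo" are equally similar to "hello", so a similarity threshold alone cannot tell them apart.

```python
from difflib import get_close_matches

# Hypothetical toy vocabulary; a real system would use a much larger lexicon.
VOCAB = ["hello", "how", "are", "you"]


def correct(tokens, vocab, cutoff):
    """Replace each out-of-vocabulary token with its closest vocabulary
    word, but only if the similarity ratio is at least `cutoff`."""
    corrected = []
    for token in tokens:
        if token in vocab:
            corrected.append(token)
            continue
        match = get_close_matches(token, vocab, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else token)
    return corrected


# Case 1: the page said "hello" and the OCR misread it as "hcllo".
ocr_misread = ["hcllo", "how", "are", "you"]
# Case 2: the page really said "hallo" (e.g. a greeting in another language).
genuine_text = ["hallo", "how", "are", "you"]

# Aggressive correction (low cutoff): fixes case 1, corrupts case 2.
print(correct(ocr_misread, VOCAB, cutoff=0.75))
print(correct(genuine_text, VOCAB, cutoff=0.75))

# Conservative correction (high cutoff): preserves case 2, misses case 1.
print(correct(ocr_misread, VOCAB, cutoff=0.9))
print(correct(genuine_text, VOCAB, cutoff=0.9))
```

Both "hcllo" and "hallo" differ from "hello" by one character in the same position, so any cutoff that repairs the first also rewrites the second. Breaking the tie needs information from outside the word itself, which is why more context helps.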