DeepSeek OCR

(github.com)
990 points | pierre | 8 comments
1. 2big2fail_47 ◴[] No.45642348[source]
I find it interesting that there are all these independent AI-OCR projects but still no commercial offering. Is it still too inaccurate, too complex, or simply too expensive?
replies(7): >>45642449 #>>45642469 #>>45642854 #>>45643901 #>>45644265 #>>45645400 #>>45648665 #
2. Annatar01 ◴[] No.45642449[source]
I don't know, but maybe existing commercial OCR is still on top, and is also using ML. I recently tried a free trial for OCR of Sütterlin script, and it was a weird feeling being so outclassed at reading.
3. Eisenstein ◴[] No.45642469[source]
It is because the AI is not actually doing OCR. It is giving an interpretation of what the text in an image is by ingesting vision tokens and mapping them onto text tokens.

So you either have to be fine with a lot of uncertainty as to the accuracy of that interpretation or you have to wait for an LLM that can do it in a completely reproducible way every time.
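To make the reproducibility point concrete, here is a rough sketch of how one might measure it: run the same image through a model several times and see how much the outputs agree. The `transcribe` callable is hypothetical and stands in for whatever model or API call you use; nothing here is specific to DeepSeek OCR or any other library.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Callable, Dict, Union


def reproducibility_report(
    transcribe: Callable[[bytes], str],  # hypothetical: wraps your own OCR/VLM call
    image: bytes,
    runs: int = 5,
) -> Dict[str, Union[int, float, str]]:
    """Transcribe the same image `runs` times and summarize how much the outputs agree."""
    if runs < 1:
        raise ValueError("runs must be at least 1")

    outputs = [transcribe(image) for _ in range(runs)]
    counts = Counter(outputs)

    # The most frequent transcription serves as the reference reading.
    majority_text, majority_count = counts.most_common(1)[0]

    # Character-level similarity of every run against the reference reading.
    similarities = [
        SequenceMatcher(None, majority_text, out).ratio() for out in outputs
    ]

    return {
        "distinct_outputs": len(counts),
        "majority_share": majority_count / runs,
        "min_similarity": min(similarities),
        "majority_text": majority_text,
    }
```

A report with `distinct_outputs` of 1 only says the model is consistent, not that it is correct; accuracy still needs a ground-truth comparison.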

4. rsolva ◴[] No.45642854[source]
Mistral offers their OCR commercially through their API and in their Chat services, at least.

https://mistral.ai/news/mistral-ocr

5. simlevesque ◴[] No.45643901[source]
https://cloud.google.com/document-ai
6. daemonologist ◴[] No.45644265[source]
There are commercial OCR offerings from the big cloud providers (plus, like, Adobe). In my experience they generally outperform anything open-weights, although there's been a lot of improvement in VLMs in the past year or two.
7. aleinin ◴[] No.45645400[source]
One that I’ve seen recently is https://reducto.ai. It appears to be an OCR wrapper.
8. prats226 ◴[] No.45648665[source]
https://docstrange.nanonets.com/ as well, a wrapper on top of the 7B version of https://huggingface.co/nanonets/Nanonets-OCR2-3B