←back to thread

DeepSeek OCR

(github.com)
990 points pierre | 1 comments | | HN request time: 0s | source
Show context
yoran ◴[] No.45640836[source]
How does an LLM approach to OCR compare to say Azure AI Document Intelligence (https://learn.microsoft.com/en-us/azure/ai-services/document...) or Google's Vision API (https://cloud.google.com/vision?hl=en)?
replies(7): >>45640943 #>>45640992 #>>45642214 #>>45643557 #>>45644126 #>>45647313 #>>45667751 #
ozgune ◴[] No.45640992[source]
OmniAI has a benchmark that companies LLMs to cloud OCR services.

https://getomni.ai/blog/ocr-benchmark (Feb 2025)

Please note that LLMs progressed at a rapid pace since Feb. We see much better results with the Qwen3-VL family, particularly Qwen3-VL-235B-A22B-Instruct for our use-case.

replies(2): >>45642739 #>>45647914 #
1. CaptainOfCoit ◴[] No.45642739[source]
Magistral-Small-2509 is pretty neat as well for its size, has reasoning + multimodality, which helps in some cases where context isn't immediately clear, or there are few missing spots.