DeepSeek OCR

(github.com)
990 points pierre | 2 comments
1. foofoo12 No.45643117
How does it compare to Tesseract? https://github.com/tesseract-ocr/tesseract

I use ocrmypdf (which uses Tesseract). Runs locally and is absolutely fantastic. https://ocrmypdf.readthedocs.io/en/latest/
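
For anyone who hasn't tried it, here is a minimal sketch of driving ocrmypdf from Python through its command line. The helper names (`build_ocrmypdf_command`, `run_ocr`) and the file names are made up for illustration; `-l` and `--skip-text` are ocrmypdf options, and the sketch assumes `ocrmypdf` and Tesseract are already installed and on the PATH.

```python
import shutil
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf, language="eng"):
    """Build the argument list for an ocrmypdf run (hypothetical helper)."""
    return [
        "ocrmypdf",
        "-l", language,   # Tesseract language pack to use
        "--skip-text",    # leave pages that already have a text layer alone
        input_pdf,
        output_pdf,
    ]


def run_ocr(input_pdf, output_pdf, language="eng"):
    """Run ocrmypdf locally; raises if the tool is not installed."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf not found on PATH")
    subprocess.run(
        build_ocrmypdf_command(input_pdf, output_pdf, language),
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
```

Something like `run_ocr("scan.pdf", "scan_ocr.pdf")` would then write a searchable copy next to the original, all on the local machine.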

replies(1): >>45644101
2. utopiah No.45644101
Indeed. LLM/VLM-based alternatives seem to have become the default benchmark, as if they had somehow "solved" the problem. But IMHO, even if accuracy goes from (totally made-up numbers) 80% with Tesseract to 95% with this or Qwen or whatever, if it takes 100x the disk space for containers or a CUDA stack, plus dedicated hardware, e.g. a GPU with 16GB of VRAM, etc., then that's a trade-off that should be considered.
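
To put that in numbers, here is a back-of-the-envelope sketch using the same made-up figures (80% vs 95% accuracy, 100x the disk). The function name is illustrative and none of this is a measurement or a real benchmark.

```python
def error_reduction(baseline_accuracy_pct, new_accuracy_pct):
    """How many times fewer errors the new tool makes than the baseline."""
    baseline_errors = 100 - baseline_accuracy_pct  # errors per 100 items
    new_errors = 100 - new_accuracy_pct
    return baseline_errors / new_errors


# Made-up numbers from the comment, not measurements.
reduction = error_reduction(80, 95)  # 20 errors per 100 vs 5 per 100
disk_factor = 100                    # 100x the disk space
print(f"{reduction:.0f}x fewer errors for {disk_factor}x the disk")
# prints "4x fewer errors for 100x the disk"
```

Whether 4x fewer errors is worth 100x the footprint depends entirely on the workload, which is the point: it is a trade-off, not a free upgrade.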