
DeepSeek OCR

(github.com)
990 points | pierre | 3 comments
breadislove ◴[] No.45643006[source]
For everyone wondering how good this and other benchmarks are:

- the OmniAI benchmark is bad

- Check out OmniDocBench[1] instead

- Mistral OCR is far, far behind most open-source OCR models and even further behind than Gemini

- End-to-end OCR is still extremely tricky

- composed pipelines work better (layout detection -> reading order -> OCR every element)

- complex table parsing is still extremely difficult

[1]: https://github.com/opendatalab/OmniDocBench
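
To make the "composed pipeline" point concrete, here is a rough sketch of the three stages wired together. Everything in it is an assumption for illustration: `Region`, `reading_order`, `run_pipeline`, and the `fake_*` stand-ins are hypothetical names, not the API of any real OCR library, and the reading-order heuristic is deliberately naive (real layouts with multiple columns or floating figures need much more).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

BBox = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass(frozen=True)
class Region:
    kind: str   # e.g. "title", "paragraph", "table" (hypothetical labels)
    bbox: BBox


def reading_order(regions: List[Region], row_tolerance: int = 10) -> List[Region]:
    """Naive ordering: bucket regions into rows by their top edge,
    then read rows top to bottom and regions left to right within a row."""
    return sorted(regions, key=lambda r: (r.bbox[1] // row_tolerance, r.bbox[0]))


def run_pipeline(
    page,
    detect_layout: Callable[[object], List[Region]],
    recognize: Callable[[object, Region], str],
) -> str:
    """Layout detection -> reading order -> OCR on every element."""
    regions = detect_layout(page)
    ordered = reading_order(regions)
    return "\n".join(recognize(page, region) for region in ordered)


# Stand-ins so the sketch runs without any model. A "page" here is just a
# dict from bounding box to the text a real recognizer would return.
FakePage = Dict[BBox, str]


def fake_detect_layout(page: FakePage) -> List[Region]:
    # Return regions in scrambled order to show that ordering is a separate step.
    return [Region("paragraph", bbox) for bbox in reversed(list(page))]


def fake_recognize(page: FakePage, region: Region) -> str:
    return page[region.bbox]


fake_page: FakePage = {
    (50, 100, 300, 130): "Title",
    (50, 150, 300, 400): "Left column",
    (320, 152, 600, 400): "Right column",
}

result = run_pipeline(fake_page, fake_detect_layout, fake_recognize)
```

In a real setup each callable would be a separate model or tool, which is the appeal of the composed approach: each stage can be swapped or evaluated on its own.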

replies(2): >>45643626 #>>45647948 #
hakunin ◴[] No.45643626[source]
Wish someone benchmarked Apple Vision Framework against these others. It's built into most Apple devices, but people don't know you can actually harness it to do fast, good-quality OCR for you (and go a few extra steps to produce searchable PDFs, which is my typical use case). I'm very curious where it would fall in the benchmarks.
replies(3): >>45643785 #>>45643798 #>>45645485 #
CaptainOfCoit ◴[] No.45643798[source]
Yeah, if it were cross-platform maybe more people would be curious about it, but something that can only run on ~10% of the hardware people have isn't very attractive to even begin spending time on.
replies(2): >>45644313 #>>45644771 #
ch1234 ◴[] No.45644313[source]
But you can have an Apple device deployed in your stack to handle the OCR, right? I get on-device is a hardware limitation for many, but if you have an Apple device in your stack, can’t you leverage this?
replies(1): >>45645344 #
CaptainOfCoit ◴[] No.45645344[source]
Yeah, but handling macOS as infrastructure sucks. Apple really doesn't want you to, so tooling is almost nonexistent. I've set up CI/CD stacks before that needed macOS builders, and they're always the most cumbersome machines to manage as infrastructure.
replies(1): >>45645876 #
coder543 ◴[] No.45645876[source]
AWS literally lets you deploy Macs as EC2 instances, which I believe includes all of AWS's usual EBS storage and disk imaging features.
replies(1): >>45646292 #
CaptainOfCoit ◴[] No.45646292{3}[source]
Alright, so the easy part is done. Now how do you actually manage them, keep them running, and do introspection without resorting to SSH or even remote desktop?
replies(1): >>45646313 #
1. coder543 ◴[] No.45646313{4}[source]
How do you manage any EC2 instance “without resorting to SSH”? Even for Linux EC2 instances, the right answer is often tools like Ansible, which do still use SSH under the hood.
replies(1): >>45647232 #
2. CaptainOfCoit ◴[] No.45647232[source]
You usually provision them via images, which they then either install from or boot from directly. Not to mention there is plenty of infrastructure software that works on at least Linux, sometimes Windows, and only seldom macOS.
replies(1): >>45647288 #
3. coder543 ◴[] No.45647288[source]
I specifically mentioned the imaging capability of EBS for Mac, which you dismissed as the easy part. Now you’re claiming that is the main thing? Well, good news!

And yes, Ansible (among other tools) can be used to manage macOS.

This discussion doesn’t seem productive. You have a preconceived viewpoint, and you’re not actually considering the problem or even doing 5 seconds of googling.

Managing a Mac fleet on AWS isn’t a real problem. If Apple’s OCR framework were significantly above the competition, it could easily be used. I would like to see benchmarks of it, as the other person was also asking for.