(github.com)

990 points pierre | 3 comments | 20 Oct 25 06:26 UTC

breadislove ◴[20 Oct 25 12:13 UTC] No.45643006[source]▶

>>45640594 (OP) #

For everyone wondering how good this and other benchmarks are:

- the OmniAI benchmark is bad

- Check out OmniDocBench[1] instead

- Mistral OCR is far, far behind most open-source OCR models, and even further behind Gemini

- End-to-end OCR is still extremely tricky

- composed pipelines work better (layout detection -> reading order -> OCR every element)

- complex table parsing is still extremely difficult

[1]: https://github.com/opendatalab/OmniDocBench
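
A minimal sketch of the composed pipeline from the list above (layout detection -> reading order -> OCR every element). Everything here is an assumption for illustration: `Region`, `reading_order`, and `run_pipeline` are hypothetical names, the layout detector and recognizer are passed in as plain callables rather than tied to any real model, and the reading-order step is a naive top-to-bottom, left-to-right sort that would not survive multi-column pages.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# (x0, y0, x1, y1) in page pixels; hypothetical convention for this sketch.
Box = Tuple[int, int, int, int]


@dataclass
class Region:
    box: Box
    kind: str  # e.g. "text", "table", "figure"


def reading_order(regions: List[Region]) -> List[Region]:
    """Naive single-column order: sort by top edge, then by left edge."""
    return sorted(regions, key=lambda r: (r.box[1], r.box[0]))


def run_pipeline(
    page: Any,
    detect_layout: Callable[[Any], List[Region]],
    recognize: Callable[[Any, Region], str],
) -> List[str]:
    """Run the three stages in sequence and return one string per region."""
    regions = detect_layout(page)      # stage 1: layout detection
    ordered = reading_order(regions)   # stage 2: reading order
    return [recognize(page, region) for region in ordered]  # stage 3: per-element OCR
```

The point of the split is that each stage can be swapped or evaluated on its own, for example routing `kind == "table"` regions to a dedicated table parser inside `recognize`.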

replies(2): >>45643626 #>>45647948 #

hakunin ◴[20 Oct 25 13:24 UTC] No.45643626[source]▶

>>45643006 #

Wish someone benchmarked Apple Vision Framework against these others. It's built into most Apple devices, but people don't know you can actually harness it to do fast, good quality OCR for you (and go a few extra steps to produce searchable pdfs, which is my typical use case). I'm very curious where it would fall in the benchmarks.
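
For anyone wanting to try that comparison themselves, here is a rough sketch of one common way to score OCR text against a ground-truth transcription: normalized edit distance (lower is better). This is an assumption about how one might do it, not a description of how any benchmark named in this thread computes its scores, and the function names are made up for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(predicted: str, reference: str) -> float:
    """Edit distance divided by the longer length; 0.0 means identical."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 0.0
    return edit_distance(predicted, reference) / longest
```

Running each engine over the same pages and averaging `normalized_edit_distance(engine_output, ground_truth)` gives a crude but comparable number per engine. It says nothing about layout, tables, or reading order.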

replies(3): >>45643785 #>>45643798 #>>45645485 #

1. graeme ◴[20 Oct 25 16:05 UTC] No.45645485[source]▶

>>45643626 #

Interesting. How do you harness it for that purpose? I've found Apple's OCR to be very good.

replies(2): >>45645618 #>>45653471 #

2. hakunin ◴[20 Oct 25 16:16 UTC] No.45645618[source]▶

>>45645485 (TP) #

The short answer is a tool like OwlOCR (which also has CLI support). The long answer is that there are tools on GitHub (I created the stars list: https://github.com/stars/maxim/lists/apple-vision-framework/) that try to use the framework for various things. I’m also trying to build an FFI-based Ruby gem that provides convenient access in Ruby to the framework’s functionality.

3. ah27182 ◴[21 Oct 25 07:49 UTC] No.45653471[source]▶

>>45645485 (TP) #

Apple Shortcuts lets you run OCR on images you pass into it. Look for the “Extract Text from Image” action.

DeepSeek OCR