One question on the "Automation" score in the results: is this a function of extraction accuracy vs. the accuracy of the LLM's "confidence score"? I noticed the "accuracy" column was very tightly grouped (between 79% and 84%), but the automation score was far more variable.
And a side note: is there an open source benchmark for Mistral's latest OCR model? I know they claimed it was 95% accurate, but it looks like that was based on an internal evaluation.