I tried this out on Hugging Face, and it has the same issue as every other multimodal AI OCR option (including MinerU, olmOCR, Gemini, ChatGPT, ...): it ignores pictures, charts, and other visual elements in a document, even though the underlying models are pretty good at describing images and charts on their own. This means you can't yet use these tools to create fully accessible alternatives to PDFs.
replies(1):