I think a new model will very soon destroy whatever startups and services are built around document ingestion: a model that can take in a PDF page as an image and transcribe it to text with near-perfect accuracy.
That said, I think Azure Document Intelligence, Google Document AI, and Amazon Textract are among the best services, if not the best, and they offer these models.
Extracting plain text isn’t that much of a problem, relatively speaking. Things get challenging when interpreting more complex elements like nested lists, tables, sidebars, footnotes/endnotes, cross-references, images, and diagrams.
I have not tested Azure Document Intelligence or Google Document AI, but AWS Textract, LlamaParse, Unstructured, and Omni made it to my shortlist. I have not tested Docling, as I could not install it on my Windows laptop.