I'm building an app that extracts key information from PDFs and highlights the citations for each extracted value.
You provide a PDF and a JSON schema defining what to extract, and it returns the extracted values, the citations and their precise locations in the document.
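To make the input and output described above more concrete, here is a rough sketch in Python. The field names, the `Citation` structure (page number plus bounding box) and the `uncited_fields` helper are all hypothetical, made up for illustration; they are not the actual API, which isn't published yet.

```python
from dataclasses import dataclass

# Hypothetical JSON schema describing what to extract from the PDF.
schema = {
    "type": "object",
    "properties": {
        "party_name": {"type": "string"},
        "effective_date": {"type": "string"},
    },
    "required": ["party_name"],
}


@dataclass
class Citation:
    field: str   # schema field this citation supports
    text: str    # quoted text from the document
    page: int    # page number (assumed 1-based)
    bbox: tuple  # (x0, y0, x1, y1) location on the page (assumed format)


def uncited_fields(schema, values, citations):
    """Return schema fields that have an extracted value but no citation."""
    cited = {c.field for c in citations}
    return [
        name
        for name in schema["properties"]
        if name in values and name not in cited
    ]


# Example of what a response could look like (invented sample data).
values = {"party_name": "Acme Corp", "effective_date": "2024-01-01"}
citations = [Citation("party_name", "Acme Corp", 1, (72.0, 540.0, 130.0, 552.0))]

# Fields a reviewer would need to verify by hand.
print(uncited_fields(schema, values, citations))
```

A check like this is the kind of thing a reviewer could run on the output: any value without a citation gets flagged for manual verification.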
This is especially valuable in workflows where verifying LLM-extracted information is critical (e.g. legal and finance). It handles complex layouts such as multiple columns and tables, as well as scanned documents.
I'm planning to offer this both as an API and as a self-hosted option for organizations with strict data privacy requirements.
Screenshot: https://superdocs.io/highlight.png