I’ll have to test this against my local Python pipeline which does all this without an LLM in attendance. There are a ton of existing Python libraries which have been doing this for a long time, so let’s take a look..
replies(1):
What it does:
- Extracts specific data from PDFs based on your custom schema - Returns clean, structured JSON that's ready to use - Works with just a PDF link + your schema definition
Just run npm install documind to get started.