
169 points Tammilore | 2 comments

Documind is an open-source tool that turns documents into structured data using AI.

What it does:

- Extracts specific data from PDFs based on your custom schema
- Returns clean, structured JSON that's ready to use
- Works with just a PDF link + your schema definition

Just run npm install documind to get started.

vr46 ◴[] No.42175881[source]
I’ll have to test this against my local Python pipeline, which does all this without an LLM in attendance. There are a ton of existing Python libraries that have been doing this for a long time, so let’s take a look...
replies(1): >>42176786 #
1. thegabriele ◴[] No.42176786[source]
Care to share the best ones for some use cases? Thanks
replies(1): >>42177301 #
2. vr46 ◴[] No.42177301[source]
MinerU

PDFQuery

PyMuPDF (having more success with older versions, right now)
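
For anyone wanting a starting point, here is a minimal sketch of the no-LLM approach using PyMuPDF. It is not vr46's actual pipeline: the regex "schema", the field names, and both helper functions are made up for illustration, and real documents will likely need layout-aware rules rather than plain regexes. It assumes a recent PyMuPDF release where the method is called get_text() (older releases named it getText()).

```python
import re
from typing import Dict, Optional


def extract_text(pdf_path: str) -> str:
    """Return the plain text of every page in the PDF, joined by newlines."""
    # PyMuPDF is imported lazily so the regex helper below works without it.
    import fitz  # pip install pymupdf

    with fitz.open(pdf_path) as doc:
        # get_text() is the name in recent releases; older ones used getText().
        return "\n".join(page.get_text() for page in doc)


# Hypothetical schema: field name -> regex with exactly one capture group.
SCHEMA: Dict[str, str] = {
    "invoice_number": r"Invoice\s*(?:No\.?|#)\s*:?\s*(\S+)",
    "total": r"Total\s*:?\s*\$?([\d,]+\.\d{2})",
}


def extract_fields(text: str, schema: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Apply each regex to the text; fields with no match come back as None."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in schema.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        result[name] = match.group(1) if match else None
    return result
```

Usage would be along the lines of extract_fields(extract_text("invoice.pdf"), SCHEMA), where "invoice.pdf" is a placeholder path. The result is a plain dict, so json.dumps() turns it into the same kind of structured JSON the original post describes.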