
168 points | Tammilore | 1 comment

Documind is an open-source tool that turns documents into structured data using AI.

What it does:

- Extracts specific data from PDFs based on your custom schema

- Returns clean, structured JSON that's ready to use

- Works with just a PDF link + your schema definition

Just run npm install documind to get started.
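
To make the "custom schema in, structured JSON out" idea concrete, here is a minimal sketch of what a schema and a check on the returned JSON might look like. Everything in it (the `SchemaField` shape, `invoiceSchema`, `validateAgainstSchema`) is hypothetical and for illustration only; it is not Documind's actual API, so check the project's README for the real schema format.

```typescript
// Hypothetical schema shape for illustration; this is NOT Documind's actual API.
type FieldType = "string" | "number" | "boolean";

interface SchemaField {
  name: string;
  type: FieldType;
  description: string;
  required?: boolean;
}

// Example schema describing what to pull out of an invoice PDF (assumed fields).
const invoiceSchema: SchemaField[] = [
  { name: "invoiceNumber", type: "string", description: "Invoice identifier", required: true },
  { name: "total", type: "number", description: "Total amount due", required: true },
  { name: "paid", type: "boolean", description: "Whether the invoice is marked paid" },
];

// Check a JSON result against the schema.
// Returns a list of problems; an empty list means the result matches.
function validateAgainstSchema(
  schema: SchemaField[],
  result: Record<string, unknown>,
): string[] {
  const problems: string[] = [];
  for (const field of schema) {
    const value = result[field.name];
    if (value === undefined || value === null) {
      // Absent values are only a problem for required fields.
      if (field.required) problems.push(`missing required field: ${field.name}`);
      continue;
    }
    if (typeof value !== field.type) {
      problems.push(`field ${field.name} should be ${field.type}, got ${typeof value}`);
    }
  }
  return problems;
}
```

Running a check like this on whatever JSON an extraction tool returns is a cheap way to catch missing or mistyped fields before the data goes anywhere else.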

1. constantinum No.42174779
Reading through the comments, some of the common requirements for document extraction are:

* Run locally or on premise for security/privacy reasons

* Support multiple LLMs and vector DBs - plug and play

* Support customisable schemas

* A way to check/confirm extraction accuracy against the source

* Cron jobs for automation

There is Unstract, which addresses the requirements above.

https://github.com/Zipstack/unstract
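
As a rough illustration of the "check/confirm accuracy with source" item in the list above, one simple approach is to confirm that every extracted value actually occurs in the source text. This is a minimal sketch under that assumption, not how Unstract or Documind implement it; `normalize` and `findUnsupportedFields` are hypothetical names.

```typescript
// Hypothetical helpers for illustration; not taken from any real library.

// Lowercase and collapse whitespace so line breaks in the PDF text do not matter.
function normalize(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Return the names of extracted fields whose values cannot be found in the source text.
function findUnsupportedFields(
  sourceText: string,
  extracted: Record<string, string | number>,
): string[] {
  const haystack = normalize(sourceText);
  const unsupported: string[] = [];
  for (const [field, value] of Object.entries(extracted)) {
    const needle = normalize(String(value));
    // Empty values and values absent from the source are both flagged for review.
    if (needle.length === 0 || !haystack.includes(needle)) {
      unsupported.push(field);
    }
  }
  return unsupported;
}
```

A substring check like this only catches values the model invented outright. It will not notice a real value assigned to the wrong field, and it can flag legitimate values that were reformatted (dates, currency), so flagged fields are candidates for human review rather than confirmed errors.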