169 points Tammilore | 2 comments

Documind is an open-source tool that turns documents into structured data using AI.

What it does:

- Extracts specific data from PDFs based on your custom schema
- Returns clean, structured JSON that's ready to use
- Works with just a PDF link + your schema definition
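
To make the "custom schema in, structured JSON out" idea concrete, here is a minimal sketch in TypeScript. The `SchemaField` shape, the `invoiceSchema` fields, and the `conformsToSchema` helper are all hypothetical names made up for illustration; they are not Documind's actual API, so check the project README for the real schema format.

```typescript
// Hypothetical schema shape for illustration only; not Documind's actual API.
type FieldType = "string" | "number" | "boolean";

interface SchemaField {
  name: string;
  type: FieldType;
  description: string;
}

// Example schema for an invoice-like PDF (field names are invented).
const invoiceSchema: SchemaField[] = [
  { name: "invoiceNumber", type: "string", description: "Invoice identifier" },
  { name: "total", type: "number", description: "Total amount due" },
  { name: "paid", type: "boolean", description: "Whether the invoice is paid" },
];

// Returns true when the extracted JSON object contains every schema field
// and each value has the primitive type the schema asks for.
function conformsToSchema(
  data: Record<string, unknown>,
  schema: SchemaField[],
): boolean {
  return schema.every((field) => typeof data[field.name] === field.type);
}
```

A check like `conformsToSchema(result, invoiceSchema)` is one way to confirm that whatever JSON comes back really is "ready to use" before passing it downstream.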

Just run npm install documind to get started.

1. thor-rodrigues No.42172239
Very nice tool! Just last week, I was working on extracting information from PDFs for an automation flow I’m building. I used Unstructured (https://unstructured.io/), which supports multiple file types, not just PDFs.

However, my main issue is that I need to work with confidential client data that cannot be uploaded to a third party. Setting up the open-source, locally hosted version of Unstructured was quite cumbersome due to the numerous additional packages and installation steps required.

While I’m open to the idea of parsing content with an LLM that has vision capabilities, data safety and confidentiality are critical for many applications. I think your project would go from good to great if it could connect to Ollama and run locally.

That said, this is an excellent application! I can definitely see myself using it in other projects that don’t demand such stringent data confidentiality.
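
For anyone curious what the local route could look like, here is a rough sketch that sends already-extracted document text to an Ollama server on its default local address, so nothing leaves the machine. The function names, the prompt wording, and the `llama3.2` model choice are assumptions for illustration; this is not something Documind provides today.

```typescript
// Sketch only: how an extraction prompt could be sent to a local Ollama server.
// Function names, prompt wording, and the default model are assumptions.
interface OllamaGenerateRequest {
  model: string;
  prompt: string;
  stream: boolean;
  format: "json";
}

// Ollama listens on port 11434 by default.
const OLLAMA_URL = "http://localhost:11434/api/generate";

// Build the request body asking the model for a JSON object with the given fields.
function buildExtractionRequest(
  documentText: string,
  fieldNames: string[],
  model = "llama3.2",
): OllamaGenerateRequest {
  const prompt =
    `Extract the following fields as a JSON object: ${fieldNames.join(", ")}.` +
    `\n\nDocument:\n${documentText}`;
  return { model, prompt, stream: false, format: "json" };
}

// Send the request to the local server and parse the model's JSON answer.
async function extractLocally(
  documentText: string,
  fieldNames: string[],
): Promise<unknown> {
  const response = await fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildExtractionRequest(documentText, fieldNames)),
  });
  if (!response.ok) {
    throw new Error(`Ollama request failed: ${response.status}`);
  }
  // With stream disabled, Ollama returns one JSON object whose `response`
  // field holds the generated text.
  const body = (await response.json()) as { response: string };
  return JSON.parse(body.response);
}
```

This assumes the PDF text has already been pulled out locally; a vision-capable model would need the page images passed along instead.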

replies(1): >>42172303 #
2. Tammilore No.42172303
Thank you, I appreciate the feedback! I understand the need for data confidentiality, and I'm considering adding Ollama support in a future update!