
168 points Tammilore | 8 comments

Documind is an open-source tool that turns documents into structured data using AI.

What it does:

- Extracts specific data from PDFs based on your custom schema
- Returns clean, structured JSON that's ready to use
- Works with just a PDF link + your schema definition

Just run npm install documind to get started.

1. inexcf ◴[] No.42171899[source]
Got excited about an open-source tool doing this.

Alas, I am let down. It is an open-source tool that builds the prompt for the OpenAI API, and I can't send customer data to them.

I'm aware of https://github.com/clovaai/donut so I hoped this would be more like that.

replies(5): >>42171944 #>>42171963 #>>42172184 #>>42172234 #>>42195901 #
2. _joel ◴[] No.42171944[source]
You can self-host OpenAI-compatible models with LM Studio and the like. I've used it with https://anythingllm.com/
3. turblety ◴[] No.42171963[source]
You might be able to use Ollama, which has an OpenAI-compatible API.
replies(1): >>42172054 #
4. Zambyte ◴[] No.42172054[source]
Not without changing the code (should be easy though)

https://github.com/DocumindHQ/documind/blob/d91121739df03867...
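For anyone wondering what that change might look like, here is a rough sketch of sending the same kind of chat request to a local OpenAI-compatible server instead of api.openai.com. It is not Documind's code: `buildChatRequest`, `extractLocally`, and the model name are hypothetical, and the base URL assumes Ollama's default port with its `/v1` compatibility endpoint.

```typescript
// Hypothetical sketch, not taken from Documind. It only shows that an
// OpenAI-style chat request can target a local server by swapping the base URL.

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Assumption: Ollama running locally on its default port.
const OLLAMA_BASE_URL = "http://localhost:11434/v1";

// Build the URL and fetch options for a chat completion request.
// Local servers usually ignore the API key, but the header is kept so the
// request has the same shape as one sent to OpenAI.
function buildChatRequest(
  baseUrl: string,
  model: string,
  prompt: string,
  apiKey = "ollama",
): ChatRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Send the prompt to the local model and return the text of the first choice.
// The model name is a placeholder; use whatever model you have pulled locally.
async function extractLocally(prompt: string): Promise<string> {
  const { url, init } = buildChatRequest(OLLAMA_BASE_URL, "llama3.2-vision", prompt);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Local model returned HTTP ${res.status}`);
  const data = (await res.json()) as {
    choices: { message: { content: string } }[];
  };
  return data.choices[0].message.content;
}
```

Only the base URL and model name differ from a hosted call, so making those two values configurable is likely most of the work.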

5. Tammilore ◴[] No.42172184[source]
Hi. I totally get the concern about sending data to OpenAI. Right now, Documind uses OpenAI's API just so people could quickly get started and see what it is like, but I’m open to adding options and contributions that would be better for privacy.
replies(1): >>42188570 #
6. ◴[] No.42172234[source]
7. inexcf ◴[] No.42188570[source]
That sounds great.
8. sidmo ◴[] No.42195901[source]
I'd recommend checking out vision language models. They generate embeddings of the images themselves (as a collection of patches) and you can see query matching displayed as a heatmap over the document. Picks up text that OCR misses. I built a simple API over it if you want to try it out: https://github.com/DataFog/vlm-api
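
To give a rough idea of the heatmap part: the sketch below scores each patch embedding against a query embedding with cosine similarity and lays the scores out as a grid. It is a simplified illustration, not the code in vlm-api; `cosine` and `patchHeatmap` are hypothetical names, and it assumes a single query vector, whereas real models may use several.

```typescript
type Vec = number[];

// Cosine similarity between two vectors of equal length.
// Returns 0 if either vector has zero length, to avoid dividing by zero.
function cosine(a: Vec, b: Vec): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Score every patch against the query and arrange the scores in rows of
// `cols` patches, assuming patches are listed left to right, top to bottom.
function patchHeatmap(query: Vec, patches: Vec[], cols: number): number[][] {
  if (cols <= 0 || patches.length % cols !== 0) {
    throw new Error("patch count must be a multiple of cols");
  }
  const grid: number[][] = [];
  for (let start = 0; start < patches.length; start += cols) {
    grid.push(patches.slice(start, start + cols).map((p) => cosine(query, p)));
  }
  return grid;
}
```

Higher values mark the patches that best match the query, and that grid is what gets drawn over the page image.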