
168 points by Tammilore | 3 comments

Documind is an open-source tool that turns documents into structured data using AI.

What it does:

- Extracts specific data from PDFs based on your custom schema
- Returns clean, structured JSON that's ready to use
- Works with just a PDF link + your schema definition

Just run npm install documind to get started.
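This post does not show Documind's actual schema format or API. The following is only a hypothetical TypeScript sketch of the general pattern the list describes: parse the model's JSON reply and check it against a user-defined schema before using it. The `Schema` format, `validateAgainstSchema`, and the invoice fields are all invented for illustration.

```typescript
// Hypothetical schema format (NOT Documind's): field name -> expected JSON type.
type FieldType = "string" | "number" | "boolean";
type Schema = Record<string, FieldType>;

interface ValidationResult {
  ok: boolean;
  errors: string[];
}

// Own-property check, so inherited names like "constructor" are not matched.
const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Check that a parsed model response is an object containing every schema
// field with the declared type, and no fields the schema does not declare.
function validateAgainstSchema(data: unknown, schema: Schema): ValidationResult {
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { ok: false, errors: ["expected a JSON object"] };
  }
  const record = data as Record<string, unknown>;
  const errors: string[] = [];
  for (const [field, expected] of Object.entries(schema)) {
    if (!hasOwn(record, field)) {
      errors.push(`missing field: ${field}`);
    } else if (typeof record[field] !== expected) {
      errors.push(`field ${field}: expected ${expected}, got ${typeof record[field]}`);
    }
  }
  for (const field of Object.keys(record)) {
    if (!hasOwn(schema, field)) {
      errors.push(`unexpected field: ${field}`);
    }
  }
  return { ok: errors.length === 0, errors };
}

// Example: the model returned the total as a string instead of a number.
const invoiceSchema: Schema = { invoiceNumber: "string", total: "number", paid: "boolean" };
const modelOutput: unknown = JSON.parse('{"invoiceNumber": "INV-001", "total": "42.50", "paid": false}');
const result = validateAgainstSchema(modelOutput, invoiceSchema);
// result.errors → ["field total: expected number, got string"]
console.log(result.ok, result.errors);
```

A check like this catches missing fields, extra fields, and wrong types, but it cannot tell whether a well-typed value is one the model made up.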

1. asjfkdlf No.42173400
I am looking for a similar service that turns any document (PNG, PDF, DOCX) into JSON while preserving the relationships between fields. I tried ChatGPT, but hallucinations are common. Does anything like this exist?
replies (2): No.42173587, No.42173893
2. omk No.42173587
This also uses OpenAI's GPT model, so the same hallucinations are likely here for PDFs.
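One common, partial mitigation for the hallucinations discussed here is a grounding check: flag any extracted string that does not actually occur in the document's text. The sketch below assumes the PDF text has already been extracted into a plain string; `findUngroundedFields` and the sample data are hypothetical, and nothing in the thread says Documind does this.

```typescript
// Normalize text so that line breaks, repeated spaces, and letter case
// left over from PDF text extraction do not cause false alarms.
function normalize(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Return the names of string fields whose values never appear (after
// normalization) in the source text. Non-string values are skipped,
// because numbers and dates are often reformatted by the model.
function findUngroundedFields(
  extracted: Record<string, unknown>,
  sourceText: string,
): string[] {
  const haystack = normalize(sourceText);
  const suspicious: string[] = [];
  for (const [field, value] of Object.entries(extracted)) {
    if (typeof value === "string" && !haystack.includes(normalize(value))) {
      suspicious.push(field);
    }
  }
  return suspicious;
}

// Example: the vendor name does not appear in the document, so it is flagged.
const pdfText = "INVOICE INV-001\nBilled to: Acme   Corp\nTotal due: 42.50";
const extractedFields = { invoiceNumber: "INV-001", customer: "Acme Corp", vendor: "Globex Ltd" };
console.log(findUngroundedFields(extractedFields, pdfText)); // flags "vendor" only
```

This is a heuristic, not a guarantee: it misses a real value attached to the wrong field, and it can raise false alarms when the model legitimately rewrites a value.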
3. cccybernetic No.42173893
I built a drag-and-drop document converter that extracts text into custom columns (for CSV) or keys (for JSON). You can schedule it to run at certain times and update a database as well.

I haven't had issues with hallucinations. If you're interested, my email is in my bio.
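Once text has been extracted, writing it out as custom CSV columns or JSON keys is mostly a matter of selecting and ordering fields. Below is a minimal, hypothetical TypeScript sketch of that step (not the commenter's implementation; `toCsv`, `toJson`, and the sample rows are invented), using RFC 4180-style quoting for the CSV case.

```typescript
type Row = Record<string, string>;

// Quote a CSV field RFC 4180-style: wrap it in double quotes when it contains
// a comma, quote, or line break, and double any embedded quotes.
function csvEscape(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Emit only the user-chosen columns, in the user-chosen order.
// Missing keys become empty cells rather than invented values.
function toCsv(rows: Row[], columns: string[]): string {
  const header = columns.map(csvEscape).join(",");
  const lines = rows.map((row) => columns.map((c) => csvEscape(row[c] ?? "")).join(","));
  return [header, ...lines].join("\r\n");
}

// Same selection for JSON output: keep only the requested keys, using null
// for keys the extraction did not produce.
function toJson(rows: Row[], keys: string[]): string {
  const picked = rows.map((row) => Object.fromEntries(keys.map((k) => [k, row[k] ?? null])));
  return JSON.stringify(picked, null, 2);
}

// Example rows as they might come out of an extraction step.
const rows: Row[] = [
  { name: "Acme, Inc.", total: "42.50" },
  { name: 'Say "hi"', total: "7" },
];
console.log(toCsv(rows, ["name", "total", "date"]));
console.log(toJson(rows, ["name", "date"]));
```

Filling missing fields with empty cells or `null` keeps gaps visible instead of papering over them.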