293 points by lapnect
nutlope No.42155007
Hi all, I'm the author of llama-ocr. Thank you for sharing & for the kind comments! I built this earlier this week since I wanted a simple API to do OCR – it uses Llama 3.2 Vision (hosted on together.ai, where I work) to parse images into structured markdown. I also have it available as an npm package.

Planning to add a bunch of other features like the ability to parse PDFs, output responses in JSON, etc. If anyone has any questions, feel free to send them and I'll try to respond!
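
For reference, using the npm package looks roughly like this – a minimal sketch based on the llama-ocr README as I recall it, so treat the `ocr` option names as assumptions if they've changed:

    // npm install llama-ocr
    import { ocr } from "llama-ocr";

    // Assumes a Together AI key is set in the environment.
    const markdown = await ocr({
      filePath: "./receipt.jpg", // local image to convert
      apiKey: process.env.TOGETHER_API_KEY,
    });

    console.log(markdown); // structured markdown parsed from the image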

replies(5): >>42155235 >>42155376 >>42155942 >>42158372 >>42159434
Curiositry No.42155235
Option to use a local LLM?
replies(1): >>42155548
Eisenstein No.42155548
I made a script that does exactly the same thing but locally, using koboldcpp for inference. It downloads MiniCPM-V 2.6 along with its image projector the first time you run it. If you want to use a different model you can, but you'll want to edit the instruct template to match.

* https://github.com/jabberjabberjabber/LLMOCR
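
Not the script itself, but to give a sense of the wire format: a minimal sketch of sending an image to a local koboldcpp instance through its OpenAI-compatible chat endpoint (default port 5001 and the message shape are assumptions – check the koboldcpp docs; the prompt is a placeholder):

    import { readFileSync } from "node:fs";

    // Base64-encode the image for an OpenAI-style data URL.
    const b64 = readFileSync("scan.png").toString("base64");

    const res = await fetch("http://localhost:5001/v1/chat/completions", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        messages: [{
          role: "user",
          content: [
            { type: "text", text: "Transcribe all text in this image as markdown." },
            { type: "image_url", image_url: { url: `data:image/png;base64,${b64}` } },
          ],
        }],
        max_tokens: 1024,
      }),
    });

    const data = await res.json();
    console.log(data.choices[0].message.content);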

replies(1): >>42155615
nirav72 No.42155615
MiniCPM-V 2.6 is probably the best self-hosted vision model I have used so far, not just for OCR but also for image analysis. I have it set up so that my NVR (Frigate) sends a couple of images to Ollama with MiniCPM-V 2.6 upon a motion alert from a driveway security camera. I'm able to get a reasonably accurate description of the vehicle that pulled into the driveway, including the person who exits the vehicle and the license plate, all sent to my phone.
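
In case it helps anyone wire up something similar, here is a rough sketch of the Ollama side of such a pipeline. The "minicpm-v" model tag and the prompt are assumptions, and the Frigate snapshot/notification wiring is omitted:

    import { readFileSync } from "node:fs";

    // Base64-encode a snapshot grabbed from the NVR on a motion alert.
    const image = readFileSync("driveway_snapshot.jpg").toString("base64");

    const res = await fetch("http://localhost:11434/api/generate", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "minicpm-v", // assumed Ollama model tag for MiniCPM-V 2.6
        prompt: "Describe the vehicle, any person exiting it, and the license plate.",
        images: [image], // Ollama accepts base64-encoded images for vision models
        stream: false,
      }),
    });

    const { response } = await res.json();
    console.log(response); // forward to a phone via your notifier of choice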
replies(1): >>42163822
timmattison No.42163822
I love this. Can you share the source?