Llama-OCR: Document to Markdown

1. philips ◴[16 Nov 24 07:34 UTC] No.42155081[source]▶

I have recently used llama3.2-vision to handle some paper bidsheets for a charity auction and it is fairly accurate with some terrible handwriting. I hope to use it for my event next year.

I do find it rather annoying not being able to get it to consistently output a CSV though. ChatGPT and Gemini seem better at doing that but I haven’t tried to automate it.

The scale of my problem is about 100 pages of bidsheets and so some manual cleaning is ok. It is certainly better than burning volunteers time.

https://github.com/philips/paper-bidsheets

replies(2): >>42155583 #>>42164693 #

2. mosselman ◴[16 Nov 24 09:59 UTC] No.42155583[source]▶

>>42155081 (TP) #

What about using llama3.2-vision to do the OCR bit and then deferring to ChatGPT to do the CSV part?

3. wriggler ◴[17 Nov 24 15:33 UTC] No.42164693[source]▶

>>42155081 (TP) #

I'd love to hear how Handwriting OCR (https://www.handwritingocr.com) compares for your task.

It's not free, but its accuracy for for handwritten documents is the best out there (I am the founder, so am biased, but I'm really excited about where the accuracy is now). It could save you time and for your 100 page project would cost only $12.

replies(1): >>42169902 #

4. KetoManx64 ◴[18 Nov 24 05:16 UTC] No.42169902[source]▶

>>42164693 #

My main qualm with a project like yours is that I have to upload my documents to a third party and trust them with that data. I have a couple thousand pages worth of journal entries from the last decade and I would never upload those to a website to get OCR'd, but with a local Ollama model I have full control of the data and it all stays local.

replies(1): >>42189859 #

5. wriggler ◴[20 Nov 24 01:12 UTC] No.42189859{3}[source]▶

>>42169902 #

I understand your concern, and it's a common one. However, we can only give assurances in our privacy policy that your data is used only to perform the OCR, and nothing else. You can delete all data from the server immediately after downloading your results and no trace will be left.

Of course a local solution like Ollama is preferable for privacy reasons but, for now, the OCR performance of available local models is just not very good, especially from handwritten documents. With a couple thousand pages of journal entries, that means a lot of post-processing and editing.