Is it possible to do this locally with open source software? I have a lot of accounting PDFs to convert but due to privacy concerns it should not run in the cloud.
replies(4):
A cheap hack is to push the documents through pdftotext from Poppler and, if nothing or very little comes out, push them through OCRmyPDF and pipe the result to pdftotext. If they're scanned you probably want some flags for deskewing and so on.
For making a bulk load of PDFs mostly greppable it's a decent technique, but to get every 0 as a 0 you're probably going to have to proofread every conversion.
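For anyone who wants to script that fallback, here is a rough Python sketch. It is not a tested pipeline. It assumes `pdftotext` and `ocrmypdf` are both on your PATH, the 200-character threshold is an arbitrary guess at "very little", and the helper names (`needs_ocr`, `extract_text`) are made up for this example. The flags used (`-layout` for pdftotext; `--deskew`, `--rotate-pages`, `--skip-text` for OCRmyPDF) are what I'd reach for on scans, but check `--help` on your installed versions before relying on them.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Assumption: fewer non-whitespace characters than this means
# "no usable text layer". Tune it for your own documents.
MIN_CHARS = 200


def needs_ocr(text: str, min_chars: int = MIN_CHARS) -> bool:
    """Return True when the extracted text is empty or suspiciously short."""
    return len("".join(text.split())) < min_chars


def pdftotext(pdf: Path) -> str:
    """Run Poppler's pdftotext; the trailing "-" sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", str(pdf), "-"],
        check=True,
        capture_output=True,
        encoding="utf-8",
        errors="replace",
    )
    return result.stdout


def extract_text(pdf: Path) -> str:
    """Try the embedded text layer first, fall back to OCR if it is too thin."""
    text = pdftotext(pdf)
    if not needs_ocr(text):
        return text
    with tempfile.TemporaryDirectory() as tmp:
        ocred = Path(tmp) / pdf.name
        # --skip-text leaves pages that already carry text untouched.
        subprocess.run(
            ["ocrmypdf", "--deskew", "--rotate-pages", "--skip-text",
             str(pdf), str(ocred)],
            check=True,
        )
        return pdftotext(ocred)


if __name__ == "__main__":
    # Writes foo.txt next to each foo.pdf given on the command line.
    for arg in sys.argv[1:]:
        src = Path(arg)
        src.with_suffix(".txt").write_text(extract_text(src), encoding="utf-8")
```

Saved as, say, `pdf2txt.py`, you would run it as `python pdf2txt.py *.pdf`. Everything stays on the local machine, and the proofreading caveat still applies to whatever the OCR branch produces.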