The solution is OCR. Don't fuck with the internal file format. PDF is designed for printing and displaying stuff, not for being parsed by machines.
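For what it's worth, here is a rough sketch of what "just OCR it" could look like: rasterize each page, then run OCR on the image and never touch the PDF's internals. It assumes the third-party `pdf2image` (which needs poppler installed) and `pytesseract` (which needs the tesseract binary) packages. The function name `ocr_pdf` and its `render`/`recognize` parameters are made up for illustration, and any other renderer or OCR engine could be swapped in.

```python
from typing import Callable, Iterable, List, Optional


def ocr_pdf(
    pdf_path: str,
    render: Optional[Callable[[str], Iterable[object]]] = None,
    recognize: Optional[Callable[[object], str]] = None,
) -> List[str]:
    """Return one OCR'd text string per page, ignoring the PDF's own text layer."""
    if render is None:
        # Assumption: pdf2image rasterizes the pages (requires poppler).
        from pdf2image import convert_from_path

        def render(path: str) -> Iterable[object]:
            return convert_from_path(path, dpi=300)

    if recognize is None:
        # Assumption: pytesseract performs the OCR (requires tesseract).
        import pytesseract

        recognize = pytesseract.image_to_string

    # Treat every page as a picture: render it, then read the text off the image.
    return [recognize(page).strip() for page in render(pdf_path)]
```

Something like `"\n\n".join(ocr_pdf("scan.pdf"))` would then give the whole document as text (`scan.pdf` being a placeholder path). The renderer and recognizer are passed in as parameters only so the pipeline can be tried out or tested without the external binaries installed.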