
357 points ingve | 1 comments
1. TZubiri ◴[] No.43979310[source]
As someone who has worked on this full time (S&P, parsing of financial disclosures):

The solution is OCR. Don't fuck with the internal file format. PDF is designed to print/display stuff, not to be parsed by machines.
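
For what it's worth, a rough sketch of that OCR route in Python: render each page to an image, then run OCR on the image, never touching the PDF's internal objects. This assumes the third-party `pdf2image` (needs Poppler installed) and `pytesseract` (needs Tesseract installed) packages. The function names `ocr_pdf`, `render_with_pdf2image`, and `recognize_with_tesseract` are made up for illustration and are not from any real pipeline.

```python
from typing import Any, Callable, Iterable, List


def render_with_pdf2image(path: str, dpi: int = 300) -> Iterable[Any]:
    """Rasterize every page of the PDF to an image (assumes pdf2image + Poppler)."""
    from pdf2image import convert_from_path  # third-party, imported lazily
    return convert_from_path(path, dpi=dpi)


def recognize_with_tesseract(image: Any) -> str:
    """Run OCR on one page image (assumes pytesseract + Tesseract)."""
    import pytesseract  # third-party, imported lazily
    return pytesseract.image_to_string(image)


def ocr_pdf(
    path: str,
    render_pages: Callable[[str], Iterable[Any]] = render_with_pdf2image,
    recognize: Callable[[Any], str] = recognize_with_tesseract,
) -> List[str]:
    """Return one text string per page, treating the PDF purely as pictures."""
    return [recognize(page) for page in render_pages(path)]
```

Usage would be something like `pages = ocr_pdf("report.pdf")`, where `report.pdf` is a placeholder path. The renderer and recognizer are passed in as parameters so a different rasterizer or OCR engine can be swapped in without changing the loop.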