PDF to Text, a challenging problem
(www.marginalia.nu)
357 points | ingve | 1 comment | 13 May 25 15:01 UTC
1. patrick41638265 [13 May 25 20:15 UTC] No. 43977197 >>43973721 (OP)
Good old https://linux.die.net/man/1/pdftotext and a little Python on top of its output will get you a long way if your documents are not too crazy. I use it to parse all my bank statements into an SQLite database for analysis.
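For anyone wanting to try this, here is a rough sketch of what such a pipeline might look like. It is not the commenter's actual script. The line layout (ISO date, description, signed amount), the regex, the table name `transactions`, and the helper names `pdf_to_text`, `parse_transactions`, and `store` are all assumptions made for illustration; real statements will need their own pattern. The only pdftotext features relied on are the `-layout` flag and `-` as the output file to write to stdout.

```python
import re
import sqlite3
import subprocess

# Assumed (hypothetical) line layout: "2025-05-01  COFFEE SHOP      -4.50"
LINE_RE = re.compile(r"^\s*(\d{4}-\d{2}-\d{2})\s+(.+?)\s+(-?\d+\.\d{2})\s*$")


def pdf_to_text(pdf_path):
    """Run pdftotext in layout mode and return the text it writes to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def parse_transactions(text):
    """Return (date, description, amount_cents) for lines matching LINE_RE."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            date, description, amount = match.groups()
            # Store money as integer cents to avoid float drift in sums.
            rows.append((date, description.strip(), round(float(amount) * 100)))
    return rows


def store(conn, rows):
    """Create the (hypothetical) transactions table if needed and insert rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions ("
        "date TEXT, description TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    import sys

    # Usage: python statements.py statement.pdf statements.db
    connection = sqlite3.connect(sys.argv[2])
    store(connection, parse_transactions(pdf_to_text(sys.argv[1])))
```

Lines that do not match the pattern (headers, totals, page footers) are skipped silently, which is convenient but also hides parse failures, so it is probably worth comparing the row count or the sum against the statement total.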