357 points ingve | 1 comments
dwheeler ◴[] No.43974621[source]
The better solution is to embed, in the PDF, the editable source document. This is easily done by LibreOffice. Embedding it takes very little space in general (because it compresses well), and then you have MUCH better information on what the text is and its meaning. It works just fine with existing PDF readers.
replies(5): >>43974667 #>>43974983 #>>43975217 #>>43975401 #>>43976216 #
lelandfe ◴[] No.43975401[source]
The better solution to a search engine extracting text from existing PDFs is to provide advice on how to author PDFs?

What's the timeline for this solution to pay off?

replies(1): >>43976378 #
chaps ◴[] No.43976378[source]
Microsoft is one of the bigger contributors to this. Like -- why does Excel have a feature to export to PDF, but not a feature to do the opposite? That export functionality really feels like it was given to a summer intern who finished it in two weeks and never had to deal with it ever again.
replies(2): >>43978047 #>>43980433 #
mattigames ◴[] No.43978047[source]
Because then we would have two formats, "PDFs generated by Excel" and "real PDFs", with the same extension, and that would be its own can of worms for Microsoft and for everyone else.
replies(1): >>43986958 #
1. chaps ◴[] No.43986958[source]
Hah, no. We would be going from 200,000 formats to 200,001 formats. Begone, shallow xkcd references!