
357 points ingve | 4 comments
dwheeler ◴[] No.43974621[source]
The better solution is to embed, in the PDF, the editable source document. This is easily done by LibreOffice. Embedding it takes very little space in general (because it compresses well), and then you have MUCH better information on what the text is and its meaning. It works just fine with existing PDF readers.
replies(5): >>43974667 #>>43974983 #>>43975217 #>>43975401 #>>43976216 #
kerkeslager ◴[] No.43974983[source]
That's true, but it's dependent on the creator of the PDF having aligned incentives with the consumer of the PDF.

In the e-Discovery field, it's commonplace for those providing evidence to dump it into a PDF purely so that it's harder for the opposing side's lawyers to consume. If both sides have lots of money this isn't a barrier, but public defenders, for example, don't have the funds to hire someone (me!) to process the PDFs into a readable format. Realistically, they end up taking much longer to process the data, which takes a psychological toll on the defendant. And that's if they process the data at all.

The solution is to make it illegal to do this: wiretap data, for example, should be provided in a standardized machine-readable format. There's no ethical reason for simple technical friction to be affecting the outcomes of criminal proceedings.

replies(2): >>43975362 #>>43979486 #
1. lurk2 ◴[] No.43979486[source]
> The solution is to make it illegal to do this: wiretap data, for example, should be provided in a standardized machine-readable format. There's no ethical reason for simple technical friction to be affecting the outcomes of criminal proceedings.

I can’t speak to wiretaps specifically, but when it comes to the legal field, this is usually already how it operates. GDPR, for example, makes specific provisions that user data must be provided in an accessible, machine-readable format. Most jurisdictions also aren’t going to look kindly on physical document dumping and will require that documents be provided in a machine-readable format. PDF is the legal industry standard for all outbound files. The consistency of its formatting makes up for the difficulties involved with machine-readability.

There’s not a huge incentive to find an alternative because most firms will just charge a markup on the time a clerk spends reading through and transcribing those PDFs. If cost is a concern, though, most jurisdictions will require the party in possession of the original documents to provide them in a machine-readable format (e.g. providing bank records as Excel spreadsheets rather than as PDFs).

replies(1): >>43979952 #
2. kerkeslager ◴[] No.43979952[source]
I'm not sure I understand what you're saying. PDF isn't a machine-readable format for most kinds of data, and keeping inherent court costs down is always a concern because it keeps the courts fair to the poor.
replies(1): >>43998316 #
3. lurk2 ◴[] No.43998316[source]
I’m saying that most jurisdictions likely already do require data to be machine-readable. When you run into PDFs, it usually isn’t a document dump (which courts don’t look kindly upon); it’s a product, in mixed parts, of convention and motivated laziness.
replies(1): >>44001971 #
4. kerkeslager ◴[] No.44001971[source]
You're saying two mutually exclusive things. Either it's required to be machine readable or it's PDF: it can't be both.