1303 points serjester | 1 comments
zoogeny ◴[] No.42954660[source]
Orthogonal to this post, but this just highlights the need for a more machine-readable alternative to PDF.

I get the inertia of the whole world being on PDF. And perhaps we can just eat the cost and let LLMs suffer the burden going forwards. But why not use that LLM coding brain power to create a better overall format?

I mean, do we really see printing things out onto paper as something we need to worry about for the next 100 years? It reminds me of the TTY interface at the heart of Linux. There was a time it all made sense, but can we just deprecate it all now?

replies(1): >>42955191 #
layer8 ◴[] No.42955191[source]
PDF does support incorporating information about the logical document structure, aka Tagged PDF. It’s optional, but recommended for accessibility (e.g. PDF/UA). See chapters 14.7–14.8 in [1]. Processing PDF files as rendered images, as suggested elsewhere in this thread, can actually throw away a lot of information that is present in the PDF.
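
As a rough illustration of what "tagged" means at the file level: a Tagged PDF's document catalog carries a /MarkInfo dictionary with /Marked true and a /StructTreeRoot entry pointing at the structure tree. The sketch below is only a crude byte-level heuristic, and `looks_tagged` is a made-up helper name, not part of any library. It assumes those keys appear uncompressed in the file, which is not guaranteed (the catalog can live inside a compressed object stream), so a negative result proves nothing and a real check needs a proper PDF parser.

```python
import re

# Hypothetical helper for illustration only; not from any PDF library.
# Heuristic: look for the two catalog entries that mark a Tagged PDF.
_MARKED_TRUE = re.compile(rb"/Marked\s+true\b")
_STRUCT_ROOT = re.compile(rb"/StructTreeRoot\b")


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Return True if the raw bytes contain both '/Marked true' and
    '/StructTreeRoot'. Misses files whose catalog is stored in a
    compressed object stream, so False does not mean 'untagged'."""
    has_marked = _MARKED_TRUE.search(pdf_bytes) is not None
    has_struct_root = _STRUCT_ROOT.search(pdf_bytes) is not None
    return has_marked and has_struct_root


if __name__ == "__main__":
    # Minimal catalog fragments, written by hand for the example.
    tagged = (b"1 0 obj\n<< /Type /Catalog /MarkInfo << /Marked true >> "
              b"/StructTreeRoot 5 0 R >>\nendobj")
    untagged = b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj"
    print(looks_tagged(tagged), looks_tagged(untagged))
```

For a real file, the bytes would come from opening it in binary mode and reading it; when a file does pass a check like this, the structure tree is where headings, paragraphs, tables and reading order would be found, none of which survive rendering to an image.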

Alternatively, XML document formats and the like do exist. Indeed, HTML was supposed to be a document format. That’s not the problem. The problem is having people and systems actually author documents in that way in an unambiguous fashion, and having a uniform visual presentation for it that would be durable in the long term (decades at least).

PDF as a format persists because it supports virtually every feature under the sun (if authors care to use them), while largely guaranteeing a precisely defined visual presentation, and being one of the most stable formats.

[1] https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandard...

replies(2): >>42955492 #>>42963020 #