
262 points el3ctron | 2 comments
RandyOrion ◴[] No.46178803[source]
For arXiv papers, I much prefer the HTML format to PDF.

Compared to PDF, HTML is much more accessible because it opens directly in the browser. Basically I can reuse my browser extensions to do anything I like without hassle: translation, note taking, sending text to LLMs, and so on.

For now, arXiv offers two HTML services: the default one at https://arxiv.org/html/xxxx.xxxxx , and the alternative one at https://ar5iv.labs.arxiv.org/html/xxxx.xxxxx , where each 'x' is a placeholder for a digit.

The most glaring problem with the default HTML service is its coverage of papers. Sometimes it just doesn't work, e.g., https://arxiv.org/html/2505.06708 . A workaround may be to switch to the alternative HTML service, e.g., https://ar5iv.labs.arxiv.org/html/2505.06708 .

Note that the alternative HTML service also has coverage problems. Sometimes both HTML services fail, e.g., https://arxiv.org/abs/2511.22625 .
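
The fallback order described above (default HTML, then ar5iv, then the abstract page) can be sketched in Python. This is only a rough sketch: the function names are made up for illustration, and treating an HTTP 200 response as "the HTML version works" is an assumption. A service could also answer 200 after redirecting somewhere else, so a stricter check might compare the final URL as well.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

# URL patterns taken from the comment above; {id} is the xxxx.xxxxx identifier.
DEFAULT_HTML = "https://arxiv.org/html/{id}"
AR5IV_HTML = "https://ar5iv.labs.arxiv.org/html/{id}"
ABSTRACT = "https://arxiv.org/abs/{id}"


def candidate_urls(arxiv_id):
    """Return the HTML URLs to try, default service first."""
    return [DEFAULT_HTML.format(id=arxiv_id), AR5IV_HTML.format(id=arxiv_id)]


def http_ok(url, timeout=10.0):
    """Hypothetical availability check: assumes status 200 means 'works'."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status == 200
    except (HTTPError, URLError, TimeoutError):
        return False


def first_working_url(arxiv_id, is_available=None):
    """Try each HTML service in order; fall back to the abstract page."""
    check = is_available or http_ok
    for url in candidate_urls(arxiv_id):
        if check(url):
            return url
    # Both HTML services failed, so only the abstract (and PDF) remain.
    return ABSTRACT.format(id=arxiv_id)
```

Calling `first_working_url("2505.06708")` does real network requests; passing your own `is_available` function lets you swap in a different check (or a stub) without touching the rest.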

replies(1): >>46185930 #
1. rhubarbtree ◴[] No.46185930[source]
Serious question: do websites from the 90s work well in modern browsers? Because PDFs from that time view fine.
replies(1): >>46206172 #
2. cxr ◴[] No.46206172[source]
Aside from sites that used non-standard stuff like ActiveX or Java applets, the general answer is "yes".

And to respond to your implied criticism: the stability/reliability/fidelity of PDFs is a myth. It would be hard to say how many dozens of PDFs I've come across in the last two years that don't look the same across devices/viewers (or sometimes just fail to render in their entirety). This played a significant part in a cascade of errors in one incident I know of that resulted in the payout of a claim of more than $1,000 but less than $10,000, not to mention a lot of strife and anger for the persons involved over the course of multiple months before resolution.

(As I write this now, I realize I'd almost forgotten about the fact that almost every time I've taken something to FedEx or UPS to be printed at a self-service kiosk, the result has been unusable, so I've had to take it to the clerk to have them print it instead.)

HTML at least has the property that it's still trivial to access and extract the data if you run into either malformed inputs or ones that are valid but incompatible with or unsupported by whatever viewer (browser) you happen to be using, which is a lot more than you can say for more opaque formats like Java, PDF, and Flash.
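
To make the extraction point concrete, here is a minimal sketch using only Python's standard-library `html.parser`. The `TextExtractor` class and `extract_text` function are names I made up for illustration, and this is one simple way to pull text out, not a complete or robust scraper. The parser does not require well-formed markup, so unclosed tags don't prevent the text from being recovered.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(markup):
    """Return the text content of possibly malformed HTML as one string."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at end of input
    return " ".join(parser.chunks)


# Unclosed <b> and <p> tags, missing </html>: the text is still recoverable.
broken = "<html><body><p>Attention is <b>all you need<p>Second paragraph</body>"
print(extract_text(broken))
```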