HTML as an Accessible Format for Papers (2023)

(info.arxiv.org)

262 points el3ctron | 2 comments | 06 Dec 25 14:59 UTC

RandyOrion ◴[07 Dec 25 03:05 UTC] No.46178803

For arXiv papers, I much prefer the HTML format to PDF.

Compared to PDF, HTML is much more accessible because it opens directly in the browser. I can reuse my browser extensions to do whatever I like without hassle: translation, note taking, sending text to LLMs, and so on.

For now, arXiv offers two HTML services: the default one at https://arxiv.org/html/xxxx.xxxxx , and the alternative one at https://ar5iv.labs.arxiv.org/html/xxxx.xxxxx , where each 'x' is a placeholder for a digit.

The most glaring problem with the default HTML service is its coverage of papers. Sometimes it just doesn't work, e.g., https://arxiv.org/html/2505.06708 . A workaround may be to switch to the alternative HTML service, e.g., https://ar5iv.labs.arxiv.org/html/2505.06708 .

Note that the alternative HTML service also has coverage problems. Sometimes both HTML services fail, e.g., https://arxiv.org/abs/2511.22625 .
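
The fallback order described above can be written down as a small helper. This is only a sketch: the function name `html_candidates` and the ID pattern are my own choices, it handles only new-style IDs of the form xxxx.xxxx or xxxx.xxxxx, and ending with the abstract page as a last resort is an assumption. It builds URLs and does not check whether any of them actually loads.

```python
import re

# New-style arXiv identifiers: four digits, a dot, then four or five digits.
# Old-style identifiers and version suffixes are deliberately not handled here.
ARXIV_ID = re.compile(r"\d{4}\.\d{4,5}")


def html_candidates(arxiv_id: str) -> list[str]:
    """Return the URLs to try for a paper, in fallback order (hypothetical helper)."""
    if not ARXIV_ID.fullmatch(arxiv_id):
        raise ValueError(f"not a new-style arXiv ID: {arxiv_id!r}")
    return [
        f"https://arxiv.org/html/{arxiv_id}",             # default HTML service
        f"https://ar5iv.labs.arxiv.org/html/{arxiv_id}",  # alternative HTML service
        f"https://arxiv.org/abs/{arxiv_id}",              # assumed last resort: abstract page
    ]


if __name__ == "__main__":
    for url in html_candidates("2505.06708"):
        print(url)
```

Trying the URLs in that order and stopping at the first one that renders mirrors what I do by hand.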

replies(1): >>46185930 #

1. rhubarbtree ◴[07 Dec 25 22:28 UTC] No.46185930

>>46178803 #

Serious question: do websites from the 90s work well in modern browsers? Because PDFs from that time view fine.

replies(1): >>46206172 #

2. cxr ◴[09 Dec 25 15:45 UTC] No.46206172

>>46185930 (TP) #

Aside from sites that used non-standard stuff like ActiveX or Java applets, the general answer is "yes".

And to respond to your implied criticism: the stability/reliability/fidelity of PDFs is a myth. It would be hard to say how many dozens of PDFs I've come across in the last two years that don't look the same across devices/viewers (or sometimes just fail to render in their entirety). This played a significant part in a cascade of errors in one incident I know of that resulted in the payout of a claim of more than $1,000 but less than $10,000, not to mention a lot of strife and anger for the persons involved over the course of multiple months before resolution.

(As I write this now, I realize I'd almost forgotten about the fact that almost every time I've taken something to FedEx or UPS to be printed at a self-service kiosk, the result has been unusable, so I've had to take it to the clerk to have them print it instead.)

HTML at least has the property that it's still trivial to access and extract the data if you run into either malformed inputs or ones that are valid but incompatible with, or unsupported by, whatever viewer (browser) you happen to be using, which is a lot more than you can say for more opaque formats like Java, PDF, and Flash.
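
To make the extraction point concrete, here is a minimal sketch using only Python's standard-library `html.parser`. The class and function names (`TextExtractor`, `extract_text`) and the sample markup are made up for illustration. It pulls the text out of deliberately malformed HTML with unclosed tags, without needing a browser that can render it.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect every run of text, ignoring the tags around it."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def extract_text(markup: str) -> str:
    """Return the text content of markup with whitespace normalized."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at end of input
    return " ".join(" ".join(parser.parts).split())


if __name__ == "__main__":
    # None of these tags are ever closed; the text is still recoverable.
    broken = "<p>Result: <b>42<p>Second paragraph"
    print(extract_text(broken))
```

This sketch keeps everything, including the contents of script and style elements if there are any, so a real tool would want to filter those out.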
