PDF does support embedding information about the logical document structure, aka Tagged PDF. It's optional, but recommended for accessibility (e.g. PDF/UA). See sections 14.7–14.8 in [1]. Processing PDF files as rendered images, as suggested elsewhere in this thread, can actually throw away a great deal of the information present in the PDF.
Alternatively, XML document formats and the like do exist. Indeed, HTML was supposed to be a document format. That's not the problem. The problem is getting people and systems to actually author documents that way, unambiguously, and having a uniform visual presentation for them that stays durable over the long term (decades at least).
PDF persists as a format because it supports virtually every feature under the sun (if authors care to use them) while largely guaranteeing a precisely defined visual presentation, and because it is one of the most stable formats around.
[1] https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandard...