Maybe it's time for new document formats and browsers that neatly separate the content, presentation, and UI layers? PDF and HTML are both 20+ years old, and it's often difficult to extract information from either, let alone write a browser for them.
replies(1):
Also, unlike PDF, I've never seen it actually used in the wild.