
357 points | ingve | 1 comment
1. hilbert42 No.43984358
"It doesn’t have text in the way you might think of it, but more of a mapping of glyphs to coordinates on “paper”."
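To make that concrete, here is a minimal Python sketch of what "glyphs at coordinates" implies for anyone trying to get text back out. Everything in it is an assumption for illustration: the glyph list, the `glyphs_to_text` helper, and the two thresholds are made up, and real PDFs add fonts, encodings, and coordinate transforms that this ignores.

```python
from collections import defaultdict

# Hypothetical glyph records: (character, x, y) in page units.
# They are deliberately not in reading order, since a PDF need not store them that way.
glyphs = [
    ("o", 40, 700), ("H", 0, 700), ("l", 20, 700), ("e", 10, 700), ("l", 30, 700),
    ("P", 60, 700), ("D", 70, 700), ("F", 80, 700),
    ("b", 0, 680), ("y", 10, 680), ("e", 20, 680),
]

def glyphs_to_text(glyphs, line_tolerance=2.0, space_gap=15.0):
    """Heuristically rebuild lines of text from positioned glyphs."""
    lines = defaultdict(list)
    for ch, x, y in glyphs:
        # Bucket glyphs with nearly equal y into the same line.
        lines[round(y / line_tolerance)].append((x, ch))

    out = []
    # Default PDF user space has y growing upward, so the largest y is the top line.
    for key in sorted(lines, reverse=True):
        text, prev_x = "", None
        for x, ch in sorted(lines[key]):  # left to right
            # A wide horizontal gap is guessed to be a word break.
            if prev_x is not None and x - prev_x > space_gap:
                text += " "
            text += ch
            prev_x = x
        out.append(text)
    return "\n".join(out)

print(glyphs_to_text(glyphs))
```

With this toy input it prints "Hello PDF" on one line and "bye" on the next, but note that the spaces and line breaks are guesses from geometry, not something stored in the data. That guessing is where extraction tends to get messy.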

I've often had trouble extracting text from PDFs; it's time-consuming and messy. So, a quick question.

The PDF format works pretty well for what it does, but it's now pretty ancient. Does anyone know of a newer format on the horizon that could serve as a next-generation replacement, one that would make it much easier to extract the data and export it to another format (say, docx, odt, etc.)?