←back to thread

357 points ingve | 1 comments | | HN request time: 0s | source
Show context
svat ◴[] No.43974326[source]
One thing I wish someone would write is something like the browser's developer tools ("inspect elements") for PDF — it would be great to be able to "view source" a PDF's content streams (the BT … ET operators that enclose text, each Tj operator for setting down text in the currently chosen font, etc), to see how every “pixel” of the PDF is being specified/generated. I know this goes against the current trend / state-of-the-art of using vision models to basically “see” the PDF like a human and “read” the text, but it would be really nice to be able to actually understand what a PDF file contains.

There are a few tools that allow inspecting a PDF's contents (https://news.ycombinator.com/item?id=41379101) but they stop at the level of the PDF's objects, so entire content streams are single objects. For example, to use one of the PDFs mentioned in this post, the file https://bfi.uchicago.edu/wp-content/uploads/2022/06/BFI_WP_2... has, corresponding to page number 6 (PDF page 8), a content stream that starts like (some newlines added by me):

    0 g 0 G
    0 g 0 G
    BT
    /F19 10.9091 Tf 88.936 709.041 Td
    [(Subsequen)28(t)-374(to)-373(the)-373(p)-28(erio)-28(d)-373(analyzed)-373(in)-374(our)-373(study)83(,)-383(Bridge's)-373(paren)27(t)-373(compan)28(y)-373(Ne)-1(wGlob)-27(e)-374(reduced)]TJ
    -16.936 -21.922 Td
    [(the)-438(n)28(um)28(b)-28(er)-437(of)-438(priv)56(ate)-438(sc)28(ho)-28(ols)-438(op)-27(erated)-438(b)28(y)-438(Bridge)-437(from)-438(405)-437(to)-438(112,)-464(and)-437(launc)28(hed)-438(a)-437(new)-438(mo)-28(del)]TJ
    0 -21.923 Td
and it would be really cool to be able to see the above “source” and the rendered PDF side-by-side, hover over one to see the corresponding region of the other, etc, the way we can do for a HTML page.
replies(5): >>43974386 #>>43974502 #>>43975665 #>>43979271 #>>43985967 #
dleeftink ◴[] No.43974502[source]
Have a look at this notebook[0], not exactly what you're looking for but does provide a 'live' inspector of the various drawing operations contained in a PDF.

[0]: https://observablehq.com/@player1537/pdf-utilities

replies(1): >>43974717 #
svat ◴[] No.43974717[source]
Thanks, but I was not able to figure out how to get any use out of the notebook above. In what sense is it a 'live' inspector? All it seems to do is to just decompose the PDF into separate “ops” and “args” arrays (neither of which is meaningful without the other), but it does not seem “live” in any sense — how can one find the ops (and args) corresponding to a region of the PDF page, or vice-versa?
replies(1): >>43974972 #
1. dleeftink ◴[] No.43974972[source]
You can load up your own PDF and select a page up front after which it will display the opcodes for this page. Operations are not structurally grouped, but decomposed in three aligned arrays which can be grouped to your liking based on opcode or used as coordinates for intersection queries (e.g. combining the ops and args arrays).

The 'liveness' here is that you can derive multiple downstream cells (e.g. filters, groupings, drawing instructions) from the initial parsed PDF, which will update as you swap out the PDF file.