
357 points ingve | 8 comments
1. 1vuio0pswjnm7 ◴[] No.43976096[source]
Below is a PDF. It is a .txt file. I can save it with a .pdf extension and open it in a PDF viewer. I can make changes in a text editor. For example, by editing this text file, I can change the text displayed on the screen when the PDF is opened, the font, font size, line spacing, the maximum characters per line, number of lines per page, the paper width and height, as well as portrait versus landscape mode.

   %PDF-1.4
   1 0 obj
   <<
   /CreationDate (D:2025)
   /Producer 
   >>
   endobj
   2 0 obj
   <<
   /Type /Catalog
   /Pages 3 0 R
   >>
   endobj
   4 0 obj
   <<
   /Type /Font
   /Subtype /Type1
   /Name /F1
   /BaseFont /Times-Roman
   >>
   endobj
   5 0 obj
   <<
     /Font << /F1 4 0 R >>
     /ProcSet [ /PDF /Text ]
   >>
   endobj
   6 0 obj
   <<
   /Type /Page
   /Parent 3 0 R
   /Resources 5 0 R
   /Contents 7 0 R
   >>
   endobj
   7 0 obj
   <<
   /Length 8 0 R
   >>
   stream
   BT
   /F1 50 Tf
   1 0 0 1 50 752 Tm
   54 TL
   (PDF is)' 
   ((a) a text format)'
   ((b) a graphics format)'
   ((c) (a) and (b).)'
   ()'
   ET
   endstream
   endobj
   8 0 obj
   53
   endobj
   3 0 obj
   <<
   /Type /Pages
   /Count 1
   /MediaBox [ 0 0 612 792 ]
   /Kids [ 6 0 R ]
   >>
   endobj
   xref
   0 9
   0000000000 65535 f 
   0000000009 00000 n 
   0000000113 00000 n 
   0000000514 00000 n 
   0000000162 00000 n 
   0000000240 00000 n 
   0000000311 00000 n 
   0000000391 00000 n 
   0000000496 00000 n 
   trailer
   <<
   /Size 9
   /Root 2 0 R
   /Info 1 0 R
   >>
   startxref
   599
   %%EOF
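One catch with hand-editing a file like this: the numbers in the xref table are byte offsets, so they go stale as soon as the text above them changes length. Viewers often repair this silently, but the table can also be regenerated. Below is a rough Python sketch of that idea, not a full PDF writer. The function name `rebuild_xref` is made up for illustration, and it assumes a single classic xref section, no compressed object streams, an unchanged `/Size`, and no stream data that happens to contain a line looking like `N 0 obj`. It does not touch the separate `/Length` value, which also needs updating if the content stream changes.

```python
import re

# "N G obj" at the start of a line, allowing leading indentation.
OBJ_RE = re.compile(rb"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+obj\b", re.MULTILINE)
# The trailer dictionary, kept verbatim (assumes /Size is still correct).
TRAILER_RE = re.compile(rb"trailer\s*(<<.*?>>)\s*startxref", re.DOTALL)


def rebuild_xref(pdf: bytes) -> bytes:
    """Rewrite the xref table and startxref from the real object positions."""
    startxref_at = pdf.rfind(b"startxref")
    trailer = TRAILER_RE.search(pdf)
    if startxref_at < 0 or trailer is None:
        raise ValueError("no classic trailer/startxref found")
    xref_at = pdf.rfind(b"xref", 0, startxref_at)
    if xref_at < 0:
        raise ValueError("no xref keyword found")

    body = pdf[:xref_at]  # everything before the old table
    offsets = {}
    for m in OBJ_RE.finditer(body):
        # Offset of the first digit of the object number.
        offsets[int(m.group(1))] = (m.start(1), int(m.group(2)))
    if not offsets:
        raise ValueError("no objects found")

    size = max(offsets) + 1
    entries = [b"0000000000 65535 f \n"]  # object 0 is always free
    for num in range(1, size):
        if num in offsets:
            off, gen = offsets[num]
            entries.append(b"%010d %05d n \n" % (off, gen))
        else:
            entries.append(b"0000000000 00000 f \n")

    table = b"xref\n0 %d\n" % size + b"".join(entries)
    tail = b"trailer\n" + trailer.group(1) + b"\nstartxref\n%d\n%%%%EOF\n" % len(body)
    return body + table + tail
```

Usage would be along the lines of reading the edited file in binary mode, passing the bytes through `rebuild_xref`, and writing the result back out.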
replies(3): >>43976133 #>>43976276 #>>43980858 #
2. swsieber ◴[] No.43976133[source]
It can also have embedded binary streams. It was not made for text. It was made for layout and graphics. You give nice examples, but each of those lines could have been broken up into one call per character, or per word, even out of order.
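To make the "one call per character, out of order" point concrete, here is a small Python sketch that emits a content stream placing every glyph with its own `Tm` and `Tj`, in shuffled order. The helper names and the fixed per-character `advance` are invented for illustration (a real producer would use the font's glyph widths). Reading the strings in stream order gives scrambled text; only the positions recover the original reading order.

```python
import random
import re


def escape_pdf_string(ch: str) -> str:
    """Backslash-escape the characters that are special inside (...)."""
    return "\\" + ch if ch in "()\\" else ch


def scattered_text_ops(text, x, y, advance, seed=0):
    """Emit one Tm + Tj pair per non-space character, in shuffled order."""
    placed = [(x + i * advance, ch) for i, ch in enumerate(text) if ch != " "]
    random.Random(seed).shuffle(placed)
    ops = ["BT", "/F1 50 Tf"]
    for px, ch in placed:
        ops.append(f"1 0 0 1 {px:g} {y:g} Tm")
        ops.append(f"({escape_pdf_string(ch)}) Tj")
    ops.append("ET")
    return "\n".join(ops)


def stream_order_text(ops: str) -> str:
    """What a naive extractor sees: strings in the order they appear."""
    return "".join(re.findall(r"^\(\\?(.)\) Tj$", ops, re.MULTILINE))


def position_order_text(ops: str) -> str:
    """Recover reading order by sorting on the x coordinate of each Tm."""
    pairs = re.findall(
        r"^1 0 0 1 (\S+) \S+ Tm\n\(\\?(.)\) Tj$", ops, re.MULTILINE
    )
    return "".join(ch for _, ch in sorted(pairs, key=lambda p: float(p[0])))
```

Both outputs render identically on the page, which is why text extraction tools tend to work from glyph positions rather than from the order of operators in the file.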
replies(1): >>43979331 #
3. 1vuio0pswjnm7 ◴[] No.43976276[source]
"PDF" is an acronym for "Portable Document Format".

"2.3.2 Portability

A PDF file is a 7-bit ASCII file, which means PDF files use only the printable subset of the ASCII character set to describe documents even those with images and special characters. As a result, PDF files are extremely portable across diverse hardware and operating system environments."

https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandard...

replies(1): >>43980043 #
4. hnick ◴[] No.43979331[source]
It can also use fonts which map glyphs via characters which do not represent the final visual item e.g. "PDF" could be "1#F" and you only really know what it looks like by rendering then viewing/OCR.

A nice file won't, but sometimes the best work is in not dealing with nice things.
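A toy Python illustration of the "1#F" case, for what it's worth. The table `TO_UNICODE` is entirely made up and stands in for whatever the font provides (a `/ToUnicode` CMap or an `/Encoding` with `/Differences`); when a file ships no such mapping at all, there is nothing to look up and you are back to rendering and OCR.

```python
# Hypothetical per-font table: character code in the content stream
# -> the character the glyph actually draws.
TO_UNICODE = {0x31: "P", 0x23: "D", 0x46: "F"}


def decode_shown_string(raw: bytes, to_unicode: dict[int, str]) -> str:
    """Map each one-byte character code through the font's table.

    Codes missing from the table become U+FFFD, the replacement character.
    """
    return "".join(to_unicode.get(code, "\ufffd") for code in raw)


# The bytes as they appear in the content stream.
content_stream = b"BT /F1 50 Tf (1#F) Tj ET"

# A plain byte search for the visible word finds nothing.
found_by_search = b"PDF" in content_stream

# Going through the font's table recovers it.
decoded = decode_shown_string(b"1#F", TO_UNICODE)
```

This assumes one-byte character codes; composite (CID) fonts use multi-byte codes and the lookup is correspondingly more involved.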

replies(1): >>43980508 #
5. normie3000 ◴[] No.43980043[source]
> PDF files use only the printable subset of the ASCII character set to describe documents even those with images and special characters

Great, so PDF source code is easily printable?

replies(1): >>43981525 #
6. 90s_dev ◴[] No.43980508{3}[source]
See this is why we can't have nice things.
7. jimjimjim ◴[] No.43980858[source]
This is the "Hello World" of PDFs.

Most PDFs these days have all of their streams compressed with deflate.

And then, because that didn't make it difficult enough to follow, a lot of PDFs group most of their objects inside object streams, which are then compressed as well. So you can't have a text editor search for "6 0 obj" when you are tracking down the other end of a "6 0 R" reference.
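A minimal Python sketch of the first layer, using only the standard `zlib` module. The two helper names are made up, and the sketch assumes a direct integer `/Length` and a single `/FlateDecode` filter with no predictor; it does not unpack object streams, which need their own index parsing after inflation.

```python
import re
import zlib


def make_flate_stream_object(num: int, content: bytes) -> bytes:
    """Wrap content in a stream object compressed with /FlateDecode."""
    data = zlib.compress(content)
    head = b"%d 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % (
        num,
        len(data),
    )
    return head + data + b"\nendstream\nendobj\n"


def inflate_stream_object(obj: bytes) -> bytes:
    """Recover the readable content from such an object.

    Uses /Length to slice the data instead of searching for 'endstream',
    since compressed bytes can contain anything.
    """
    m = re.search(rb"/Length (\d+)", obj)
    if m is None:
        raise ValueError("no direct /Length found")
    length = int(m.group(1))
    start = obj.index(b"stream\n") + len(b"stream\n")
    return zlib.decompress(obj[start:start + length])


content = (
    b"BT\n/F1 50 Tf\n1 0 0 1 50 752 Tm\n54 TL\n"
    b"(PDF is)'\n((a) a text format)'\n((b) a graphics format)'\n"
    b"((c) (a) and (b).)'\n()'\nET\n"
)
obj = make_flate_stream_object(7, content)
restored = inflate_stream_object(obj)
```

Once the stream is stored this way, the operators and strings are no longer sitting in the file as text, so a text editor search only becomes useful again after the inflate step.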

8. gpvos ◴[] No.43981525{3}[source]
Except most are compressed or contain binary streams. You can transform any PDF into an equivalent ASCII PDF though, e.g. using qpdf.
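For reference, one plausible way to drive qpdf from Python. The wrapper names are invented; `--qdf` and `--object-streams=disable` are qpdf options that write a normalized, editor-friendly file with streams uncompressed and objects pulled out of object streams. This assumes `qpdf` is installed and on the PATH. Whether the output is strictly 7-bit ASCII depends on what the file contains, so treat that part as unverified.

```python
import subprocess


def qdf_command(src: str, dst: str) -> list[str]:
    """Build the qpdf argument list without running anything."""
    return ["qpdf", "--qdf", "--object-streams=disable", src, dst]


def to_editable_pdf(src: str, dst: str) -> None:
    """Run qpdf; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(qdf_command(src, dst), check=True)
```

Calling `to_editable_pdf("in.pdf", "out.pdf")` would then leave a file in `out.pdf` that is much easier to read in a text editor.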