401 points chromy | 1 comments

A while ago I was looking for information on an obscure and short-lived British computer.

I found an article[1] in the archives of BYTE magazine[2] - and was captivated immediately by the tech adverts of bygone eras.

This led to a long side project: making all 100k pages of BYTE viewable in a single searchable place.

[1]: https://byte.tsundoku.io/#198502-381

[2]: https://news.ycombinator.com/item?id=17683184

Eriks ◴[] No.45029511[source]
dezoomify-rs https://byte.tsundoku.io/byte_files/10/0_0.jpg

Found the following zoom levels:

0. byte (Deep Zoom Image) (868480 x 453747 pixels, 376956 tiles)

...

I think I'll skip downloading this.

replies(1): >>45029789 #
chromy ◴[] No.45029789[source]
Yes, it's a fair amount of data:

pdfs/ 12.5 GiB

pages/ 91.96 GiB (Each page as a .png)

text/ 365.03 MiB (Each page as text)

byte_files/ 55.98 GiB (The 1024x1024 tiles as .jpeg)

I had not heard of https://github.com/lovasoa/dezoomify-rs before; that's really cool!
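
As a rough sanity check, the tile count dezoomify-rs reports lines up with 1024x1024 tiles covering the full-resolution image. A minimal sketch, assuming tile overlap does not change the count and only the top zoom level is counted; the helper name `tile_count` is made up for illustration and is not part of dezoomify-rs:

```python
def tile_count(width, height, tile_size=1024):
    """Return (columns, rows, total) of tiles needed to cover the image.

    Assumes a plain grid of tile_size x tile_size tiles at full resolution;
    partial tiles at the right and bottom edges still count as one tile.
    """
    cols = -(-width // tile_size)   # ceiling division without floats
    rows = -(-height // tile_size)
    return cols, rows, cols * rows


# Dimensions taken from the dezoomify-rs output quoted earlier in the thread.
cols, rows, total = tile_count(868480, 453747)
print(cols, rows, total)  # 849 444 376956

# Sizes quoted above, converted to GiB (text/ is given in MiB).
sizes_gib = {
    "pdfs/": 12.5,
    "pages/": 91.96,
    "text/": 365.03 / 1024,
    "byte_files/": 55.98,
}
print(f"total: {sum(sizes_gib.values()):.2f} GiB")
```

So the 376956 figure is 849 columns by 444 rows of tiles, and the four directories add up to roughly 160.8 GiB.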

replies(1): >>45030371 #
1. Sunspark ◴[] No.45030371[source]
I wonder how it would do with the DjVu codec, which has mostly been used specifically for archiving documents. I suppose it is best applied at the source, if the physical material is at hand.

It might still be worth a look as an experiment, since this codec separates text, background and images into different layers, even when converted from another format.