
663 points nikisweeting | 3 comments

We've been pushing really hard over the last 6mo to develop this release. I'd love to hear feedback from people who've worked on big plugin systems in the past, or anyone who's tried our betas!
toomuchtodo ◴[] No.41861236[source]
https://github.com/ArchiveTeam/grab-site might be helpful. I'm a fan of the ability to create WARC archives from a target, upload the WARC files to object storage (whether that is IA, S3, Backblaze B2, etc), and then keep them in cold storage or serve them up via HTTPS or a torrent (mutable, preferred). The Internet Archive serves a torrent file for every item they host; one can do the same with WARC archives to enable a distributed archive. CDX indexes can be used for rapidly querying the underlying WARC archives.
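
To make the CDX point concrete, here is a rough sketch of how an index lookup can turn into a single seek into a WARC file. It assumes the common 11-field CDX layout (`N b a m s k r M S V g`); real CDX files declare their own layout in the header line, so treat the field order, the helper names, and the sample filename as assumptions rather than anything grab-site or Wayback guarantees.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CdxRecord:
    urlkey: str        # canonicalized (SURT-style) URL key
    timestamp: str     # capture time, YYYYMMDDhhmmss
    original_url: str
    mimetype: str
    status: str
    digest: str
    length: int        # compressed record size in bytes
    offset: int        # byte offset of the record inside the WARC file
    filename: str      # WARC file that holds the record


def parse_cdx_line(line: str) -> CdxRecord:
    """Parse one line of the ASSUMED 11-field CDX layout."""
    fields = line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    (urlkey, timestamp, original_url, mimetype, status, digest,
     _redirect, _meta, length, offset, filename) = fields
    return CdxRecord(urlkey, timestamp, original_url, mimetype, status,
                     digest, int(length), int(offset), filename)


def build_index(lines):
    """Group records by urlkey, skipping blank lines and the CDX header."""
    index = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("CDX"):
            continue
        rec = parse_cdx_line(line)
        index.setdefault(rec.urlkey, []).append(rec)
    return index


def read_record_bytes(warc_dir: str, rec: CdxRecord) -> bytes:
    """Read exactly one record's raw bytes without scanning the whole WARC."""
    with open(os.path.join(warc_dir, rec.filename), "rb") as f:
        f.seek(rec.offset)
        return f.read(rec.length)
```

The bytes returned by `read_record_bytes` are still the raw (typically gzip-compressed) record; decompressing and parsing it is left to a WARC library.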

You might support cryptographically signing WARC archives; Wayback is particular about archive provenance and integrity, for example.
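
As a rough illustration of the integrity half of that idea, the sketch below hashes a WARC file and attaches a keyed tag to the digest. HMAC-SHA256 with a shared secret is used here only as a standard-library stand-in; a real provenance scheme would more likely use a public-key signature, which needs a third-party library. The function names are hypothetical.

```python
import hashlib
import hmac


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large WARCs never sit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def sign_digest(digest_hex: str, key: bytes) -> str:
    """Keyed tag over the digest (HMAC stand-in, NOT a public-key signature)."""
    return hmac.new(key, digest_hex.encode("ascii"), hashlib.sha256).hexdigest()


def verify_file(path: str, key: bytes, expected_tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    tag = sign_digest(sha256_file(path), key)
    return hmac.compare_digest(tag, expected_tag)
```

Anyone holding the key can recompute the tag, so this only demonstrates tamper detection; it does not prove who produced the archive.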

https://www.loc.gov/preservation/digital/formats/fdd/fdd0005... ("CDX Internet Archive Index File")

https://www.loc.gov/preservation/digital/formats/fdd/fdd0002... ("WARC, Web ARChive file format")

https://github.com/internetarchive/wayback/tree/master/wayba... ("Wayback CDX Server API - BETA")

replies(3): >>41861288 #>>41861743 #>>41861951 #
1. 0cf8612b2e1e ◴[] No.41861951[source]

  The Internet Archive serves a torrent file for every item they host
I had no idea. I have found the IA serving speed to be pretty terrible. Are the torrents any better? Presumably the only ones seeding the files are IA themselves.
replies(2): >>41862348 #>>41864588 #
2. toomuchtodo ◴[] No.41862348[source]
The benefit is not in seeding speed directly from IA, but the potential for distributed access and seeding of the item. Think of it as a filename of a zip file in a flat distributed filesystem, with the ability to cherry-pick the files that make up the item via traditional BitTorrent mechanisms. Anyone can consume each item via torrent, continue to seed, and then also access the underlying data. IA acts as the storage system of last resort (and the metadata index).
3. pabs3 ◴[] No.41864588[source]
The torrents have better speeds because they have WebSeeds for multiple IA servers, so you can download from multiple servers at once.
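
A toy sketch of why multiple WebSeeds help: each piece of the file can be fetched with an HTTP Range request, so different pieces can go to different servers in parallel. The round-robin assignment and the function name below are illustrative assumptions; real BitTorrent clients pick pieces and servers dynamically, and this is not how any particular client or IA's setup is implemented.

```python
def plan_range_requests(file_size: int, piece_length: int, seed_urls: list):
    """Assign each piece to a seed URL round-robin and build its Range header.

    Returns a list of (url, range_header) tuples, one per piece.
    """
    if file_size < 0 or piece_length <= 0 or not seed_urls:
        raise ValueError("need file_size >= 0, piece_length > 0, and at least one URL")
    plan = []
    for i, start in enumerate(range(0, file_size, piece_length)):
        # HTTP byte ranges are inclusive; the last piece may be shorter.
        end = min(start + piece_length, file_size) - 1
        url = seed_urls[i % len(seed_urls)]
        plan.append((url, f"bytes={start}-{end}"))
    return plan
```

With two seed URLs, consecutive pieces alternate between them, so a downloader issuing these requests concurrently is no longer limited by one server's throughput.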