https://github.com/ArchiveTeam/grab-site might be helpful. I'm a fan of the ability to create WARC archives from a target, upload the WARC files to object storage (whether that is IA, S3, Backblaze B2, etc.), and then keep them in cold storage or serve them up via HTTPS or a torrent (a mutable torrent is preferred). The Internet Archive serves a torrent file for every item they host; one can do the same with WARC archives to enable a distributed archive. CDX indexes map each URL and capture timestamp to the WARC file name and byte offset of the matching record, so the underlying WARC archives can be queried rapidly without scanning them.
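A minimal sketch of what a CDX lookup looks like, assuming the common 11-field layout (header ` CDX N b a m s k r M S V g`: SURT key, timestamp, original URL, MIME type, status, digest, redirect, meta tags, compressed length, offset, file name). The `to_surt_key` helper is a simplified, hypothetical stand-in for real SURT canonicalization, and the sample lines (digests, sizes, file names) are made up for illustration:

```python
import bisect
from dataclasses import dataclass
from urllib.parse import urlsplit

# Assumed field order of the common 11-column CDX layout (" CDX N b a m s k r M S V g").
CDX_FIELDS = ["urlkey", "timestamp", "original", "mimetype", "statuscode",
              "digest", "redirect", "metatags", "length", "offset", "filename"]


@dataclass
class CdxRecord:
    urlkey: str
    timestamp: str
    original: str
    mimetype: str
    statuscode: str
    digest: str
    redirect: str
    metatags: str
    length: int   # compressed size of the WARC record in bytes
    offset: int   # byte offset of the record inside `filename`
    filename: str


def parse_cdx_line(line: str) -> CdxRecord:
    parts = line.split()
    if len(parts) != len(CDX_FIELDS):
        raise ValueError(f"expected {len(CDX_FIELDS)} fields, got {len(parts)}")
    values = dict(zip(CDX_FIELDS, parts))
    values["length"] = int(values["length"])
    values["offset"] = int(values["offset"])
    return CdxRecord(**values)


def to_surt_key(url: str) -> str:
    """Hypothetical, simplified SURT-style key: reversed host, 'www.' dropped, lowercased."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return (",".join(reversed(host.split("."))) + ")" + path).lower()


class CdxIndex:
    """In-memory index; CDX files are sorted by key, so lookups are binary searches."""

    def __init__(self, lines):
        records = [parse_cdx_line(l) for l in lines
                   if l.strip() and not l.startswith(" CDX")]  # skip blanks and header
        records.sort(key=lambda r: (r.urlkey, r.timestamp))
        self._records = records
        self._keys = [r.urlkey for r in records]

    def lookup(self, url: str) -> list:
        """Return every capture of `url`, oldest first."""
        key = to_surt_key(url)
        lo = bisect.bisect_left(self._keys, key)
        hi = bisect.bisect_right(self._keys, key)
        return self._records[lo:hi]


# Illustrative sample index (placeholder digests and file names).
SAMPLE_CDX = [
    " CDX N b a m s k r M S V g",
    "com,example)/ 20240101000000 http://www.example.com/ text/html 200 AAAA - - 1043 0 example-00000.warc.gz",
    "com,example)/ 20240301000000 http://example.com/ text/html 200 BBBB - - 1051 1043 example-00000.warc.gz",
    "org,example)/about 20240201000000 https://example.org/about text/html 200 CCCC - - 980 0 other-00000.warc.gz",
]

if __name__ == "__main__":
    for rec in CdxIndex(SAMPLE_CDX).lookup("https://www.Example.com/"):
        print(rec.timestamp, rec.filename, rec.offset, rec.length)
```

The `offset` and `length` fields are what make this fast: a reader can seek straight to that byte range in the WARC file (locally, or with an HTTP Range request against object storage) and read only the one record it needs.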
You might also support cryptographically signing WARC archives; Wayback, for example, is particular about archive provenance and integrity.
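One possible shape for this, sketched under assumptions: hash every WARC file into a `sha256sum`-style manifest, then sign only the manifest with whatever signing tool you trust (for instance `gpg --detach-sign`). The sketch covers the integrity half (building and verifying the manifest); the function names and the `*.warc.gz` naming pattern are hypothetical, and the signing step itself is left to the external tool:

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large WARCs don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(warc_dir: Path) -> str:
    """sha256sum-style lines ('<digest>  <name>'), sorted by file name.
    This text is what would then be signed with an external tool."""
    lines = [f"{sha256_file(p)}  {p.name}" for p in sorted(warc_dir.glob("*.warc.gz"))]
    return "\n".join(lines) + "\n"


def verify_manifest(warc_dir: Path, manifest: str) -> list[str]:
    """Return names of files that are missing or whose digest no longer matches."""
    bad = []
    for line in manifest.splitlines():
        if not line.strip():
            continue
        digest, name = line.split("  ", 1)
        path = warc_dir / name
        if not path.is_file() or sha256_file(path) != digest:
            bad.append(name)
    return bad


if __name__ == "__main__":
    # Demo with throwaway files standing in for real WARCs.
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "a.warc.gz").write_bytes(b"record-a")
        manifest = build_manifest(d)
        print(manifest, end="")
        print("mismatches:", verify_manifest(d, manifest))
```

Signing one small manifest rather than each multi-gigabyte WARC keeps verification cheap: check the signature on the manifest once, then recompute digests only for the files you actually fetch.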
https://www.loc.gov/preservation/digital/formats/fdd/fdd0005... ("CDX Internet Archive Index File")
https://www.loc.gov/preservation/digital/formats/fdd/fdd0002... ("WARC, Web ARChive file format")
https://github.com/internetarchive/wayback/tree/master/wayba... ("Wayback CDX Server API - BETA")