128 points Brajeshwar | 4 comments
underseacables ◴[] No.42479808[source]
I suppose it comes down to what the purpose of such archiving is.

I think it's the preservation of information, but I also believe 90% of it is absolutely pointless. There is just so much of it, and data storage is so cheap, that it makes sense to just save everything.

replies(3): >>42479956 #>>42479985 #>>42480107 #
danielbln ◴[] No.42480107[source]
Data rots though; you can't just save it once and be done with it. You have to migrate it across storage media, formats, etc. It's a recurrent effort/cost.
replies(1): >>42480160 #
1. bdhcuidbebe ◴[] No.42480160[source]
More planning for less effort.

Do your research first. Use standards.

E.g.: HTML, PDF, H.264/H.265/AV1 in an MP4 container, CHD, ZIP and so on, depending on what you are storing.

replies(1): >>42480888 #
2. HeatrayEnjoyer ◴[] No.42480888[source]
On what physical medium?

I have 1 terabyte of data in 1860, how do I make sure the storage medium is still intact in 2024?

replies(2): >>42482731 #>>42484591 #
3. TacticalCoder ◴[] No.42482731[source]
> I have 1 terabyte of data in 1860, how do I make sure the storage medium is still intact in 2024?

Storage keeps growing and the price of storage keeps going down.

My DOS and even some C64 source code made it to this day on backups (DVDs, HDDs, SSDs, USB memory sticks, etc., both online and offline) and to ZFS pools. Media that didn't exist in the 80s/early 90s.

Floppy disks -> 40 MB HDD -> 6.4 GB HDD -> 80 GB HDD -> 500 GB HDD -> 240 GB SSD -> 1 TB NVMe SSD.

You get the idea.

The way you make sure you still have your data is by not focusing on the medium but by focusing on the fact that data is data.

Media come and go. Data can (and should) be copied to new media.

Not unlike:

    /home/pub/backups/oldBackups/DOSbackups/...
    ...Conner80MBHDDbackups/backups/oldBackups/Commodore64backups/...

Some people are going to complain about the naming, but I have all my emails, except for six months' worth, going back to when I started using the Internet. And I still have nearly all of my data since I started using computers. 8-bit computers.

Do you?

I don't care about naming much. "search, don't sort".

We've got emulators for just about every and any system. My vintage arcade cab has both real PCBs and a Pi running an emulator with thousands of arcade games on it.

You can already, today, emulate, say, the Raspberry Pi model you want using QEMU. There are container files that'll gladly do that for you.

Unless civilization ends, there's simply not a world in which, say, PNG, JPG and x265 files aren't readable. This just won't happen.

FWIW I'm paranoid about the integrity of my data: I've got my own naming scheme where a cryptographic hash is added to many of my files.

For example:

    DSC_91394-b3-ae4f2877d3.jpg

This means "This file's Blake3 checksum begins with ae4f2877d3".

I then have a script doing statistical sampling: I enter a percentage and that percentage of files where a cryptographic hash is part of the filename are checked, randomly (if I enter 100 then 100% of the files are tested).
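
For anyone who wants to try the idea, here is a minimal sketch of such a sampler. It is not the script described above, just one way it could look. Assumptions: BLAKE3 is not in Python's standard library, so the sketch hashes with BLAKE2b under a made-up `b2` tag; `HASHERS`, `TAGGED`, `file_digest`, `tagged_name` and `sample_check` are hypothetical names; and the sample size is rounded up so that a small percentage still checks at least one file.

```python
import hashlib
import math
import random
import re
from pathlib import Path

# Assumption: BLAKE2b stands in for BLAKE3 here, under a made-up "b2" tag.
# With the third-party `blake3` package installed, an entry such as
# "b3": blake3.blake3 could be added to this table.
HASHERS = {"b2": hashlib.blake2b}

# Matches names ending in "-<tag>-<hex prefix>" plus an optional extension,
# e.g. "DSC_91394-b2-ae4f2877d3.jpg".
TAGGED = re.compile(r"-(?P<tag>[a-z0-9]+)-(?P<prefix>[0-9a-f]+)(?:\.[^.]+)?$")


def file_digest(path, tag):
    """Hash a file in 1 MiB chunks and return the hex digest."""
    h = HASHERS[tag]()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def tagged_name(path, tag="b2", length=10):
    """Return the file name with '-<tag>-<hash prefix>' before the extension."""
    path = Path(path)
    return f"{path.stem}-{tag}-{file_digest(path, tag)[:length]}{path.suffix}"


def sample_check(root, percent, rng=random):
    """Verify a random `percent` of the tagged files under `root`.

    Returns (number of files checked, list of paths that no longer match).
    Files whose tag is not in HASHERS are ignored.
    """
    candidates = []
    for p in sorted(Path(root).rglob("*")):
        m = TAGGED.search(p.name)
        if p.is_file() and m and m["tag"] in HASHERS:
            candidates.append((p, m["tag"], m["prefix"]))
    # Round up, so any non-zero percentage checks at least one file.
    count = min(len(candidates), math.ceil(len(candidates) * percent / 100))
    failures = [
        p for p, tag, prefix in rng.sample(candidates, count)
        if not file_digest(p, tag).startswith(prefix)
    ]
    return count, failures
```

Calling `sample_check("/some/archive", 7)` would hash roughly 7% of the tagged files and return any whose content no longer matches the prefix in its name. The path is only a placeholder.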

If I enter '7', for example, then 7% of the files are tested, and if they all pass there's a high probability that the rest of the checksums are correct too.
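
How much a clean sample says depends on how many files are actually damaged. As a rough sketch, assuming the sample is drawn uniformly at random without replacement, the chance that it contains none of the damaged files is a hypergeometric probability. `miss_probability` is a hypothetical helper name, not part of any script mentioned in the thread.

```python
from math import comb


def miss_probability(total, corrupted, sampled):
    """Chance that a uniform random sample of `sampled` files, drawn without
    replacement from `total` files, contains none of the `corrupted` ones."""
    # math.comb returns 0 when sampled > total - corrupted, which gives 0.0:
    # a sample that large cannot avoid every corrupted file.
    return comb(total - corrupted, sampled) / comb(total, sampled)
```

With a single damaged file out of 100 and 7 files sampled, `miss_probability(100, 1, 7)` works out to 93/100, so a 7% sample will usually miss an isolated flipped bit. With 50 damaged files out of 10,000 and 700 sampled, the chance of missing all of them drops below 3%. Sampling is good at catching widespread rot, such as a dying disk or a bad copy, while finding a single bad file needs a run at 100.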

> On what physical medium?

That is the wrong question.

4. ssl-3 ◴[] No.42484591[source]
If it is 1860 and you want to see that your data is preserved for 164 years, then you start by keeping it in geographically diverse places where people will look after it and tend its needs for 164 years.

As a concept, that's really not different at all from holding on to today's data here in 2024.