    128 points Brajeshwar | 17 comments
    1. underseacables ◴[] No.42479808[source]
    I suppose it comes down to what the purpose of such archiving is.

    I think it's the preservation of information, but I also believe 90% is absolutely pointless. There is just so much of it, and data storage so cheap, that it makes sense to just save everything.

    replies(3): >>42479956 #>>42479985 #>>42480107 #
    2. sigio ◴[] No.42479956[source]
    Well... storage is cheap, but not cheap enough to save everything, with just usenet being in the 400TB/day range these days. Sure, it's cheap enough to save every webpage you visit during your life, but probably not cheap enough to save every video you click on youtube or watch on a streaming-service, and all the music you listen to all day.

    Though just the music compressed in opus at 128kbit might work ok, 60 years of 24/7 128kbit is 30TB, so that would fit on 1 large HDD currently.
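
    The 30TB figure can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal units (128,000 bits per second, 10^12 bytes per TB) and an average year of 365.25 days, and the names used are illustrative.

```python
# Back-of-the-envelope check of the "60 years of 24/7 128kbit" estimate.
BITRATE_BITS_PER_S = 128_000            # 128 kbit/s, decimal kilobits (assumption)
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # average year, leap days included
YEARS = 60


def lifetime_audio_bytes(bitrate_bits_per_s=BITRATE_BITS_PER_S, years=YEARS):
    """Total bytes produced by a constant-bitrate stream playing around the clock."""
    return bitrate_bits_per_s / 8 * SECONDS_PER_YEAR * years


total_tb = lifetime_audio_bytes() / 1e12  # decimal terabytes
print(f"{total_tb:.1f} TB")  # prints "30.3 TB"
```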

    replies(2): >>42480571 #>>42481375 #
    3. dreamcompiler ◴[] No.42479985[source]
    That data storage is also ephemeral. None of it will last as long as a paper note, unless some human goes to the trouble of copying it all onto new drives with new software every ten years or so.
    replies(1): >>42480154 #
    4. danielbln ◴[] No.42480107[source]
    Data rots though, you can't just save it once and be done with it. You have to migrate it across storage mediums, formats etc. It's a recurrent effort/cost.
    replies(1): >>42480160 #
    5. Atreiden ◴[] No.42480154[source]
    With a proper NAS and RAID10 for double parity, it's a bit like the Ship of Theseus. Just keep swapping out drives when they become unhealthy and you never have to rebuild or migrate.
    replies(2): >>42480339 #>>42480412 #
    6. bdhcuidbebe ◴[] No.42480160[source]
    More planning for less effort.

    Do your research first. Use standards.

    E.g.: html, pdf, h264/h265/av1 in mp4 container, chd, zip and so on depending on what you are storing.

    replies(1): >>42480888 #
    7. ninalanyon ◴[] No.42480339{3}[source]
    Eventually the controller will die and eventually compatible ones will no longer be produced or will at least be inconvenient to obtain or commission and hence expensive.

    Paper lasts for centuries without any attention beyond keeping it moderately dry and away from things that eat it.

    replies(2): >>42480390 #>>42480417 #
    8. emptiestplace ◴[] No.42480390{4}[source]
    No sane person uses hardware RAID in 2024, if that's what you're referring to.
    replies(1): >>42480415 #
    9. ◴[] No.42480412{3}[source]
    10. zamadatix ◴[] No.42480415{5}[source]
    Whether you're using hardware RAID or not you still need a hardware storage controller of some type which accepts the new disks you can buy and works with the NAS. What they are saying is eventually that'll be more $ and time than just migrating off the system would be. From ENIAC to now could fit in one lifespan. Would you still be maintaining a home floppy drive backup system in the 2040s, or just save the time and effort with a migration?
    replies(1): >>42482497 #
    11. ◴[] No.42480417{4}[source]
    12. saulpw ◴[] No.42480571[source]
    Music is actually an ideal candidate. I don't listen to music all day, and when I do listen to it, it's often something I've listened to before. My current collection is about 200GB and that includes a ton of stuff I've never listened to; it seems reasonable that a full life's worth of music could fit in 1TB, easily.
    13. HeatrayEnjoyer ◴[] No.42480888{3}[source]
    On what physical medium?

    I have 1 terabyte of data in 1860, how do I make sure the storage medium is still intact in 2024?

    replies(2): >>42482731 #>>42484591 #
    14. add-sub-mul-div ◴[] No.42481375[source]
    If that much data comes across Usenet daily then how do services afford the storage to offer years of retention?

    You can't dedupe the large binary files because they're encoded in small parts likely differently every time they're posted.

    15. jpalawaga ◴[] No.42482497{6}[source]
    sure, you can always move the old storage mechanism to something new if it is too cumbersome.

    why still back up floppies if you could just move the data to a single dvd, or throw it on the SAN?

    RAID is just algorithms, the actual transport doesn't matter (i.e. spinning platter and solid state both use SATA connectors).

    16. TacticalCoder ◴[] No.42482731{4}[source]
    > I have 1 terabyte of data in 1860, how do I make sure the storage medium is still intact in 2024?

    Storage keeps growing and the price of storage keeps going down.

    My DOS and even some C64 source code made it to this day on backups (DVDs, HDDs, SSDs, USB memory sticks, etc., both online and offline) and to ZFS pools. Media that didn't exist in the 80s/early 90s.

    Floppy disks -> 40 MB HDD -> 6.4 GB HDD -> 80 GB HDD -> 500 GB HDD -> 240 GB SSD -> 1 TB NVMe SSD.

    You get the idea.

    The way you make sure you still have your data is by not focusing on the medium but by focusing on the fact that data is data.

    Medium comes and goes. Data can (and should) be copied to new medium.

    Not unlike:

        /home/pub/backups/oldBackups/DOSbackups/...
        ...Conner80MBHDDbackups/backups/oldBackups/Commodore64backups/...
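
    Carrying an old backup tree onto a new medium and checking that the copy is intact could be sketched roughly as follows. The function names and the choice of SHA-256 for verification are assumptions made for illustration, not a description of the setup in this comment.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def migrate(src_root, dst_root):
    """Copy src_root into dst_root, then re-read both sides and compare.

    Returns the relative paths whose copies do not match the originals.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    shutil.copytree(src_root, dst_root, dirs_exist_ok=True)  # Python 3.8+
    mismatched = []
    for src in sorted(src_root.rglob("*")):
        if src.is_file():
            rel = src.relative_to(src_root)
            if file_digest(src) != file_digest(dst_root / rel):
                mismatched.append(str(rel))
    return mismatched
```

    An empty list from `migrate(old, new)` means every file read back identically from the new medium. Nesting the previous backup inside the new one, as in the paths above, is then just a matter of choosing `dst_root`.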
    
    Some people are going to complain about the naming but I have all my emails except for six months back since I started using the Internet. And I still have nearly all of my data since I started using computers. 8-bit computers.

    Do you?

    I don't care about naming much. "search, don't sort".

    We've got emulators for just about every and any system. My vintage arcade cab has both real PCBs and a Pi running an emulator with thousands of arcade games on it.

    You can already, today, emulate, say, the Raspberry Pi model you want using QEMU. There are container files that'll gladly do that for you.

    Unless civilization ends there's simply not a world in which, say, PNG, JPG and x265 files aren't readable. This just won't happen.

    FWIW I'm paranoid about the integrity of my data: I've got my own naming scheme where a cryptographic hash is added to many of my files.

    For example:

            DSC_91394-b3-ae4f2877d3.jpg
    
    This means "This file's Blake3 checksum begins with ae4f2877d3".

    I then have a script doing statistical sampling: I enter a percentage and that percentage of files where a cryptographic hash is part of the filename are checked, randomly (if I enter 100 then 100% of the files are tested).

    If I enter for example '7', then 7% of the files are tested, and if they all pass there's a high probability that the rest of the checksums are correct too.
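
    A sampling checker along these lines might look like the sketch below. It is not the script described here. BLAKE3 is not in Python's standard library, so the sketch substitutes `hashlib.blake2b` and a hypothetical `-b2-` tag in the filename; matching the `-b3-` names above would mean swapping in a BLAKE3 implementation. The function and pattern names are made up for illustration.

```python
import hashlib
import random
import re
from pathlib import Path

# Hypothetical naming pattern: "<name>-b2-<hex prefix>.<ext>".
# "b2" and BLAKE2b are stand-ins for the "b3"/BLAKE3 scheme described above.
TAGGED = re.compile(r"-b2-([0-9a-f]+)\.[^.]+$")


def digest(path, chunk_size=1 << 20):
    """Return the BLAKE2b hex digest of a file, reading it in chunks."""
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def sample_check(root, percent, rng=random):
    """Check a random `percent` of the tagged files under root.

    Returns the sampled paths whose digest does not start with the
    prefix embedded in their filename.
    """
    tagged = [p for p in sorted(Path(root).rglob("*"))
              if p.is_file() and TAGGED.search(p.name)]
    if not tagged or percent <= 0:
        return []
    k = min(len(tagged), max(1, round(len(tagged) * percent / 100)))
    failures = []
    for path in rng.sample(tagged, k):
        expected = TAGGED.search(path.name).group(1)
        if not digest(path).startswith(expected):
            failures.append(path)
    return failures
```

    Calling `sample_check(root, 100)` tests every tagged file. With smaller percentages, a clean run only bounds the damage: if a fraction f of the files were corrupt, a sample of n files would miss all of them with probability of roughly (1 - f)^n.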

    > On what physical medium?

    That is the wrong question.

    17. ssl-3 ◴[] No.42484591{4}[source]
    If it is 1860 and you want to see that your data is preserved for 164 years, then you start by keeping it in geographically diverse places where people will look after it and tend its needs for 164 years.

    As a concept, that's really not different at all from holding on to today's data here in 2024.