801 points tnorthcutt | 14 comments
Nanzikambe ◴[] No.7524756[source]
Interesting article. I'd actually not heard of Tarsnap before. One question (to those who use it): why would a geek use it over:

  tar -cf - / --exclude='/proc/*' --exclude='/dev/*' [..] | \
      xz -z | \
      openssl enc -aes-256-cbc -e -salt \
      > /mnt/your/networked/google/drive/backup.$(hostname -a).$(date "+%Y%m%d-%H%M%S").tar.xz.aes
I spent a while going through https://www.tarsnap.com/ and I didn't find any flexibility tarsnap offers over it. To make it work unattended, it's trivial to generate a unique key per backup for openssl (use a tmpfs) and then gpg encrypt the key and email it to sys admins or whatever mailing list before killing the tmpfs.
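A minimal sketch of that unattended-key flow, assuming openssl 1.1.1+ for the -pbkdf2 option (every path and the recipient address below are invented for the example):

```shell
#!/bin/sh
# Rough sketch of the per-backup throwaway-key flow described above;
# paths and the admin address are hypothetical.
set -eu
KEYDIR=$(mktemp -d)                    # stands in for a tmpfs mount
openssl rand -hex 32 > "$KEYDIR/backup.key"

# Encrypt with the throwaway key; -pass file: avoids an interactive prompt
printf 'backup payload' |
    openssl enc -aes-256-cbc -salt -pbkdf2 -pass "file:$KEYDIR/backup.key" \
    > /tmp/backup.demo.aes

# Round-trip check while the key still exists
openssl enc -d -aes-256-cbc -pbkdf2 -pass "file:$KEYDIR/backup.key" \
    < /tmp/backup.demo.aes > /tmp/backup.demo.out

# The real flow would now gpg-encrypt backup.key to the admin list
# (e.g. gpg --encrypt --recipient admins@example.com "$KEYDIR/backup.key"),
# mail it, then destroy the key and unmount the tmpfs:
rm -rf "$KEYDIR"
```

Anyone holding the gpg-encrypted key can later decrypt the archive; the backup host itself retains nothing once the tmpfs is gone.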

I could understand the appeal to less tech-savvy users if there were a GUI, or if it featured cross-platform support beyond the platforms supported by tar, <insert compression tool>, openssl/aespipe/gpg/<insert encryption tool>, or if the storage were super cheap.

So what's the value proposition here?

replies(5): >>7524774 #>>7524790 #>>7524804 #>>7524909 #>>7525099 #
1. tomp ◴[] No.7524774[source]
Data deduplication, incremental backups.
replies(3): >>7524873 #>>7525132 #>>7525307 #
2. Nanzikambe ◴[] No.7524873[source]
Heh, apologies, my fault for trying to be clever: the mechanism I actually use is incremental and deduplicated. I substituted tar for it to keep the example simple.

I actually use ZFS (filesystem), so my backup flow is closer to:

  TSTAMP="backup-$(date "+%Y%m%d-%H%M%S")"
  zfs snapshot -r tank@$TSTAMP     # "tank" stands in for the pool name
  zfs send -R tank@$TSTAMP | \
      xz -z | \
      openssl enc -aes-256-cbc -e -salt \
      > /mnt/your/networked/google/drive/backup.$(hostname -a).$TSTAMP.zfs.xz.aes
The underlying ZFS filesystem is deduplicated at the filesystem level, and snapshots are incremental. There are a few other minor differences (the dest is another ZFS host which syncs to Google Drive, and I nuke the local snapshot after send because RAID 1+0 space is more expensive than RAID 1...)
replies(2): >>7524963 #>>7525927 #
3. Nanzikambe ◴[] No.7524963[source]
To answer my own question: deduplication :)

I had not considered multiple backup sources, mine is deduplicated per host, am I to understand tarsnap is deduplicated across all hosts sharing a set of keys?

replies(2): >>7526327 #>>7529217 #
4. ◴[] No.7525132[source]
5. tptacek ◴[] No.7525307[source]
HEAD-DESK.

Deduplication and incremental backups are table-stakes for backup software.

The reason a business would use Tarsnap rather than some other backup service is the level of confidence that Colin can provide that Tarsnap will reliably protect their data from attackers, including compelled insiders at Tarsnap.

In other words, Tarsnap can offer an enterprise an offsite backup service that is demonstrably as safe as backup data that the enterprise retains direct custody of.

That is not an offering other backup providers can reliably duplicate.

replies(1): >>7525642 #
6. tomp ◴[] No.7525642[source]
That's right, I was just telling the parent what advantages Tarsnap has compared to an OSS, bash-pipe, tar+encrypt solution.
replies(1): >>7525740 #
7. tptacek ◴[] No.7525740{3}[source]
Security remains the most important difference between those two options.
replies(1): >>7526231 #
8. foobarqux ◴[] No.7525927[source]
I would like to do something similar with BTRFS.

How are your snapshots incremental? In BTRFS you would need to specify a base snapshot.

What is the restore process? You init a ZFS filesystem and then zfs receive the backups in chronological order? How are the dependencies between snapshots managed?

replies(1): >>7529168 #
9. robryk ◴[] No.7526231{4}[source]
I assume you refer to all the seemingly small problems with the pipeline above (from what I can see, there is no way to verify that the archive wasn't tampered with).

Would you say the same about a solution that signs and encrypts the archive with gpg (signs with a machine's key and encrypts it to the owner's key). If so, can you elaborate on some examples of security problems that solution could have?
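A sketch of that sign-and-encrypt variant (the throwaway keyring and demo identity are stand-ins for the real machine and owner keys, and the block is skipped where GnuPG is not installed; it assumes gpg 2.1+ for --quick-generate-key):

```shell
#!/bin/sh
# Hypothetical sketch: sign with a machine key, encrypt to an owner key.
# A throwaway keyring and demo identity stand in for the real keys.
if command -v gpg >/dev/null 2>&1; then
    export GNUPGHOME=$(mktemp -d)
    chmod 700 "$GNUPGHOME"
    gpg --batch --pinentry-mode loopback --passphrase '' \
        --quick-generate-key 'Backup Demo <backup@example.com>' \
        default default never 2>/dev/null

    # Sign-then-encrypt in one pass; any tampering with the archive now
    # breaks signature verification at restore time.
    printf 'archive bytes' |
        gpg --batch --sign --encrypt --recipient backup@example.com \
        > /tmp/backup.demo.gpg

    # Restore path: decryption and signature verification happen together.
    gpg --batch --decrypt /tmp/backup.demo.gpg 2>/dev/null
fi
```

In a real deployment the signing key would live on the backup host and the decryption key would live only with the owner, so the host can produce verifiable backups it cannot itself read back.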

replies(1): >>7526274 #
10. tptacek ◴[] No.7526274{5}[source]
Are you asking if I could design you a secure backup system?

I could, and it might asymptotically approach the quality of Colin's.

I don't think you're comfortable with the amount of money I'd charge for that service.

You're better off paying Colin cost-plus for AWS storage, since that's all he seems to want to charge. :)

11. beagle3 ◴[] No.7526327{3}[source]
I think that's the case.

Note: https://github.com/bup/bup does that too (though it does not encrypt), and http://liw.fi/obnam/ does too (and it does encrypt).

What tarsnap gives you that obnam doesn't is (a) managed cloud storage, (b) tarsnap's history and reputation, and (c) Colin's personal reputation. That's a lot, and it costs money above the S3 storage costs (which you could point obnam at).

12. vbit ◴[] No.7529168{3}[source]
The 'zfs send' command will send an incremental snapshot if you specify '-i snap1 snap2'.

To restore, of course, you'll have to have snap1, and then you can apply the increment.
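Concretely, that flow might look like the following (pool and dataset names are made up; the block is guarded so it only runs on a system where such a pool actually exists):

```shell
#!/bin/sh
# Sketch of incremental zfs send/receive; "tank/data" and
# "backup/restore" are hypothetical names.
if zfs list tank/data >/dev/null 2>&1; then
    zfs snapshot tank/data@snap1                        # base snapshot
    # ... changes happen ...
    zfs snapshot tank/data@snap2
    zfs send tank/data@snap1 > /backup/base.zfs         # full stream
    zfs send -i tank/data@snap1 tank/data@snap2 \
        > /backup/incr.zfs                              # delta only

    # Restore: receive the base first, then each increment in order
    zfs receive backup/restore < /backup/base.zfs
    zfs receive backup/restore < /backup/incr.zfs
fi
```

Note that each increment is only receivable on top of the snapshot it was generated against, which is exactly the dependency chain foobarqux is asking about.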

replies(1): >>7532936 #
13. vbit ◴[] No.7529217{3}[source]
Also, easier restore and snapshot deletion.

Consider how you would restore using incremental ZFS snapshots. You'd have to pull all the snapshots, unpack the base snapshot, and then sequentially unpack each incremental snapshot.

In tarsnap, the server will compute the 'snapshot' you want for you, and will only send you the data blocks that belong to that snapshot.

In tarsnap, you can also delete any snapshot you want, and only blocks belonging exclusively to that snapshot will be deleted. In your system, deleting a snapshot means you lose all snapshots from that one until the next full snapshot.

Also, in ZFS you're limited to backing up complete datasets, but with tarsnap you can backup any set of files you want.
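For comparison, the operations vbit describes map onto Tarsnap's tar-like command line (the archive name is an example; the block is skipped unless tarsnap and the conventional /root/tarsnap.key are present):

```shell
#!/bin/sh
# Tarsnap mirrors tar's interface; the archive name is an example.
# Guarded so it only runs where tarsnap is installed with a key file.
if command -v tarsnap >/dev/null 2>&1 && [ -f /root/tarsnap.key ]; then
    NAME="backup-$(date +%Y%m%d)"
    tarsnap -c -f "$NAME" /home /etc   # create; dedup happens per-block
    tarsnap --list-archives            # enumerate stored archives
    tarsnap -x -f "$NAME"              # restore any archive directly,
                                       # no chain of increments to replay
    tarsnap -d -f "$NAME"              # delete; only blocks unique to
                                       # this archive are freed
fi
```

Each archive is addressable on its own, which is why restore and deletion don't involve the base-plus-increments bookkeeping of the ZFS pipeline above.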

14. foobarqux ◴[] No.7532936{4}[source]
That's what I thought. I wanted to know if the OP had some way to manage the dependencies between snapshots.