This week, it finally happened. I think it’s the first time in 20 years that a hard drive has died on me without warning. And it was also the first time I was using an NVMe drive, but that could be a coincidence.
The drive was still under warranty (barely a year and a half old). I even had a spare lying around. But the true cost of restoration is, of course, my own labor. My planning had not been perfect (for such a remote event, as I had judged). However, it was easy enough. I simply installed NixOS from a USB loader and downloaded my configuration from my backup on my NAS (daily rsync jobs to the rescue). I also downloaded all the important files for my home directory. Then, it was simply a matter of adjusting a few things in the configuration file, rebuilding the system, and voilà. Well, except for a few things that didn’t work quite right for some reason and had to be manually fixed, but nothing major.
However, next time I want this to be even easier. It’s probably overkill to install a RAID controller and have multiple drives running in RAID1 or RAID5, but the restoration process is still too much manual work. I was thinking of regularly backing up my main drive on the block device level, so I would just have to swap out the drive and restore the delta from the backup. I’m not quite sure if that’s feasible or a good idea. For my personal system, I have to balance the investment of preparing for a disaster with the likelihood and impact of such an event. This seems like a good trade-off, but I would be curious to hear how other people prepare for drive failure.
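To make the block-level idea a bit more concrete, here is a minimal sketch of how only the changed blocks could be copied from a drive into an existing backup image. The function name `sync_changed_blocks`, the 4 MiB block size, and the paths in the usage note are all assumptions for illustration, not an existing tool. It also assumes the source is not being written to while it is read (for example, because it is unmounted or a snapshot), since imaging a live filesystem can produce an inconsistent copy.

```python
BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB; an arbitrary choice for this sketch


def sync_changed_blocks(source_path, target_path, block_size=BLOCK_SIZE):
    """Copy only the blocks of source that differ from target.

    Assumes target already exists (e.g. an earlier full image) and is
    allowed to grow to the size of source. Returns a tuple
    (blocks_written, blocks_total).
    """
    written = 0
    total = 0
    offset = 0
    with open(source_path, "rb") as src, open(target_path, "r+b") as dst:
        while True:
            src_block = src.read(block_size)
            if not src_block:
                break  # end of source reached
            # Read the same region from the backup and compare.
            dst.seek(offset)
            dst_block = dst.read(len(src_block))
            if src_block != dst_block:
                # Only blocks that differ are rewritten.
                dst.seek(offset)
                dst.write(src_block)
                written += 1
            total += 1
            offset += len(src_block)
    return written, total
```

A call might look like `sync_changed_blocks("/dev/nvme0n1", "/mnt/nas/desktop.img")`, where both paths are hypothetical. This still reads the whole drive on every run and keeps only one version of the image, so it is closer to a refreshed clone than to a versioned backup.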
I have successfully restored a whole disk to a working OS from a borgbackup archive, which takes much less space on the backup storage due to Borg's extremely efficient compression and deduplication.
So that's what I would recommend.
I back up all my computers and servers with borgmatic.
If you need any help with setting it up, let me know.
I have btrfs snapshots with snapper on my desktop. It keeps the last 20 snapshots. Sending them to a second drive would require as much space as is used on the main drive, which is about 850 GB of 1 TB.
But the Borg backup of the same data takes only ~450 GB and also keeps the last 20 versions.
So I use the btrfs snapshots to roll back file changes (for example, after a bad system update).
With Borg it is easier to set up a central backup server for all my devices, because the backups take much less space. So I use that for the case where a drive fails. To restore, I set up the same partition layout as before and then throw the Borg backup at it. It seems pretty easy so far.
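For anyone wondering why 20 versions can take less space than one full copy per version, here is a toy sketch of chunk-level deduplication. It is not Borg's implementation: Borg uses content-defined, variable-size chunks and also compresses them, while this sketch uses fixed-size chunks and no compression to keep it short. The class name `ChunkStore` and the 4096-byte chunk size are made up for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunks; a simplification for this sketch


class ChunkStore:
    """Stores each distinct chunk once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes

    def add_version(self, data, chunk_size=CHUNK_SIZE):
        """Store one version of the data; return its list of chunk digests."""
        manifest = []
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk already seen in any earlier version is not stored again.
            self.chunks.setdefault(digest, chunk)
            manifest.append(digest)
        return manifest

    def restore(self, manifest):
        """Rebuild a version from its list of chunk digests."""
        return b"".join(self.chunks[digest] for digest in manifest)

    def stored_bytes(self):
        """Total size of all distinct chunks held in the store."""
        return sum(len(chunk) for chunk in self.chunks.values())
```

Each version is just a list of digests, so if only a few chunks change between two backups, only those chunks add to `stored_bytes()`, while every version can still be restored in full from its manifest.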