Hard drive failure in my zpool 😞
I have a storage box in my house that stores important documents, backups, VM disk images, photos, a copy of the Tor Metrics archive and other odd things. I’ve put a lot of effort into making sure that it is both reliable and performant. When I was working on a modern CollecTor for Tor Metrics recently, I used it to run the entire history of the Tor network through the prototype replacement to see if I could catch any bugs.
I have had my share of data loss events in my life, but since I’ve found ZFS I have hope that it is possible to avoid, or at least seriously minimise the risk of, any catastrophic data loss events ever happening to me again. ZFS has:
- cryptographic checksums to validate data integrity
- mirroring of disks
- “scrub” function that ensures that the data on disk is actually still good even if you’ve not looked at it yourself in a while
ZFS on its own is not the entire solution though. I also mix-and-match hard drive models to ensure that a systematic fault in a particular model won’t wipe out all my mirrors at once, and I also have scheduled SMART self-tests to detect faults before any data loss has occurred.
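FreeNAS lets me schedule both the scrubs and the SMART self-tests from its web interface, but the underlying operations look roughly like this (just a sketch; ada1 is an example device name, not my actual disk):

% zpool scrub flat               # read every block in the pool and verify it against its checksum
% smartctl -t long /dev/ada1     # start an extended (long) SMART self-test on one drive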
One of those self-tests has now reported a failure on one of the drives. This means I have to treat that drive as “going to fail soon”, which means that I don’t have redundancy in my zpool anymore, so I have to act. Fortunately, in September 2017 when my workstation died, I received some donations towards the hardware I use for my open source work and I bought a spare HDD for this very situation!
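Before deciding that a drive really is on its way out, the self-test results and the drive’s own overall health assessment can be read back with smartctl (again, the device name is only an example):

% smartctl -H /dev/ada1            # overall health self-assessment (PASSED or FAILED)
% smartctl -l selftest /dev/ada1   # log of recent self-tests and how each one ended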
At present my zpool setup looks like:
% zpool status flat
  pool: flat
 state: ONLINE
  scan: scrub repaired 0 in 0 days 07:05:28 with 0 errors on Fri Apr 5 07:05:36 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        flat                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  ONLINE       0     0     0
            gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  ONLINE       0     0     0
            gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  ONLINE       0     0     0
        cache
          gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx    ONLINE       0     0     0

errors: No known data errors
The drives in the two mirrors are 3TB drives; each mirror contains one WD Red and one Toshiba NAS drive. In this case, it is one of the WD Red drives that has failed and I’ll be replacing it with another WD Red. One important thing to note is that you have to replace the drive with one of equal or greater capacity. Here it is the same model, so the capacity should be the same, but not all X TB drives are going to be exactly the same size.
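If you want to check the exact capacities before committing to a swap, FreeBSD’s diskinfo will report them. A quick sketch with an example device name (the byte count shown is only illustrative of what a “3TB” drive typically reports):

% diskinfo -v ada1 | grep 'mediasize in bytes'
        3000592982016   # mediasize in bytes (2.7T)

The new drive needs to report at least as many bytes as the one it is replacing.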
You’ll notice in the output above that it says “No known data errors”. This is because there haven’t been any issues with the data yet; it is just a SMART failure, and hopefully by replacing the disk any data error can be avoided entirely.
My plan was to move to a new system with 8 bays soon. In that system I’ll keep the stripe over two mirrors, but one mirror will run over 3x 6TB drives with the other remaining on 2x 3TB drives. This incident leaves me with only one leftover 3TB drive though, so maybe I’ll have to rethink this.
My current machine, an HP MicroServer, does not support hot-swapping the drives, so I have to start by powering off the machine and replacing the drive.
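The shutdown can be done from the FreeNAS web interface, or from a root shell it’s simply (a trivial sketch; -p powers the system off once the OS has halted):

% shutdown -p now

With the failed drive swapped for the new one and the machine booted again, the pool comes up degraded: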
% zpool status flat
  pool: flat
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 0 days 07:05:28 with 0 errors on Fri Apr 5 07:05:36 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        flat                                            DEGRADED     0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  ONLINE       0     0     0
            gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  ONLINE       0     0     0
          mirror-1                                      DEGRADED     0     0     0
            gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  ONLINE       0     0     0
            xxxxxxxxxxxxxxxxxxxx                        UNAVAIL      0     0     0  was /dev/gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
        cache
          gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx    ONLINE       0     0     0

errors: No known data errors
The disk that was part of the mirror is now unavailable, but the pool is still functioning as the other disk is still present. This means that there are still no data errors and everything is still running. The only downtime was due to my SATA controller not supporting hot-swapping.
Through the web interface in FreeNAS, it is now possible to use the new disk to replace the old disk in the mirror: Storage -> View Volumes -> Volume Status (under the table, with the zpool highlighted) -> Replace (with the unavailable disk highlighted).
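For reference, the operation sitting underneath that button is zpool replace (FreeNAS also takes care of partitioning the new disk and setting up swap on it first). A rough command-line sketch, using the same placeholder identifiers as the output above rather than my real gptids:

% zpool replace flat xxxxxxxxxxxxxxxxxxxx gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx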
Running zpool status again:
% zpool status flat
  pool: flat
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Apr 5 16:55:47 2019
        1.30T scanned at 576M/s, 967G issued at 1.12G/s, 4.33T total
        4.73G resilvered, 21.82% done, 0 days 00:51:29 to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        flat                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  ONLINE       0     0     0
            gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  ONLINE       0     0     0
            gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  ONLINE       0     0     0  (resilvering)
        cache
          gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx    ONLINE       0     0     0

errors: No known data errors
And everything should be OK again soon, now with the dangerous disk removed and a hopefully more reliable disk installed.
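Once the resilver completes, a quick way to confirm that the pool really is back to full health is zpool status -x, which only reports pools that have a problem:

% zpool status -x
all pools are healthy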
This has put a dent in my plans to upgrade my storage, so for now I’ve added the hard drives I’m looking for to my Amazon wishlist.
As for the drive that failed, I’ll be doing an ATA Secure Erase and then disposing of it. NIST SP 800-88 thinks that ATA Secure Erase is in the same category as degaussing a hard drive and that it is more effective than overwriting the disk with software. ATA Secure Erase is faster too because it’s the hard drive controller doing the work. I just have to hope that my firmware wasn’t replaced with firmware that only fakes the process (or I’ll just do an overwrite anyway to be sure). According to the same NIST document, “for ATA disk drives manufactured after 2001 (over 15 GB) clearing by overwriting the media once is adequate to protect the media from both keyboard and laboratory attack”.
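For the curious, on FreeBSD the erase can be driven with camcontrol’s security subcommand. A rough sketch, assuming the failed drive shows up as ada1 once it is attached on its own (the password is a throwaway that only exists to satisfy the ATA security protocol):

% camcontrol security ada1                          # report the drive’s current security state (it must not be frozen)
% camcontrol security ada1 -U user -s ErasePass     # set a temporary user password, enabling security
% camcontrol security ada1 -U user -e ErasePass -y  # perform the security erase; -y confirms the destructive action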
This blog post is also a little experiment. I’ve used a Unicode emoji in the title, and I want to see how various feed aggregators and bots handle that. Sorry if I broke your aggregator or bot.