A RAID spreads data across several disks to gain performance, capacity or fault tolerance. A NAS is the consumer and SMB variant: an enclosure that adds, on top of a RAID, an operating system, network shares and — crucial for recovery — snapshots. In both cases, the challenge is never to "repair disks": it's to reconstruct the logic of the whole.
This chapter is for admins and IT managers facing a degraded array. To send a server in, see the RAID service or NAS service; for a real case with figures, see the Dell RAID 5 study.
1 · Understand the levels and their tolerance
Recovery depends first on the RAID level and its safety margin:
- RAID 0 — striped data, no redundancy. One disk lost = the whole array compromised. The most guarded prognosis.
- RAID 1 — mirror. Each disk holds a full copy; the prognosis is very favorable.
- RAID 5 — striping + distributed parity. Tolerates the loss of one disk; beyond that, the array fails.
- RAID 6 — dual parity. Tolerates two disks.
- RAID 10, 50, 60 — combinations (mirror + striping, or striping of RAID 5/6).
On the NAS side, Synology layers its SHR (and SHR-2) over mdadm + LVM to mix disks of different sizes; Netgear does the same with X-RAID. The logic stays that of a classic RAID, with a proprietary layer to decode.
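The tolerance of each level can be summed up in a few lines of Python. This is a minimal sketch, not a sizing tool: `RaidProfile` and `raid_profile` are hypothetical names, RAID 10 is assumed to use two-way mirrors, and only the worst-case (guaranteed) tolerance is reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidProfile:
    guaranteed_failures: int  # disks that may fail, worst case, without data loss
    data_disks: int           # usable capacity, expressed in disks


def raid_profile(level: str, disks: int) -> RaidProfile:
    """Return worst-case fault tolerance and usable capacity for a RAID level."""
    minimum = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == "0":
        return RaidProfile(0, disks)          # no redundancy at all
    if level == "1":
        return RaidProfile(disks - 1, 1)      # every disk is a full copy
    if level == "5":
        return RaidProfile(1, disks - 1)      # one disk's worth of parity
    if level == "6":
        return RaidProfile(2, disks - 2)      # two disks' worth of parity
    # RAID 10, assumed here to be built from two-way mirrors:
    # two failures in the same mirror pair are fatal, so only one is guaranteed.
    if disks % 2:
        raise ValueError("RAID 10 with two-way mirrors needs an even disk count")
    return RaidProfile(1, disks // 2)
```

For example, `raid_profile("6", 6)` reports two tolerated failures and four disks of usable capacity.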
2 · The fatal mistake: the premature rebuild
This is the number-one cause of permanent RAID loss. Faced with a failing disk, the reflex is to launch an automatic rebuild. But if a second disk is unstable, or the controller writes the wrong parity, the rebuild massively overwrites the still-healthy data of the other members. An array recoverable at 98% can become unrecoverable in minutes.
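One of these failure modes can be illustrated on a single stripe. The sketch below is a simplified model, not a controller simulation: it assumes a 3-disk RAID 5 stripe with made-up contents, and a second member that returns stale data. The rebuild then computes, and would write, a block that was never on the lost disk.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


# One stripe of a hypothetical 3-disk RAID 5: two data blocks and their parity.
d0 = b"INVOICES"
d1 = b"PAYROLL!"
parity = xor_blocks(d0, d1)

# Healthy case: disk 0 is lost, disk 1 and the parity are sound.
healthy_rebuild = xor_blocks(d1, parity)

# Unstable case: disk 1 returns stale content (one byte differs).
d1_stale = b"PAYROLL?"
bad_rebuild = xor_blocks(d1_stale, parity)

rebuild_is_wrong = bad_rebuild != d0  # the rebuild "succeeds" but writes garbage
```

The controller has no way to notice the difference: both rebuilds complete without error, and only the first one reproduces the lost block.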
3 · The laboratory process
Step 1 — Clone every member
Before any analysis, each disk is cloned sector by sector with a write blocker. Physically failing members (heads, PCB, firmware) go through the cleanroom first. From there, everything happens on the copies: the original array stays intact.
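The cloning logic can be sketched as a read loop that never gives up on a bad area: it retries sector by sector, fills unreadable sectors with zeros and records them. This is an illustration only, assuming 512-byte sectors and file-like sources; `clone_image` and `FlakySource` are hypothetical names, and the code replaces neither a hardware write blocker nor a dedicated imaging tool.

```python
import io
from typing import BinaryIO, List

SECTOR = 512  # assumed sector size


def clone_image(src: BinaryIO, dst: BinaryIO, total_sectors: int,
                chunk_sectors: int = 128) -> List[int]:
    """Copy src to dst; return the list of sectors that could not be read."""
    bad: List[int] = []
    sector = 0
    while sector < total_sectors:
        count = min(chunk_sectors, total_sectors - sector)
        try:
            src.seek(sector * SECTOR)
            data = src.read(count * SECTOR)
            if len(data) != count * SECTOR:
                raise OSError("short read")
            dst.seek(sector * SECTOR)
            dst.write(data)
        except OSError:
            # The fast read failed: fall back to one sector at a time.
            for s in range(sector, sector + count):
                try:
                    src.seek(s * SECTOR)
                    data = src.read(SECTOR)
                    if len(data) != SECTOR:
                        raise OSError("short read")
                except OSError:
                    data = b"\x00" * SECTOR  # placeholder for the lost sector
                    bad.append(s)
                dst.seek(s * SECTOR)
                dst.write(data)
        sector += count
    return bad


class FlakySource(io.BytesIO):
    """Stand-in for a failing disk: reads touching a bad sector raise OSError.

    Only meant to be called with a positive size, as clone_image does.
    """

    def __init__(self, data: bytes, bad_sectors):
        super().__init__(data)
        self.bad_sectors = set(bad_sectors)

    def read(self, size=-1):
        first = self.tell() // SECTOR
        last = (self.tell() + size - 1) // SECTOR
        if any(first <= s <= last for s in self.bad_sectors):
            raise OSError("unreadable sector")
        return super().read(size)


disk = bytes(range(256)) * 2 * 8          # 8 sectors of sample data
image = io.BytesIO()
bad_sectors = clone_image(FlakySource(disk, {5}), image, 8, chunk_sectors=4)
```

The list of bad sectors matters as much as the image itself: it tells the later steps which stripes must be rebuilt from parity rather than read directly.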
Step 2 — De-striping & parameter identification
The heart of the craft. Without trusting the controller, we determine through entropy analysis of the data: the stripe size, the exact disk order, the parity rotation direction and the initial offset. RAID 5 parity is a plain XOR: knowing n-1 blocks of a stripe, we reconstruct the n-th. RAID 6 typically keeps this XOR parity and adds a second, independent syndrome computed with Galois-field arithmetic, which is what allows two missing blocks to be recovered. This is what lets us virtually recreate the content of a missing disk.
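The single-parity XOR relationship described above fits in a few lines. The sketch below is a minimal illustration with made-up block contents; all function names are hypothetical. `parity_consistent` shows how a candidate set of parameters can be checked (a correctly aligned stripe XORs to zero), and `shannon_entropy` only measures how random a block looks, since the way entropy is exploited in practice is not detailed here.

```python
import math
from collections import Counter
from functools import reduce
from operator import xor
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def reconstruct_missing(surviving: Sequence[bytes]) -> bytes:
    """Rebuild the one missing block of a single-parity stripe from the n-1 others."""
    return xor_blocks(surviving)


def parity_consistent(stripe: Sequence[bytes]) -> bool:
    """True when data blocks plus parity XOR to zero, as in a well-aligned stripe."""
    return not any(xor_blocks(stripe))


def shannon_entropy(block: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    total = len(block)
    counts = Counter(block)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([d0, d1, d2])
recovered_d1 = reconstruct_missing([d0, d2, parity])  # the disk holding d1 is missing
```

A wrong guess for the disk order or stripe size shows up quickly: stripes assembled from misaligned blocks stop passing `parity_consistent`.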
Step 3 — Virtual volume reconstruction
With the parameters validated, we assemble a virtual volume from the images, without writing to any disk or image. For a NAS, we first reassemble the SHR/X-RAID stack (mdadm + LVM), then mount the file system.
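Read-only assembly boils down to an address translation: for each logical offset, find which image holds the data and where. The sketch below assumes one specific layout, left-symmetric RAID 5 (the default used by Linux md), with all members present; a real case must use whatever layout the analysis identified. `locate_chunk` and `read_virtual` are hypothetical names, and the images are plain `bytes` objects, so nothing can be written back.

```python
from typing import Sequence, Tuple


def locate_chunk(logical_chunk: int, disks: int) -> Tuple[int, int]:
    """Map a logical chunk number to (disk index, stripe row), left-symmetric RAID 5."""
    data_disks = disks - 1
    row, k = divmod(logical_chunk, data_disks)
    parity_disk = data_disks - (row % disks)   # parity rotates "leftwards"
    data_disk = (parity_disk + 1 + k) % disks  # data starts right after the parity
    return data_disk, row


def read_virtual(images: Sequence[bytes], chunk_size: int, offset: int,
                 length: int, start_offset: int = 0) -> bytes:
    """Read `length` bytes at logical `offset` from the virtual volume."""
    out = bytearray()
    pos, end = offset, offset + length
    while pos < end:
        logical_chunk, within = divmod(pos, chunk_size)
        disk, row = locate_chunk(logical_chunk, len(images))
        take = min(chunk_size - within, end - pos)
        begin = start_offset + row * chunk_size + within
        out += images[disk][begin:begin + take]
        pos += take
    return bytes(out)


# Three tiny member images with 2-byte chunks, holding the payload b"AABBCCDD".
# \x03 is ord("A") ^ ord("B"), \x07 is ord("C") ^ ord("D"): the parity chunks.
images = [b"AADD", b"BB\x07\x07", b"\x03\x03CC"]
volume = read_virtual(images, chunk_size=2, offset=0, length=8)
```

In a real reconstruction the `bytes` objects would be replaced by read-only memory maps of the cloned images, and `start_offset` by the initial offset found during parameter identification.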
Step 4 — File system & extraction
We repair the file system structures (VMFS, ZFS, ReFS, EXT4, XFS, NTFS, Btrfs) and extract the data: files, SQL databases, and virtual machine disks in VMDK (VMware), VHDX (Hyper-V) and QCOW2 (Proxmox) formats.
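A quick way to confirm that the virtual volume was assembled correctly is to look for a file system signature at its expected offset. The sketch below covers only four of the file systems listed, using their well-known on-disk magic values; `detect_filesystems` is a hypothetical helper, and a matching signature proves only that the superblock is where it should be, not that the file system is intact.

```python
from typing import List

# (name, byte offset from the start of the volume, magic bytes)
SIGNATURES = [
    ("NTFS", 3, b"NTFS    "),         # OEM ID in the boot sector
    ("XFS", 0, b"XFSB"),              # start of the primary superblock
    ("ext2/3/4", 1080, b"\x53\xef"),  # 0xEF53, little-endian, superblock at 1024
    ("Btrfs", 65600, b"_BHRfS_M"),    # superblock at 64 KiB, magic at +64
]


def detect_filesystems(volume: bytes) -> List[str]:
    """Return the names of the file systems whose signature is present."""
    found = []
    for name, offset, magic in SIGNATURES:
        if volume[offset:offset + len(magic)] == magic:
            found.append(name)
    return found


# A blank 2 KiB volume carrying only an ext superblock magic.
sample = bytearray(2048)
sample[1080:1082] = b"\x53\xef"
detected = detect_filesystems(bytes(sample))
```

If no signature appears where one is expected, the usual suspects are a wrong initial offset or a wrong disk order rather than a destroyed file system.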
Step 5 — NAS case: snapshots
A decisive advantage after a ransomware attack. Many attacks encrypt the visible files but ignore the read-only snapshots of Btrfs/ZFS. We mount the snapshots predating the attack and restore the healthy state, often without paying a ransom.
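Choosing the snapshot to restore is a matter of picking the most recent one taken before the attack began. The sketch below assumes the snapshot names and creation times have already been listed from the mounted images; the names, the dates and the `margin` parameter are hypothetical, the margin being there because the true start of an attack is often earlier than the first visible symptom.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def last_clean_snapshot(snapshots: Dict[str, datetime], attack_time: datetime,
                        margin: timedelta = timedelta(0)) -> Optional[str]:
    """Return the newest snapshot taken before attack_time minus a safety margin."""
    cutoff = attack_time - margin
    clean = {name: taken for name, taken in snapshots.items() if taken < cutoff}
    if not clean:
        return None
    return max(clean, key=clean.get)


# Hypothetical snapshot list and attack time.
snapshots = {
    "daily-01": datetime(2024, 3, 1, 2, 0),
    "daily-02": datetime(2024, 3, 2, 2, 0),
    "daily-03": datetime(2024, 3, 3, 2, 0),
}
attack = datetime(2024, 3, 2, 14, 30)
chosen = last_clean_snapshot(snapshots, attack)
cautious = last_clean_snapshot(snapshots, attack, margin=timedelta(days=1))
```

Here `chosen` is the snapshot from the night before the attack, while the one-day margin falls back to the previous one.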
4 · Success rates by scenario
- RAID 5 — 1 disk down — 98%
- RAID 6 — 2 disks down — 95%
- Lost configuration / dead controller — 95%
- NAS — 1-disk failure (SHR) — 95%
- NAS ransomware with snapshots — 92%
- RAID 0 — 1 disk failed — 72%
- RAID 5 — 2+ disks down — 68%
- NAS ransomware without snapshot — 35%
5 · The mistakes that destroy data
What you must never do to a failing RAID or NAS
- Launch an automatic rebuild — massive overwrite of the other members.
- Swap the disk order around — order is part of the array definition.
- Reset the controller or recreate the volume — erases the configuration (stripe, parity, offset).
- Delete snapshots after ransomware — that's the healthy state you're destroying.
- Leave an attacked NAS on the network — encryption can continue.
Guiding principle. On a redundant system, data almost always survives the hardware failure. What kills it is the rushed intervention. Power off, document, hand over.
