RAID & NAS Data Recovery Guide — the laboratory method

A RAID spreads data across several disks to gain performance, capacity or fault tolerance. A NAS is the consumer and SMB variant: an enclosure that adds, on top of a RAID, an operating system, network shares and — crucial for recovery — snapshots. In both cases, the challenge is never to "repair disks": it's to reconstruct the logic of the whole.

This chapter is for admins and IT managers facing a degraded array. To send a server in, see the RAID service or NAS service; for a real case with figures, see the Dell RAID 5 study.

1 · Understand the levels and their tolerance

Recovery depends first on the RAID level and its safety margin:

RAID 0 — striped data, no redundancy. One disk lost = the whole array compromised. The most guarded prognosis.
RAID 1 — mirror. Each disk holds a full copy; very favorable.
RAID 5 — striping + distributed parity. Tolerates the loss of one disk; a second failure brings the array down.
RAID 6 — dual parity. Tolerates the loss of two disks.
RAID 10, 50, 60 — combinations (mirror + striping, or striping of RAID 5/6).
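
The tolerances listed above reduce to simple arithmetic. The sketch below is only an illustration: the function name raid_tolerance and its return shape are our own assumptions, it supposes equal-size members, and it treats RAID 10 as striped two-way mirrors, reporting the failure count that is survived in every case (a RAID 10 may survive more if the failed disks sit in different mirrors). RAID 50 and 60 are left out for brevity.

```python
def raid_tolerance(level: str, disks: int) -> tuple[int, int]:
    """Return (disk failures always survived, disks' worth of usable capacity).

    Assumes equal-size members; RAID 10 is taken as striped two-way mirrors.
    """
    # level -> (minimum members, guaranteed tolerated failures, usable members)
    table = {
        "0": (2, 0, disks),
        "1": (2, disks - 1, 1),
        "5": (3, 1, disks - 1),
        "6": (4, 2, disks - 2),
        "10": (4, 1, disks // 2),
    }
    if level not in table:
        raise ValueError(f"unsupported RAID level: {level}")
    minimum, failures, usable = table[level]
    if disks < minimum or (level == "10" and disks % 2):
        raise ValueError(f"RAID {level} cannot be built from {disks} disks")
    return failures, usable


for level, disks in [("0", 2), ("1", 2), ("5", 4), ("6", 6), ("10", 4)]:
    failures, usable = raid_tolerance(level, disks)
    print(f"RAID {level:>2} on {disks} disks: survives {failures}, "
          f"capacity of {usable} disk(s)")
```

The first number is the safety margin: once the count of failed or unstable members exceeds it, the array can no longer be reassembled from parity or mirrors alone.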

On the NAS side, Synology layers its SHR (and SHR-2) over mdadm + LVM to mix disks of different sizes; Netgear does the same with X-RAID. The logic stays that of a classic RAID, with a proprietary layer to decode.

2 · The fatal mistake: the premature rebuild

This is the number-one cause of permanent RAID loss. Faced with a failing disk, the reflex is to launch an automatic rebuild. But if a second disk is unstable, or the controller writes the wrong parity, the rebuild massively overwrites the still-healthy data of the other members. An array recoverable at 98% can become unrecoverable in minutes.

The reflex that saves data. Shut the system down immediately. Launch no rebuild, don't swap the disks around, don't reset the controller. Note the bay order. A disk flagged "Foreign" is almost never a dead disk: only its configuration metadata is at fault.
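
To see why a rushed rebuild is destructive, here is a toy model of a single stripe on a three-member RAID 5. It is a simplified sketch, not a controller's actual logic: we assume member 1 dropped out some time ago and still holds old content, so its current data exists only implicitly through the parity. All names and byte strings are made up for the illustration.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


# State of one stripe while the array runs degraded (member 1 is offline).
d0 = b"INVOICES"                  # data block on member 0
d1_current = b"PAYROLL!"          # what member 1 SHOULD contain today
parity = xor(d0, d1_current)      # parity block on member 2, still correct

# What is physically on the stale member 1: content from weeks ago.
d1_stale = b"OLD-DATA"

# Before any rebuild, the current block is still recoverable from parity.
recovered_before = xor(d0, parity)

# A forced rebuild treats the stale member as valid and recomputes parity
# from what it finds on the disks, overwriting the good parity block.
parity_after_resync = xor(d0, d1_stale)
recovered_after = xor(d0, parity_after_resync)

print(recovered_before)   # the up-to-date block
print(recovered_after)    # the stale block: the current data is gone
```

After the resync the stripe is perfectly consistent again, which is exactly the problem: nothing on the disks still records what the up-to-date block was.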

3 · The laboratory process

Step 1 — Clone every member

Before any analysis, each disk is cloned sector by sector with a write blocker. Physically failing members (heads, PCB, firmware) go through the cleanroom first. From there, everything happens on the copies: the original array stays intact.
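
The cloning logic can be sketched as follows. This is a minimal illustration, not the lab's tooling: clone_image and FlakyDisk are hypothetical names, the source is assumed to be opened read-only behind a hardware write blocker, and FlakyDisk only simulates a member with one unreadable sector. Large reads are attempted first; on error the chunk is retried sector by sector, and unreadable sectors are zero-filled and logged instead of aborting the copy.

```python
import io


def clone_image(src, dst, total_bytes, sector=512, chunk_sectors=128):
    """Copy src to dst; return the offsets of sectors that could not be read."""
    bad = []
    chunk = sector * chunk_sectors
    offset = 0
    while offset < total_bytes:
        length = min(chunk, total_bytes - offset)
        try:
            src.seek(offset)
            data = src.read(length)
            if len(data) != length:
                raise OSError("short read")
        except OSError:
            # Fall back to sector-sized reads to isolate the damaged area.
            data = bytearray()
            for pos in range(offset, offset + length, sector):
                n = min(sector, offset + length - pos)
                try:
                    src.seek(pos)
                    piece = src.read(n)
                    if len(piece) != n:
                        raise OSError("short read")
                except OSError:
                    piece = bytes(n)      # zero-fill the unreadable sector
                    bad.append(pos)
                data += piece
        dst.seek(offset)
        dst.write(data)
        offset += length
    return bad


class FlakyDisk(io.BytesIO):
    """Test stand-in for a failing member: reads touching a bad sector fail."""

    def __init__(self, data, bad_offsets, sector=512):
        super().__init__(data)
        self.bad_offsets = set(bad_offsets)
        self.sector = sector

    def read(self, size=-1):
        start, end = self.tell(), self.tell() + size
        if any(start < b + self.sector and b < end for b in self.bad_offsets):
            raise OSError("unreadable sector")
        return super().read(size)


payload = bytes(range(256)) * 8                 # 2048 bytes = 4 sectors
source = FlakyDisk(payload, bad_offsets=[1024])
image = io.BytesIO()
bad_sectors = clone_image(source, image, len(payload), chunk_sectors=4)
print("unreadable sectors at offsets:", bad_sectors)
```

The list of bad offsets matters later: during reconstruction, those zero-filled sectors are treated as missing and recomputed from the other members where the redundancy allows it.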

Step 2 — De-striping & parameter identification

The heart of the craft. Without trusting the controller, we determine the array parameters by analyzing data entropy: the stripe size, the exact disk order, the parity rotation direction and the initial offset. RAID 5 parity is a plain XOR across each stripe: knowing n-1 blocks of a stripe, we reconstruct the n-th. RAID 6 adds a second, Reed-Solomon-based parity block so that two missing blocks can be solved for. This is what lets us virtually recreate the content of a missing disk.
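
The XOR property can be shown in a few lines. This is a minimal sketch for single-parity stripes (RAID 5, or the first parity block of RAID 6); the second RAID 6 parity uses Reed-Solomon arithmetic and is not covered here. The function names are ours, chosen for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR any number of equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Given the n-1 surviving blocks of a stripe, return the missing one."""
    return xor_blocks(surviving_blocks)


def row_is_consistent(all_blocks):
    """A complete, healthy stripe XORs to all zeros."""
    return not any(xor_blocks(all_blocks))


d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([d0, d1, d2])

# Member holding d1 is lost: recompute it from the others plus parity.
rebuilt = rebuild_missing([d0, d2, parity])
print(rebuilt == d1)

# The same property validates a candidate disk order or stripe size:
# with wrong parameters, the blocks grouped into a "stripe" do not cancel out.
print(row_is_consistent([d0, d1, d2, parity]))
print(row_is_consistent([d0, b"XXXX", d2, parity]))
```

The consistency check is what makes parameter identification testable: a hypothesis about order, stripe size or offset can be scored by how many rows of the images XOR to zero.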

Step 3 — Virtual volume reconstruction

With the parameters validated, we assemble a virtual volume from the images in read-only mode, so nothing is written anywhere. For a NAS, we first reassemble the SHR/X-RAID stack (mdadm + LVM), then mount the file system.
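
Once the parameters are known, assembling the virtual volume is a matter of reading chunks in the right order. The sketch below assumes a RAID 5 with the left-symmetric parity rotation (the default layout in Linux mdadm; hardware controllers may use others), a zero initial offset, and images held in memory as bytes. The names assemble_raid5, build_raid5 and parity_disk are hypothetical; build_raid5 exists only to fabricate test images.

```python
def xor_chunks(chunks):
    """Byte-wise XOR of equal-length chunks."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)


def parity_disk(row, n):
    """Index of the member holding parity for a given row (left-symmetric)."""
    return (n - 1) - (row % n)


def assemble_raid5(images, chunk):
    """Rebuild the logical volume; at most one entry of images may be None."""
    n = len(images)
    if sum(img is None for img in images) > 1:
        raise ValueError("RAID 5 cannot be assembled with two members missing")
    size = len(next(img for img in images if img is not None))
    volume = bytearray()
    for row in range(size // chunk):
        lo, hi = row * chunk, (row + 1) * chunk
        chunks = [None if img is None else img[lo:hi] for img in images]
        if None in chunks:
            # Recreate the missing member's chunk from the n-1 survivors.
            missing = chunks.index(None)
            chunks[missing] = xor_chunks([c for c in chunks if c is not None])
        p = parity_disk(row, n)
        for k in range(n - 1):
            # Data chunks start right after the parity member and wrap around.
            volume += chunks[(p + 1 + k) % n]
    return bytes(volume)


def build_raid5(data, n, chunk):
    """Test helper: lay logical data out across n member images."""
    row_bytes = chunk * (n - 1)
    members = [bytearray() for _ in range(n)]
    for row in range(len(data) // row_bytes):
        base = row * row_bytes
        blocks = [data[base + k * chunk: base + (k + 1) * chunk]
                  for k in range(n - 1)]
        p = parity_disk(row, n)
        for k, block in enumerate(blocks):
            members[(p + 1 + k) % n] += block
        members[p] += xor_chunks(blocks)
    return [bytes(m) for m in members]


data = bytes(range(256))                 # 8 rows of 2 x 16-byte data chunks
members = build_raid5(data, n=3, chunk=16)
members[1] = None                        # simulate a member that is lost
print(assemble_raid5(members, chunk=16) == data)
```

Nothing here writes to the member images: the volume is computed from them, which is the property that keeps the originals, and their clones, intact.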

Step 4 — File system & extraction

We repair the file system structures (VMFS, ZFS, ReFS, EXT4, XFS, NTFS, Btrfs) and extract the data: files, SQL databases, and virtual machine disks in VMDK (VMware), VHDX (Hyper-V) or QCOW2 (Proxmox) format.

Step 5 — NAS case: snapshots

A decisive advantage after a ransomware attack. Many attacks encrypt the visible files but leave the read-only Btrfs/ZFS snapshots untouched. We mount the snapshots that predate the attack and restore the healthy state, often without any ransom being paid.
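
Choosing which snapshot to mount comes down to a date comparison. The sketch below assumes the snapshot names and creation times have already been read from the file system's metadata and that the start of the attack has been established (for example from the earliest encrypted file); the snapshot names, the dates and the function name last_clean_snapshot are all made up for the illustration.

```python
from datetime import datetime


def last_clean_snapshot(snapshots, attack_time):
    """Return the name of the most recent snapshot taken before the attack.

    snapshots maps a snapshot name to its creation time; None is returned
    when every snapshot postdates the attack.
    """
    clean = {name: taken for name, taken in snapshots.items()
             if taken < attack_time}
    if not clean:
        return None
    return max(clean, key=clean.get)


# Hypothetical snapshot inventory of a shared folder.
snapshots = {
    "daily-2024-03-01": datetime(2024, 3, 1, 2, 0),
    "daily-2024-03-02": datetime(2024, 3, 2, 2, 0),
    "daily-2024-03-03": datetime(2024, 3, 3, 2, 0),
}
attack_time = datetime(2024, 3, 2, 14, 30)

print(last_clean_snapshot(snapshots, attack_time))
```

In practice we leave a margin before the estimated attack time, since encryption often starts well before anyone notices it.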

4 · Success rates by scenario

RAID 5 — 1 disk down — 98%
RAID 6 — 2 disks down — 95%
Lost configuration / dead controller — 95%
NAS — 1-disk failure (SHR) — 95%
NAS ransomware with snapshots — 92%
RAID 0 — 1 disk failed — 72%
RAID 5 — 2+ disks down — 68%
NAS ransomware without snapshot — 35%

5 · The mistakes that destroy data

What you must never do to a failing RAID or NAS

Launch an automatic rebuild — massive overwrite of the other members.
Swap the disk order around — order is part of the array definition.
Reset the controller or recreate the volume — erases the configuration (stripe, parity, offset).
Delete snapshots after ransomware — that's the healthy state you're destroying.
Leave an attacked NAS on the network — encryption can continue.

Guiding principle. On a redundant system, data almost always survives the hardware failure. What kills it is the rushed intervention. Power off, document, hand over.
