Sudden BSOD with Broken Intel RSTe RAID5 Array - Please Help

jkool702 · June 23, 2018, 2:53am

WHAT HAPPENED: The system has an 8-disk RAID 5 array implemented using Intel RSTe (v5.1.xxxx in the UEFI OpROM, v5.4.xxxx for the Windows drivers and UI; both are un-modded). I was copying some files to the RAID array when Windows decided it was overdue to throw a BSOD. When I got logged back into Windows, the RAID array had been disassembled. By this I mean the array is no longer recognized as a single RAID 5 volume. Instead, it is now recognized as a set of 8 individual (non-RAID) disks. This is true for RSTe (both in the UEFI/BIOS and in the Windows user interface) as well as for Windows itself. The individual disks are listed as perfectly healthy and are (seemingly) operating normally; they just aren't classified as part of a RAID array.

I would like to RESET (not rebuild, if possible) the array by simply making the disks recognized as part of a RAID array with the appropriate characteristics, but without actually doing any of the “normal” setup (mainly because I don't want to lose the data).

I am fairly sure the data on the disks themselves is just fine. If nothing else, the partition structure seems to be valid: diskpart lists drives 1-6 as unpartitioned and both drives 0 and 7 as having a full-disk partition (using GPT). At first I thought this was strange, but then I realized that since the stripe size is 128 KB, each disk is writing 16 KB chunks, and the entire partition info likely fits in the first 16 KB (explaining disk 0). None of the other disks have any non-zero bits there, meaning the parity is just a copy of the one disk with data (explaining disk 7).
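To illustrate the parity argument: RAID 5 parity is the byte-wise XOR of the data strips in a row, so if six of the seven data strips are all zeros, the parity strip comes out identical to the one non-zero strip. A minimal sketch, assuming the 16 KB per-disk chunk inferred above and using made-up contents (zeros plus a GPT signature at LBA 1, i.e. byte offset 512):

```python
from functools import reduce

SECTOR = 512                 # logical sector size in bytes
STRIP = 16 * 1024            # per-disk chunk inferred in the post (128 KB / 8 disks)
DATA_DISKS = 7               # 8-disk RAID 5: 7 data strips + 1 parity strip per row

def xor_parity(strips):
    """Return the parity strip: the byte-wise XOR of all data strips in a row."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))

# Made-up first row: disk 0 holds a GPT signature at LBA 1, everything else is zero.
gpt_strip = bytes(SECTOR) + b"EFI PART" + bytes(STRIP - SECTOR - 8)
zero_strip = bytes(STRIP)
row = [gpt_strip] + [zero_strip] * (DATA_DISKS - 1)

parity = xor_parity(row)
print(parity == gpt_strip)  # True: XOR with all-zero strips leaves the data unchanged
```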

One possible complication: the drive was BitLocker encrypted, but it was unlocked (and being written to) when the BSOD happened. I'm not sure what this means in terms of its current encryption status.

Below this I'll post a good bit of technical info about the array, the system, and the BSOD, but if ANYONE has a good idea on how I can reset an array so that it will simply “start getting recognized as part of a RAID array” again, I'd be immensely grateful.

Thanks.

MORE INFO ABOUT DISK STATUS AFTER BSOD:

* Device Manager shows 8 individual drives. Each is identical (TOSHIBA HDWN180 - 8TB N300 7200 RPM HDD DESIGNED FOR NAS USAGE). The old RAID 5 volume still exists in the system, but is marked as “inactive / not currently connected”.

* I took the disks offline to help preserve their current state. Currently, the first and last disks are listed as GPT and seem to have a valid partition structure. If either of these disks is brought online, the drive letter associated with the RAID array shows up in Windows Explorer, though if I try to access it I get a “you must format the disk before you can access this volume” notification. I believe that one of these holds the “real” GPT structures (protective MBR and GPT header) and the other is an exact copy in the corresponding parity block (assuming the other drives have all 0's here, the parity block would just be an exact copy of the data block).

* TestDisk shows 8 individual drives. Unsurprisingly, TestDisk can't recover any info from them, since it isn't treating them as pieces of a RAID 5 array.

* Both TestDisk and DiskPart show the first and last disks in the array (disk 0 and disk 7 in DiskPart) as being GPT (DiskPart) or as using an EFI/GPT partition table (TestDisk). The other 6 disks are shown as not having any partition structure. All disks have been set to “offline” in DiskPart. Interestingly, only one of disk 0 / disk 7 can be online at a time, and when either is online, Windows thinks the volume has been reattached (but then shows that it needs to be formatted).

* DiskPart still shows the RAID volume, but with the filesystem listed as “RAW”. The RAID volume is still listed in Device Manager, but is offline. So, the system hasn't forgotten about the volume.

* All disks are listed as healthy. The only unusual activity (from my point of view) is on the last drive, which got GPT attributes and currently can't be brought online. TestDisk seemed to be reading data from it just fine when analyzing the disk for lost partitions (a process I stopped some time ago, but it still shows the disk in a working state).
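What TestDisk would need is a view of the eight disks as one volume. A rough sketch of the RAID 5 address mapping is below; the left-asymmetric parity rotation, the 16 KB strip, and member data starting at byte 0 of each disk are all assumptions for illustration, not confirmed details of Intel RSTe:

```python
NUM_DISKS = 8
STRIP = 16 * 1024                         # assumed bytes per disk per row
DATA_PER_ROW = (NUM_DISKS - 1) * STRIP    # one strip in every row holds parity

def map_offset(logical: int) -> tuple[int, int, int]:
    """Map a logical volume byte offset to (data_disk, disk_offset, parity_disk).

    Assumes a left-asymmetric RAID 5 layout (parity starts on the last disk and
    moves one disk to the left per row) and that member data starts at byte 0.
    """
    row, within_row = divmod(logical, DATA_PER_ROW)
    strip_in_row, within_strip = divmod(within_row, STRIP)
    parity_disk = NUM_DISKS - 1 - (row % NUM_DISKS)
    # Data strips fill the disks in order, skipping the parity disk.
    data_disk = strip_in_row if strip_in_row < parity_disk else strip_in_row + 1
    disk_offset = row * STRIP + within_strip
    return data_disk, disk_offset, parity_disk

print(map_offset(0))           # (0, 0, 7)
print(map_offset(16_777_216))  # (2, 2392064, 5)
```

Under these assumptions, offset 0 lands on disk 0 with its parity on disk 7, which matches the observation that only those two disks show a GPT.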

ADDITIONAL INFO:

* RAID: 8-disk RAID 5 array implemented using Intel RSTe (v5.1.xxxx in the OpROM, v5.4.xxxx for the drivers and UI) (NOT modded). Contains 8 identical Toshiba 8 TB N300 HDDs. The array was fully initialized. The RAID 5 volume took up 100% of the available disk space. The RAID 5 array was only storing data; it wasn't being used as a boot drive.

* OS: Windows 10 Pro v1803. NOT using insider builds.

* WRITE BACK CACHING: In RSTe - Disk data cache and write back caching were both enabled. In Windows (disk properties → I/O policies): write back cache was enabled, and automatic buffer flushing was disabled.

* FILESYSTEM AND ALIGNMENT: The disks have a native 512-byte logical and 4096-byte physical sector size. The NTFS filesystem used a cluster (allocation unit) size of 16 KB. The RAID stripe was 128 KB. The full volume was initialized using diskpart, which set up the RAID 5 volume as GPT and wrote 2 partitions to it:

1) a “Microsoft reserved” partition with an offset of 17 KB and a size of (16 MB - 17 KB). diskpart shows this as beginning on sector 34 and ending on sector 32767 (each sector is 512 bytes).

2) a “primary” partition with an offset of exactly 16 MB (wmic lists it as an offset of 16777216 bytes) and a size that spans the remainder of the RAID 5 volume.
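As a quick arithmetic check of the two partition offsets, assuming 512-byte logical sectors:

```python
SECTOR = 512                           # bytes per logical sector (512 B, not 512 KB)

msr_first_lba, msr_last_lba = 34, 32767
msr_start = msr_first_lba * SECTOR     # byte offset of the first MSR sector
msr_end = (msr_last_lba + 1) * SECTOR  # first byte after the last MSR sector
msr_size = msr_end - msr_start

primary_start = 16_777_216             # offset reported by wmic

print(msr_start // 1024, msr_end // 2**20, primary_start == msr_end)  # 17 16 True
```

The MSR ends exactly where the primary partition begins, which is consistent with 512-byte (not 512 KB) sectors.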

(Side question: I'm still a bit iffy on aligning RAID arrays, but does this combination of offsets and sector sizes make all the boundaries align in an ideal fashion? Boundaries within the Microsoft reserved partition should still be aligned, since the offset is 1 NTFS cluster (16 KB), and the setup ensures that each stripe contains 8x16 KB blocks, which means each individual drive contributes 1 NTFS cluster to each data stripe written. Am I correct in thinking this setup is (close to) optimal?)
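The alignment question can be checked numerically. This sketch assumes the 16 KB per-disk strip and 8 disks described above. In RAID 5 one strip of every row holds parity, so each row carries 7 × 16 KB = 112 KB of data rather than 128 KB, and that boundary is checked too:

```python
NUM_DISKS = 8
STRIP = 16 * 1024                      # per-disk strip inferred in the post
CLUSTER = 16 * 1024                    # NTFS allocation unit
PHYS_SECTOR = 4096                     # physical sector size of the drives
STRIPE = NUM_DISKS * STRIP             # 128 KB, parity strip included
DATA_ROW = (NUM_DISKS - 1) * STRIP     # 112 KB of data per RAID 5 row

def alignment(offset: int) -> dict[str, bool]:
    """Report which boundaries a partition offset (in bytes) falls on."""
    return {
        "physical sector": offset % PHYS_SECTOR == 0,
        "cluster": offset % CLUSTER == 0,
        "strip": offset % STRIP == 0,
        "128 KB stripe": offset % STRIPE == 0,
        "full data row": offset % DATA_ROW == 0,
    }

for name, offset in [("MSR", 17 * 1024), ("primary", 16 * 2**20)]:
    print(name, alignment(offset))
```

Under these assumptions the 16 MB primary offset lands on every power-of-two boundary but not on a 112 KB full-data-row boundary, while the 17 KB MSR offset is not even 4 KB-aligned.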

* ENCRYPTION: The drive data was encrypted using BitLocker (256-bit AES-XTS). It was in use during the BSOD, which maybe means it wasn't actively encrypted? I do have the recovery key.

ERROR INFO ABOUT THE BSOD:

I have a memory dump from this BSOD. It isn't a complete one, but I think it contains the full kernel memory space and part of the user memory space.

According to WinDbg Preview, I get the following:

nt!KeBugCheckEx:
fffff802fa42b330 48894c2408 mov qword ptr [rsp+8],rcx ss:0018:ffff8c00c287db80=0000000000000133

DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL or above. Arguments:
Arg1: 0000000000000000, A single DPC or ISR exceeded its time allotment.
Arg2: 0000000000000501, The DPC time count (in ticks).
Arg3: 0000000000000500, The DPC time allotment (in ticks).
Arg4: fffff802fa6ed378, cast to nt!DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK, which contains additional information regarding this single DPC timeout.

MODULE_NAME: Unknown_Module
IMAGE_NAME: Unknown_Image

Followup: MachineOwner
*** Memory manager detected 2 instance(s) of page corruption, target is likely to have memory corruption.
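The bugcheck arguments above are hexadecimal tick counts. Converting them, and assuming the default 15.625 ms clock tick (an assumption; the dump does not state the tick length), gives the rough time allotment for a single DPC:

```python
ARG2_TICKS = 0x501    # DPC time count from the bugcheck (Arg2)
ARG3_TICKS = 0x500    # DPC time allotment from the bugcheck (Arg3)
TICK_MS = 15.625      # assumed default clock tick; the real interval may differ

overrun = ARG2_TICKS - ARG3_TICKS
allotment_s = ARG3_TICKS * TICK_MS / 1000

print(ARG2_TICKS, ARG3_TICKS, overrun, allotment_s)  # 1281 1280 1 20.0
```

In other words, a single DPC or ISR ran one tick past an allotment of roughly 20 seconds.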

EDIT by Fernando: Title and content of the post shortened (for better readability and to save space)

Fernando · June 23, 2018, 9:56am

@jkool702 :
Welcome to the Win-RAID Forum!

Before I try to help you, I need some additional information:
1. Which chipset does your system have?
2. Which OS are you running?
3. Is the system drive C: inside or outside the RAID array?

Regards
Dieter (alias Fernando)