RAID1 Mirror Corruption on 2008 R2 Server with Intel RSTe Controller with Intel SSD Drives

Stephen and I are both on IRSTe SATA OROM version 3.8. The 11, 12 and 13 series belong to consumer-grade chipsets, not enterprise server boards.

Stephen,
The 2nd stress test of 4 days was also successful, not one parity error. For a 3rd test, I pushed the reset button while the server was running, and again not 1 parity error. I am concluding that the problem was definitely the enabling of TRIM for RAID 1. It is just not reliable under high load conditions. I am now stress testing HDDs for the next 4 days with the 3.6 drivers, just to be sure I will not have any problems after replacing the drivers on the main server.
Thanks

Nevertheless the X79 modded Intel RST(e) ROM module v12.9.0.2006 will work for your server systems, because they fully support DEV_2826 Intel SATA RAID Controllers…

Maybe so, but it wouldn’t support the SAS firmware. The Supermicro X9DAi motherboard has two storage chipsets, one SATA and one SAS. I could never use a modded BIOS to flash the firmware because this is deployed in a business, and I would be liable if something were to go wrong. I will say though that the 3.6 drivers from Intel’s download center appear stable. It seems to be the enabling of TRIM on RAID 1 that causes issues.

If your speculation turns out to be true, it would explain the problems Intel had getting the "TRIM in RAID0" feature into the RSTe platform.

Fernando,
On my main system at my office, I have a Rampage IV Extreme. It has the X79 chipset with the dual Sata Orom option in the bios. I have been running twin samsung 840 pros in Raid 0 for at least 2 years and have used and updated the drivers each time I see a new version released. I am talking about the 3 and 4 series Enterprise drivers. I have always left the dual sata option on IRSTe. I have NEVER had any issues with any of the drivers. With the server 2008r2 which is windows 7 based, I have had every possible problem with the same drivers in a raid 1 volume. The only way it seems to work is when trim is not available. I believe it is disabled on all drivers on Win 8/Server 2012r2 (at least so I remember reading). I can only say the very same Intel ssds in raid 1 on windows8 13.6 drivers have NEVER caused any issues. NEVER even a single parity error. Trim is disabled on each and every one of those systems. Haven’t tried with Server 2012r2 but I am about to.

Thank you all for your information shared until now!
Maybe later on I will follow your hint and update the OpROM, but so far I have not found a download source at Intel.
Why I use the consumer branch drivers: because these drivers are tested by "Thomas Krenn" - my servers vendor… and the opRom is the original one…

Some thoughts on the theory that the phenomenon has to do with TRIM:
I tried TrimCheck on both my workstation (Win8.1 Pro, which also uses an SSD, but as a normal disk, not in RAID) and the servers. TrimCheck told me that TRIM works on my workstation but not on the servers (although "DisableDeleteNotify = 0" is reported on the servers)…
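
For anyone who wants to script the flag check across several machines, here is a minimal Python sketch. It assumes the output of `fsutil behavior query DisableDeleteNotify` contains one or more `DisableDeleteNotify = <0|1>` entries (newer Windows versions may print one line per file system); the function names are hypothetical. Keep in mind that the flag only says whether the OS issues TRIM, not whether the RAID driver passes it on to the drives, which is the mismatch TrimCheck reports here.

```python
import re
import subprocess
from typing import List

def parse_delete_notify(output: str) -> List[int]:
    """Return every DisableDeleteNotify value found in fsutil output."""
    return [int(v) for v in re.findall(r"DisableDeleteNotify\s*=\s*(\d)", output)]

def os_issues_trim(output: str) -> bool:
    """True if at least one reported value is 0 (delete notifications enabled)."""
    values = parse_delete_notify(output)
    if not values:
        raise ValueError("no DisableDeleteNotify entry found")
    return 0 in values

def query_fsutil() -> str:
    """Run the query on Windows (may require an elevated prompt)."""
    result = subprocess.run(
        ["fsutil", "behavior", "query", "DisableDeleteNotify"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a server this would be called as `os_issues_trim(query_fsutil())`; the parsing is kept separate so it can be checked against saved output without touching the machine.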

The next thing I will check is whether setting "DisableDeleteNotify" to 1 has any influence.
Tests are running now…

I’m experiencing the same problem as described in this topic on an ASUS RS100-E8/PI2 server with 2 SSDs in RAID1. The array is working fine, but I get 65535 verification errors in the Intel RSTe Console (the console stops counting at 65535). I haven’t stress tested the array as you did, however I expect it to behave the same way as yours and fail later on.

I’m using Windows Server 2012 R2 Standard and TRIM is enabled by default for the SSD array (DisableDeleteNotify == 0), however TRIMCheck reports that TRIM isn’t working.

I’ve switched from SSDs to HDDs and haven’t gotten any verification errors since, so I too suspect that TRIM is having problems in RAID1.

Intel has removed the verification errors report from the newest RSTe console (4.2); however, the errors are still there if I switch back to 4.1. I think Intel is aware of the issue, but why was their “solution” to hide the report rather than to fix it?

Just as a hint, and as an attempt to sum up the possible reasons for the RAID 1 verification faults seen here, I feel we must distinguish between:

1) driver misbehavior (maybe as postulated here in the context of OS and TRIM)
2) potential sector level discrepancies due to SSD garbage collection as mentioned before
3) real, existing problems like bad SATA cabling

To exclude reason 3), please have a look at the SMART data of your drives, especially attribute C7 (decimal 199, CRC faults between controller and drive).
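
As an illustration of that check, here is a small Python sketch that pulls the raw value of attribute 199 (0xC7) out of a SMART attribute table. It assumes the table has the column layout printed by `smartctl -A` from smartmontools, a tool not mentioned in this thread; whether it can reach drives behind the RSTe RAID controller is not verified here. The sample table and its values are made up purely for demonstration.

```python
from typing import Optional

def crc_error_count(smart_table: str) -> Optional[int]:
    """Return the raw value of attribute 199 (0xC7), or None if it is absent."""
    for line in smart_table.splitlines():
        fields = line.split()
        # Attribute rows start with the decimal ID; the raw value is the 10th column.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None

# Illustrative sample only, not taken from any drive in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1234
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

print(crc_error_count(SAMPLE))
```

A raw value that keeps growing between two readings would point toward cabling or connector trouble rather than the driver.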

On my board with hdds in raid 1 and the 4.2 drivers, stress testing caused the volume to fail. I would not use that driver at all. On your system server 2012r2, does the 3.6 driver install?

I am sure that there are many reasons why people can get parity errors on RAID sets. But if the conditions are not the same, we cannot prove anything to Intel.

I created this forum thread to discuss a specific problem with Intel RSTe drivers using approved drives, in the hope that Intel would (probably silently) read the information here on what is a very clear cut problem.

From what I can tell from your posts, your motherboard is a consumer motherboard and your SSD drives are not identical pairs of approved drives. I would really appreciate it if we could stick to the original thread topic I created, so that this becomes a list of people all with the same irrefutable problem to present to Intel.

If you read back through the posts, you’ll see that we have already eliminated SATA cabling and individual drive faults from the list of possible causes of the specific issue we are discussing.

Best regards

Stephen Done

These "Enterprise" RAID drivers are the ones I was talking about as well.
Intel has developed and released extremely good RST RAID drivers (much better than the AMD, Marvell and ASMedia ones), but until now they haven’t succeeded with the production of RSTe RAID drivers, which are
a) running as stable and
b) supporting the same features
as the "normal" RST RAID drivers.
What I never understood and still don’t understand is why Intel didn’t simply add the DeviceID DEV_2826 of the Intel C600/C600+ Chipset Series SATA RAID Controllers to the iaStorAC.inf file entries of their recently released RST drivers. The Intel RST RAID drivers of the v12/v13 series run fine with C600/C600+ Chipset Series SATA RAID Controllers (after the missing HardwareIDs have been added to the related iaStorAC.inf files).
By the way: In March 2013 I started >this< thread at the Intel Communities Forum, but haven’t gotten a satisfying answer from Intel.


@ MarkoD:
Welcome to the Win-RAID Forum and thanks for your report!

It is hard to believe, but seems to be true - Intel is obviously not able to solve their problems with the RSTe RAID drivers from v3/v4 series.

Regards
Dieter (alias Fernando)

@StephenDone
Sorry, I did not want to spam this thread!
As I said before, I was very lucky to find this thread, because I was not able to find any other information on the net concerning this specific RAID 1 problem.
Please be assured that I studied all posts in this thread very carefully.
The hint about SATA cabling was meant for MarkoD, because of the very high counters he mentioned.
As for the different drives in each RAID 1: I just followed the advice on the net and also my own experience that one should diversify the drive types, so as not to risk the whole RAID being destroyed by one drive-specific fault (firmware, …).
I am convinced that the cause of the symptoms is the same whether we are using the enterprise or the consumer variant, and therefore wanted to work together with the other thread followers on reproducible results/solutions.
If you feel I am disturbing your original thread goals, I beg your pardon. I will stop dropping my questions and test results here…
No offence meant!

No problem. Please just bear in mind the title of the post!
Let’s hope Intel see all these problems that we are having and acknowledge them.
Best regards
Steve

Stephen,
Just to let you know, I currently have the 3.6 drivers (same as you) installed and am testing with Hdds in raid 1 just to be sure that I will be stable with the raid 10 4 drive hdd volume that I have installed on the SAS chipset on the same server. The Ssds are definitely stable after 2 tests and a push of the reset button. What I am concerned about right now is what will happen when I go to server 2012r2 later this year.

Steve,
I just finished my stress tests with ‘fsutil behavior set DisableDeleteNotify 1’, but the result is the same (verification faults).
Can anyone confirm that disabling TRIM in this way cured the problem when using SSDs?
Best regards

Hi Bibo,

I think BillyDV did this test…

Have a look at page 4/5 of posts.
Perhaps confirm with BillyDV.

Cheers
Steve

I have rebuilt my RAID1 array on Windows Server 2012 R2 from SSDs to HDDs and:

1) The data on the SSDs are okay, despite reporting 65535+ verification errors (I haven’t stress tested the array as Stephen and billydv did, so it didn’t get corrupted yet)
2) There are no verification errors (for the same data) on the HDDs

I’m using the 4.1 OROM, drivers and RSTe console. So my conclusion is that the SSDs (presumably TRIM) are what is causing the problems.

Thanks Steve,
I read this post, but I am not sure whether BillyDV really tested it and therefore proved it.
@billydv : Did you test this case (RAID1 SSD + TRIM switched off using Server 2008R2 and no further data corruption) or was it (just) an assumption?
Best regards
Juergen

Hi Bibo,
Here are the tests I have run. Stress tests are done with Passmark Burn In software for approximately 4 days per test result. CPU and memory are set to about 50% duty, disks are set to 50-60% duty. The server is running 2008r2 with the OS on a 2-drive RAID 1 volume; the SAS chipset has a 4-drive RAID 10 volume made up of HDDs. The SATA OROM on the X9DAi is 3.8.

1- IRSTe 3.8, 3.9, 4.1, 4.2 with raid 1 ssds fails

2- IRSTe 3.8, 3.9, 4.1 with raid 1 hdds succeeds

3- IRSTe 4.2 with raid 1 hdds fails

4- IRSTe 3.6 with raid 1 ssds succeeds (I ran this twice to confirm my results, and each time resulted in no parity errors. A 3rd test was to push the reset button on the server while it was running. The server restarted normally and parity verification reported no errors)

5- IRSTe 3.6 with raid 1 hdds currently testing, should have results on Friday.

The difference between the 3.6 drivers and all the rest is that with 3.6, TRIM is not enabled for RAID 1. I have not tried the other drivers with TRIM disabled; none of my tests involved any changes to the fsutil behavior settings.
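
To see how far the TRIM explanation covers these results, the four completed tests can be written down as data and checked against it. This is only a sketch: the `Run` structure and function names are hypothetical, and the rule "TRIM reaches the drives only for SSDs on the tested drivers other than 3.6" is the working assumption from this thread, not something confirmed by Intel.

```python
from typing import List, NamedTuple, Tuple

class Run(NamedTuple):
    drivers: Tuple[str, ...]  # IRSTe driver versions covered by the test
    media: str                # "ssd" or "hdd"
    passed: bool              # True if no parity errors / volume failure

# Completed tests 1-4 from the list; test 5 (3.6 with HDDs) is still running.
RUNS = [
    Run(("3.8", "3.9", "4.1", "4.2"), "ssd", False),
    Run(("3.8", "3.9", "4.1"), "hdd", True),
    Run(("4.2",), "hdd", False),
    Run(("3.6",), "ssd", True),
]

def trim_could_apply(driver: str, media: str) -> bool:
    """Assumption: TRIM is only sent to SSDs, and not by the 3.6 driver."""
    return media == "ssd" and driver != "3.6"

def unexplained(runs: List[Run]) -> List[Tuple[str, str]]:
    """Return (driver, media) pairs whose outcome the TRIM assumption does not predict."""
    mismatches = []
    for run in runs:
        for driver in run.drivers:
            predicted_pass = not trim_could_apply(driver, run.media)
            if predicted_pass != run.passed:
                mismatches.append((driver, run.media))
    return mismatches

print(unexplained(RUNS))  # [('4.2', 'hdd')]
```

The only result the assumption does not account for is the 4.2 driver failing with HDDs, which suggests that driver has a separate problem unrelated to TRIM.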

I have been told that the 3.6 drivers are not at all compatible with server 2012r2. I will soon be testing the 3.8 series drivers with trim disabled in the os by running this command "fsutil behavior set disabledeletenotify 1". I will report back.