X99 ECC support

Nice job.

Linux gives these errors. I have read that this is common on X79 and X99 systems. Is something missing?

 dmesg |grep EDAC
[ 8.584543] EDAC MC: Ver: 3.0.0
[ 11.885601] EDAC sbridge: Seeking for: PCI ID 8086:2fa0
[ 11.885609] EDAC sbridge: Seeking for: PCI ID 8086:2fa0
[ 11.885613] EDAC sbridge: Seeking for: PCI ID 8086:2f60
[ 11.885616] EDAC sbridge: Seeking for: PCI ID 8086:2f60
[ 11.885619] EDAC sbridge: Seeking for: PCI ID 8086:2fa8
[ 11.885623] EDAC sbridge: Seeking for: PCI ID 8086:2fa8
[ 11.885626] EDAC sbridge: Seeking for: PCI ID 8086:2f71
[ 11.885629] EDAC sbridge: Seeking for: PCI ID 8086:2f71
[ 11.885632] EDAC sbridge: Seeking for: PCI ID 8086:2faa
[ 11.885635] EDAC sbridge: Seeking for: PCI ID 8086:2faa
[ 11.885638] EDAC sbridge: Seeking for: PCI ID 8086:2fab
[ 11.885642] EDAC sbridge: Seeking for: PCI ID 8086:2fab
[ 11.885645] EDAC sbridge: Seeking for: PCI ID 8086:2fac
[ 11.885649] EDAC sbridge: Seeking for: PCI ID 8086:2fad
[ 11.885654] EDAC sbridge: Seeking for: PCI ID 8086:2f68
[ 11.885658] EDAC sbridge: Seeking for: PCI ID 8086:2f68
[ 11.885660] EDAC sbridge: Seeking for: PCI ID 8086:2f79
[ 11.885664] EDAC sbridge: Seeking for: PCI ID 8086:2f79
[ 11.885667] EDAC sbridge: Seeking for: PCI ID 8086:2f6a
[ 11.885671] EDAC sbridge: Seeking for: PCI ID 8086:2f6a
[ 11.885673] EDAC sbridge: Seeking for: PCI ID 8086:2f6b
[ 11.885677] EDAC sbridge: Seeking for: PCI ID 8086:2f6b
[ 11.885679] EDAC sbridge: Seeking for: PCI ID 8086:2f6c
[ 11.885684] EDAC sbridge: Seeking for: PCI ID 8086:2f6d
[ 11.885688] EDAC sbridge: Seeking for: PCI ID 8086:2ffc
[ 11.885691] EDAC sbridge: Seeking for: PCI ID 8086:2ffc
[ 11.885695] EDAC sbridge: Seeking for: PCI ID 8086:2ffd
[ 11.885698] EDAC sbridge: Seeking for: PCI ID 8086:2ffd
[ 11.885701] EDAC sbridge: Seeking for: PCI ID 8086:2fbd
[ 11.885705] EDAC sbridge: Seeking for: PCI ID 8086:2fbd
[ 11.885707] EDAC sbridge: Seeking for: PCI ID 8086:2fbf
[ 11.885711] EDAC sbridge: Seeking for: PCI ID 8086:2fbf
[ 11.885718] EDAC sbridge: Seeking for: PCI ID 8086:2fb9
[ 11.885723] EDAC sbridge: Seeking for: PCI ID 8086:2fb9
[ 11.885725] EDAC sbridge: Seeking for: PCI ID 8086:2fbb
[ 11.885730] EDAC sbridge: Seeking for: PCI ID 8086:2fbb
[ 11.885760] EDAC sbridge: CPU SrcID #0, Ha #0, Channel #0 has DIMMs, but ECC is disabled
[ 11.885766] EDAC sbridge: Couldn't find mci handler
[ 11.885768] EDAC sbridge: Couldn't find mci handler
[ 11.885770] EDAC sbridge: Failed to register device with error -19.
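
For anyone comparing logs, here is a minimal Python sketch that pulls the two interesting facts out of output like the above: which channel was reported with ECC disabled, and what the numeric registration error means. The function name `summarize_edac` and the regular expressions are my own assumptions based only on the message text quoted here, not anything taken from the driver. On Linux, error -19 corresponds to ENODEV ("no such device").

```python
import errno
import re

# Patterns written against the EDAC sbridge messages quoted above (assumed format).
ECC_DISABLED = re.compile(r"EDAC sbridge: (.+) has DIMMs, but ECC is disabled")
REGISTER_FAIL = re.compile(r"EDAC sbridge: Failed to register device with error (-?\d+)")


def summarize_edac(log_text):
    """Return (channels reported with ECC disabled, symbolic names of the errors)."""
    disabled = ECC_DISABLED.findall(log_text)
    codes = [abs(int(value)) for value in REGISTER_FAIL.findall(log_text)]
    names = [errno.errorcode.get(code, str(code)) for code in codes]
    return disabled, names


sample = """\
[ 11.885760] EDAC sbridge: CPU SrcID #0, Ha #0, Channel #0 has DIMMs, but ECC is disabled
[ 11.885766] EDAC sbridge: Couldn't find mci handler
[ 11.885770] EDAC sbridge: Failed to register device with error -19.
"""
disabled, names = summarize_edac(sample)
print(disabled, names)
```

In practice the text would come from `dmesg` rather than the hard-coded sample.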

Hello!

I'm using a JINGSHA X99 board with a Xeon E5-2678 v3 and 16GB of PC3-14900R RDIMM.

The CPU and X99 support ECC, but Linux says it is disabled.

The BIOS is a raw / debug version and I'm not seeing any ECC enable options. Did I miss it?

I've attached the BIOS. @Lost_N_BIOS, mind taking a look?

Does it meet the requirements for ECC? I'll check the memory traces tomorrow...

X99_BIOS.zip (4.84 MB)

@e97 - All I can do is check the BIOS settings. Of the ECC-compatible BIOSes I've seen, only some have an ECC setting (or two); others have none and work fine with ECC. So it could just be that the sticks you are trying are not compatible, if you think ECC should be enabled.

I found the usual setting I've seen at: IntelRCSetup >> Memory Configuration >> ECC Support (Default = Auto)
There are TONS of settings there and I'm not familiar with ECC, so many of them, or none, may also apply.

@Lost_N_BIOS thank you! I missed that, will try to set to Enabled and see what happens.

I verified that the ECC RDIMMs have ECC enabled and working in another system. This leads me to think it's either a BIOS setting, a BIOS feature, or ECC hardware traces that are not connected.

You're welcome! Another system is not this system, unless you meant the exact same model?
Not all memory is compatible with all boards, BIOS, chipsets, etc. That's all I meant: maybe this particular set of memory is incompatible with one of those things. Try a different ECC stick and see whether it always fails the same way or not.

That's a good idea. I will try some smaller / slower ECC modules and see if they work.

I don't see any "ECC Support" option under IntelRCSetup >> Memory Configuration.

Is it named something different or hidden? How did you find it?

That's the exact name of the setting. Sorry, I thought you were looking in AMIBCP when you mentioned digging around...
MANY settings may be hidden from you. You will either need to mod the BIOS to make them visible, or change them directly in AMIBCP and then reflash.

ECC.png



If you want me to make it visible to you, tell me how far into that string of submenus you can see in your current BIOS.
I mean, can you see IntelRCSetup? If yes, can you see the IntelRCSetup >> Memory Config submenu? If yes, but there is no ECC Support in there, then it will need to be made visible, or changed directly and left hidden (up to you).

On your BIOS Main page, can you see "Access Level"? If yes, what does it say: User or Admin/Supervisor?

No worries, I should have mentioned I hadn't modified the BIOS yet :wink:

Access level: Administrator

thumb_IMG_0224_1024.jpg



Here are the Memory Configuration screens:

thumb_IMG_0225_1024.jpg



thumb_IMG_0226_1024.jpg



thumb_IMG_0227_1024.jpg



thumb_IMG_0228_1024.jpg



So using AMIBCP I'll be able to modify / unhide all the options needed?

update:

The manufacturer got back to me with a new BIOS that exposes an "Enable ECC Support" option, but the result is the same with either "Auto" or "Enable". I also tried the 8GB sticks and got the same issue.

One thing I did notice: when using fewer than 8 sticks, the channels overlap. I'm thinking it's a bug in the initialization.

eg:
Working ECC system, X79
mc0: csrow0: CPU_SrcID#0_Ha#0_Chan#0_DIMM#0: 0 Corrected Errors
mc0: csrow0: CPU_SrcID#0_Ha#0_Chan#1_DIMM#0: 0 Corrected Errors
mc0: csrow0: CPU_SrcID#0_Ha#0_Chan#2_DIMM#0: 0 Corrected Errors
mc0: csrow0: CPU_SrcID#0_Ha#0_Chan#3_DIMM#0: 0 Corrected Errors
mc0: csrow1: CPU_SrcID#0_Ha#0_Chan#0_DIMM#1: 0 Corrected Errors
mc0: csrow1: CPU_SrcID#0_Ha#0_Chan#1_DIMM#1: 0 Corrected Errors
mc0: csrow1: CPU_SrcID#0_Ha#0_Chan#2_DIMM#1: 0 Corrected Errors
mc0: csrow1: CPU_SrcID#0_Ha#0_Chan#3_DIMM#1: 0 Corrected Errors

this system:
with 8 DIMMs = OK
CPU_SrcID#0_Ha#0_Chan#0_DIMM#0
CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
CPU_SrcID#0_Ha#0_Chan#2_DIMM#0
CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
CPU_SrcID#0_Ha#0_Chan#0_DIMM#1
CPU_SrcID#0_Ha#0_Chan#1_DIMM#1
CPU_SrcID#0_Ha#0_Chan#2_DIMM#1
CPU_SrcID#0_Ha#0_Chan#3_DIMM#1


with 2 DIMMs = OK
CPU_SrcID#0_Ha#0_Chan#0_DIMM#0
CPU_SrcID#0_Ha#0_Chan#1_DIMM#0


with 4 DIMMs = overlapping ??
CPU_SrcID#0_Ha#0_Chan#0_DIMM#0
CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
CPU_SrcID#0_Ha#0_Chan#0_DIMM#0
CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
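
The overlap in the 4-DIMM listing is easy to miss by eye, so here is a small Python sketch that flags labels appearing more than once. The helper name `duplicated_labels` is hypothetical; the labels are copied from the listing above.

```python
from collections import Counter


def duplicated_labels(labels):
    """Return the labels that occur more than once, in first-seen order."""
    counts = Counter(labels)
    return [label for label in counts if counts[label] > 1]


four_dimms = [
    "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0",
    "CPU_SrcID#0_Ha#0_Chan#1_DIMM#0",
    "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0",
    "CPU_SrcID#0_Ha#0_Chan#1_DIMM#0",
]
print(duplicated_labels(four_dimms))
```

With the 2-DIMM or 8-DIMM listings the function returns an empty list, since every label there is unique.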


@raun0 I'm running into a similar issue. Did you ever get ECC working?

Great, they unhid the option for you! Is this in the same location I mentioned it would be? If not, send me that BIOS and let me look and unhide the one I'm talking about; maybe it could help.
Or it still could just be that this particular memory isn't playing nice with this board/BIOS.

@raun0 @e97 - Did you guys change the ECC Support setting from Auto to Enabled? If not, that may help; Auto may be setting it to disabled.

Yep. Exactly where you said it would be.

I don't think it's a module compatibility issue, since they work fine, just without ECC enabled. I've tried multiple other ECC modules that have been verified to work in previous-generation X79/C602 and same-generation X99/C612 systems.

Their engineers said they've confirmed it to be working, so it's likely a misconfiguration on my part, or they are still working out the memory timings / settings. I'll know for sure in a few days!

Also, this post was helpful in verifying the physical trace connection: https://www.bios-mods.com/forum/Thread-R…e-P6T-Deluxe-V2

RDIMM datasheet = ECC pins on RAM modules
DIMM connector datasheet = pins to motherboard connector / layout
Xeon datasheet = ECC pins on CPU
Xeon / Socket Mechanical Guide = CPU pins to motherboard socket / layout

Be careful, as the pins are delicate!!

After finding the pins, use a multimeter with an ohm/resistance mode to check whether they are connected!

Be careful, as the pins are delicate!! (x2 because it's important)

My alternative is to get an open source BIOS implementation like coreboot working on the board... it will be an adventurous challenge!

Maybe something in the ECC chip itself is not compatible. So you've now tried other sets as well, and none work? Have you tried all of these with only 1-2 sticks at a time?

Is your CPU microcode up to date? If not, maybe that could be causing some issue too. At least it's something you could try updating to see if it helps while you wait for them to reply. It's cool that they replied to you quickly and sent out a new BIOS.

Their support is great! It's a recently released motherboard mainly for gamers, hence the LEDs... My use case, professional/workstation use, is not what they typically see, so I'm happy to work with them to get everything working, as the board has great features.

If the modules were the problem, the ECC memory controller would at least be recognized in edac-utils, but it's not showing up.

[    2.802091] EDAC MC: Ver: 3.0.0
...
[ 36.071500] EDAC sbridge: Seeking for: PCI ID 8086:2fbb
[ 36.071525] EDAC sbridge: CPU SrcID #0, Ha #0, Channel #0 has DIMMs, but ECC is disabled
[ 36.071555] EDAC sbridge: Couldn't find mci handler
[ 36.071570] EDAC sbridge: Couldn't find mci handler
[ 36.071586] EDAC sbridge: Failed to register device with error -19.
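
A quick way to confirm from userspace whether any memory controller was registered is to look at the EDAC sysfs tree. The sketch below assumes the standard Linux layout (`/sys/devices/system/edac/mc/mcN/` with `ce_count` and `ue_count` files); the function name `registered_controllers` is hypothetical. When the driver fails to register, as in the log above, no `mcN` directory is expected and the result is an empty dict.

```python
from pathlib import Path


def registered_controllers(edac_root="/sys/devices/system/edac/mc"):
    """Map each registered controller (mc0, mc1, ...) to its error counters."""
    root = Path(edac_root)
    result = {}
    if not root.is_dir():
        return result
    for mc in sorted(root.glob("mc[0-9]*")):
        counters = {}
        for name in ("ce_count", "ue_count"):  # corrected / uncorrected totals
            counter_file = mc / name
            if counter_file.is_file():
                counters[name] = int(counter_file.read_text().strip())
        result[mc.name] = counters
    return result


# Empty dict means no EDAC memory controller is registered on this machine.
print(registered_controllers())
```

On the working X79 system this should list `mc0` with zero counts, matching the "0 Corrected Errors" lines shown earlier in the thread.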


Also, in dmidecode only 4 of the 8 modules show up, but system memory shows the correct amount.
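
To count what dmidecode actually reports, something like the following Python sketch can be run over the text of `dmidecode -t memory`. It relies on the usual "Memory Device" records with `Size`, `Total Width` and `Data Width` fields; the sample text and the slot names in it are made up for illustration, and the helper names are hypothetical.

```python
import re


def parse_memory_devices(dmidecode_text):
    """Split `dmidecode -t memory` output into one dict per "Memory Device" record."""
    devices = []
    for block in re.split(r"\n\s*\n", dmidecode_text):
        lines = [line.strip() for line in block.strip().splitlines()]
        if "Memory Device" not in lines:
            continue
        fields = {}
        for line in lines:
            key, sep, value = line.partition(": ")
            if sep:
                fields[key] = value
        devices.append(fields)
    return devices


def populated(devices):
    """Keep only the slots that report an installed module."""
    return [d for d in devices if d.get("Size", "No Module Installed") != "No Module Installed"]


def has_ecc_width(device):
    """72-bit total width over 64-bit data width means 8 extra check bits."""
    return device.get("Total Width") == "72 bits" and device.get("Data Width") == "64 bits"


# Hypothetical sample, not taken from the board discussed here.
sample = """\
Handle 0x0040, DMI type 17, 40 bytes
Memory Device
    Total Width: 72 bits
    Data Width: 64 bits
    Size: 8192 MB
    Locator: DIMM_A1

Handle 0x0041, DMI type 17, 40 bytes
Memory Device
    Total Width: Unknown
    Data Width: Unknown
    Size: No Module Installed
    Locator: DIMM_A2
"""
devices = parse_memory_devices(sample)
print(len(devices), len(populated(devices)))
```

Comparing the populated count against the number of sticks physically installed makes a "4 of 8" mismatch obvious.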

I tried many configurations and documented the results: 1 stick, 2 sticks, 4 sticks in various sockets and 8 sticks.

Tried setting to "Enable". No luck.

I think it's NVRAM, an MSR, or some other conflicting setting or lack of proper initialization. It's also possible that the traces are not connected, so that should be verified, but in my case they are connected.

That's nice to hear. I've only seen them mentioned a few times, and I thought they were some small, cheaper, OEM-type maker of generic boards.

I know nothing about what you mention above. Do you mean that even if the ECC control chip on the memory stick was bad or not compatible, you'd still see something that is missing above?
If yes, then it sounds like they need to look into this more for you, especially since they are confirming it works OK on their end.

@e97 Dump your current BIOS for me. Since you mentioned NVRAM: yes, it could still be locked to disabled or auto in there, especially if the flashing commands didn't destroy and rebuild NVRAM when you flashed in the new BIOS.
I will make you a mod BIOS with it enabled everywhere. Usually there are three main locations where this can be set as a BIOS setting: in the setup module, in AMITSE/SetupData (this is what AMIBCP changes when you change a default setting value), and in NVRAM.
In NVRAM there are often two main volumes, especially if you can dump with a programmer, and then there is a third and sometimes a 4th in the internal BIOS volumes too.

If this is an Intel system, dump with FPT. If it's AMD, there may be issues, unless you have a flash programmer or already know what AFU can flash in a mod BIOS on this system.

If Intel, here's how, in case you don't know: check the BIOS main page and see if the ME FW version is shown. If not, download HWINFO64 and, in the large window on the left side, expand motherboard, find the ME area, and get the ME Firmware version from there.
Once you have that, go to this thread and in section "C" download the matching ME System Tools package (i.e. if ME FW version = 10.x get the V10 package, if 9.0-9.1 get the V9.1 package, if 9.5 or above get the V9.5 package, etc.)
Intel Management Engine: Drivers, Firmware & System Tools

Once downloaded, inside you will find the Flash Programming Tool folder, and inside that a Windows or Win/Win32 folder. Select that Win folder, hold Shift and right-click, then choose "Open command window here" (not PowerShell).
At the command prompt, type the following command and send me the created file to modify >> FPTw.exe -bios -d biosreg.bin

Right after you do that, try to write back the BIOS region dump and see if you get any error. If you do, show me an image of the command entered and the error given >> FPTw.exe -bios -f biosreg.bin

If you are stuck on Win10 and cannot easily get a command prompt, and the method I mentioned above does not work for you, here are some links that should help.
Or, copy all contents from the Flash Programming Tool \ DOS folder to the root of a bootable USB disk and do the dump from DOS (FPT.exe -bios -d biosreg.bin)
https://www.windowscentral.com/how-add-c…creators-update
https://www.windowscentral.com/add-open-…menu-windows-10
https://www.laptopmag.com/articles/open-…ator-privileges

This is the board:

image_2019_07_15T02_17_18_068Z.jpg



As far as I know, it's the only LGA2011-v3 board that supports DDR3!

Large or small, I don't know. I do know they sell quite a few boards and also sell them to OEMs.

For ECC, modern x86_64 CPUs (made in the last decade) have the memory controller integrated into the CPU itself. The ECC memory is "dumb" in the sense that it only has an extra chip to store 8 check bits alongside each 64 bits of data; the check bits are computed and verified by the memory controller, and the module treats them as ordinary data. Hence non-ECC is 64-bit data and ECC is 72-bit data.

These modules can correct a single bit error and detect but not correct multi-bit errors. They will show this kind of information:

Error Correction Type: Single-bit ECC
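
To illustrate how 8 extra bits over 64 data bits can correct one flipped bit and detect two, here is a textbook extended Hamming (72,64) sketch in Python. This is only an illustration under my own assumptions: the code actually used by a given Xeon memory controller is not described in this thread and may well differ, and the function names are hypothetical.

```python
# Bit 0 holds overall parity; bits 1, 2, 4, ..., 64 hold Hamming parity;
# the remaining 64 positions in 1..71 carry the data bits.
DATA_POSITIONS = [p for p in range(1, 72) if p & (p - 1)]


def _syndrome(codeword):
    """XOR of the positions (1..71) of all set bits; 0 for a valid codeword."""
    s = 0
    for pos in range(1, 72):
        if (codeword >> pos) & 1:
            s ^= pos
    return s


def encode(data):
    """Encode a 64-bit integer into a 72-bit codeword."""
    codeword = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            codeword |= 1 << pos
    s = _syndrome(codeword)
    for k in range(7):  # set parity bits so the syndrome becomes 0
        if (s >> k) & 1:
            codeword |= 1 << (1 << k)
    if bin(codeword).count("1") % 2:  # make the total number of set bits even
        codeword |= 1
    return codeword


def decode(codeword):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    s = _syndrome(codeword)
    odd = bin(codeword).count("1") % 2
    status = "ok"
    if odd:  # odd number of flips: assume exactly one and repair it
        if s > 71:
            return None, "uncorrectable"
        codeword ^= 1 << s  # s == 0 means the overall parity bit itself flipped
        status = "corrected"
    elif s:  # even number of flips but nonzero syndrome: two bits flipped
        return None, "uncorrectable"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (codeword >> pos) & 1:
            data |= 1 << i
    return data, status


word = 0x0123456789ABCDEF
cw = encode(word)
print(decode(cw)[1], decode(cw ^ (1 << 17))[1], decode(cw ^ (1 << 17) ^ (1 << 40))[1])
```

The three calls at the end decode a clean word, a word with one flipped bit, and a word with two flipped bits.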



There are more advanced ECC modules like these HP modules that also have a "controller" on the RAM itself and can detect and correct multi-bit errors. Many vendors have this multi-bit ECC under various trade names. They show this kind of info:

Error Correction Type: Multi-bit ECC



I've read there is/was a generation of ECC modules that had the controller on the RAM itself to work with non-ECC CPUs/motherboard but I've not come across these personally.

Looks like a fairly decent board, not what I was expecting when I saw this name. I've only seen them mentioned a few times though. I think I modified an X89 for someone not long ago, and we discussed how that was an odd chipset name.

Be sure to see my reply to you here - X99 ECC support (2)