X99 ECC support

I don't think they are quite at the level of the Tier 1s (Asus, Gigabyte, MSI, ASRock), who have released hundreds if not thousands of motherboards.

For a company that has released fewer than 10 consumer boards, the quality of this board is spectacular. I love the ATX 24-pin and 8-pin layout and the DR DEBUG, which is not covered up by the GPU. The boot speed is very fast. Aside from the ECC issue, everything else works. Once that is resolved, it is a perfect board for me.

Also, how many Tier 1s will provide support and a new BIOS over the weekend, without expensive support contracts? I can't think of any. Though I have had good experience with ASRock Rack service, and with anyone you pay nicely for a support contract.

Plus, I even got macOS 10.14.5 running on this board fairly quickly so the quality of the ACPI implementation is good.

Here is the prototype case I built for this system : )

IMG_9378-cpu.JPG

IMG_9382-back.JPG

IMG_9385-gpu.JPG

See my edit above

Nice build and layout. That material looks thick, is it heavy?

@Lost_N_BIOS Here is the latest BIOS - it was a full flash including NVRAM with

AfuEfix64.efi KLX99.BIN /P /B/ /N /X


If this does not include everything, let me know and I will get a flash dump with an SPI programmer. This board has a SOIC-8 chip and I only have a SOIC-16 clip, so it will take time to wire it up manually.

KLX99.BIN.zip (4.55 MB)

Thank you. I did not see it!

I replied and attached the latest BIOS. If possible, let's move those posts here to keep everything related to this board in a single thread.

It is quite light! The case is <3 kg and is made of 3 mm aluminum angle. The beams are hollow, but can withstand 60 kg before deforming. The heaviest component is the water in the cooling loop.

@e97 - Why no /K?

I want an actual dump from your current BIOS on the board. This is the only way to fully edit the NVRAM, and yes, if you have a flash programmer that would be preferred, as it will be much easier for you to program back in after I edit.
Or you could, if needed, unlock BIOS and/or SMI/SMM lock, if they are enabled, but we can't know until you do the above (if Intel).
If AMD, then programmer only would really be best, since it's a pain in the arse to properly and fully flash in a mod BIOS with all the regions I would be editing, and it makes a mess using /GAN with these kinds of edits due to the multiple /GAN flashes required.

I thought maybe you missed that. Do you want me to move all your posts and my replies from that thread over to this one?

It looks heavier than it is then. I know what you mean, some loops and rads can get quite heavy!

@Lost_N_BIOS

It's an Intel system. "unlock BIOS and or SMI/SMM lock": both of those options are available in the BIOS.

That was the command the mfg specified to use with the BIOS.

I'll be back later with a full flash dump. Thank you.

Yes, it would be nice so I don't have to watch two threads and miss a post. I only posted in the other thread because it sounded like the OP had a similar problem and I was hoping they had found a solution.

Great, you can see those options in BIOS. Disable them, then you can reflash the BIOS region without issue using FPT. But since you have a programmer, best to use that so we can get both main NVRAM volumes; one is a backup, so neither AFU nor FPT dumps those out.
I prefer to edit both when possible, so that if main is switched for backup, both have the same edit.

I thought maybe you chose not to use /K. I always use it unless it gives an error. Some BIOS do not have this region, but if you use it and don't get an error, then it's there, and all flashes you did previously without it didn't flash this (non-critical regions, unsure what all this covers).
When using or suggesting use of AFU, I always use these >> /P /B/ /N /K at the very least

Do you want me to just merge the threads? Maybe that would be best, for raun0 and anyone else already following that thread.

Good idea! They have the same title after all.

Merged. Yes, that same title was confusing for a second. Even when I went to merge I had to go back and look twice to see which was which, so I didn't try to merge same into same.

Surprisingly, the SOIC-8 was easier than a SOIC-16... since the programmer was designed for SOIC-8 while the clip is SOIC-16... either way, a few jumper wires to the headers and now it's dual use!

The hardest part was finding the right drivers for the CH341a...

mfg site: http://www.wch.cn/download/CH341SER_MAC_ZIP.html for Mac, Windows, and Linux drivers. Schematics can also be found there. For Linux, if on a kernel >= 4.11, this may help: https://github.com/juliagoda/CH341SER

On macOS 10.14.6 the latest driver, CH341SER_MAC_v1.5.ZIP, is signed but didn't show any devices under /dev/*. flashrom didn't show programmer errors; it only couldn't find the flash IC.

So I tried a Linux system. The first system, Ubuntu 18.04.2 LTS with kernel 4.15.0-55, didn't like the USB extension cable on the front ports. Even a direct connection didn't work reliably.

Went to another Linux system, Debian 9 with the older kernel 4.9.0-8, and it worked without issues, even with the same extension cable.


Anyway, here's the full dump. Double checked it was the same from two separate reads.

X99_bak_spi.bin.zip (4.7 MB)
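For anyone repeating the "two separate reads" check, comparing the dumps by size and hash is one simple way to do it. This is only a sketch: the file names are hypothetical, and it assumes each read was saved to its own file by whatever tool produced it.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dumps_match(path_a, path_b):
    """True when both dumps have the same size and the same SHA-256."""
    a, b = Path(path_a), Path(path_b)
    if a.stat().st_size != b.stat().st_size:
        return False
    return sha256_of(a) == sha256_of(b)


# Example call (hypothetical file names, not from this thread):
# dumps_match("read1.bin", "read2.bin")
```

If the two reads differ, the clip contact or cabling is the first thing to suspect before trusting either file.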

@e97 - Sorry I didn't link you to the main driver and general software package, here for next time (Windows) - http://s000.tinyupload.com/index.php?fil…257455007472602

Here, please test these BIOS in the following order, and stop once ECC works with memory (and let me know which it was so I can make a note). Ignore what you see set in BIOS; some will show auto, some will show enabled, and some I'm unsure what will show.
For all these tests, do not adjust the setting in BIOS at all. Simply load optimal defaults, reboot back to BIOS and make any other changes you need, then test ECC memory functionality.
If no changes, on to the next BIOS. If none work, there is some issue in their BIOS, or some memory compatibility issue, or something missing on the board itself.

1. BCPO.bin (Will show enabled)
2. PlatO.bin (Will show auto)
3. NVRAMO.bin (unsure)
4. BCPNV.bin (Will show enabled)
5. PlatNV.bin (unsure, depends on #3 above)
6. PlatBCP.bin (Will show enabled)
7. PlatBCPNV.bin (Will show enabled)

http://www.filedropper.com/e97-spi-ecc-test-x7



I lost my hope, but some day I will continue the process. If you are going to develop coreboot for your X58 board, I will follow you with great interest.

@Lost_N_BIOS it was good to learn my tools, and that ZIP doesn't have Mac or Linux drivers.

Tried all of them. No luck.

Here was the process:

1) Flash BIOS via SPI programmer (Read. Write. Verify.)
2) Power down by turning off the PSU.
3) Power up and boot.
4) Check ECC status with memtest86.
5) Reboot
6) Reset BIOS to Optimized defaults (system powers down and reboots)
7) Check ECC status with memtest86.
Go to #1

I'm guessing some ECC traces are missing or not properly routed. One strange thing, memtest86 detects the memory controllers:

2019-09-06 15:27:25 - find_mem_controller - Intel Haswell-E (IMC 0) (8086:2FA8) at 255-19-0
2019-09-06 15:27:25 - find_mem_controller - Intel Haswell-E (IMC 0) ECC mode: detect: no, correct: no, scrub: no, chipkill: no
2019-09-06 15:27:26 - find_mem_controller - Intel Haswell-E (IMC 1) (8086:2F68) at 255-22-0
2019-09-06 15:27:26 - find_mem_controller - Intel Haswell-E (IMC 1) ECC mode: detect: no, correct: no, scrub: no, chipkill: no
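When checking several BIOS images in a row, it can help to pull the ECC flags out of the memtest86 log instead of reading them by eye. A rough sketch, assuming the log lines keep the exact format shown above and that an enabled feature would be reported as "yes" (only "no" appears in this thread, so the "yes" spelling is an assumption):

```python
import re

# Matches the "ECC mode" lines in the memtest86 log format shown above.
ECC_LINE = re.compile(
    r"\(IMC (?P<imc>\d+)\) ECC mode: detect: (?P<detect>\w+), "
    r"correct: (?P<correct>\w+), scrub: (?P<scrub>\w+), "
    r"chipkill: (?P<chipkill>\w+)"
)


def ecc_modes(log_text):
    """Map IMC index -> dict of ECC flags (True only for the value 'yes')."""
    result = {}
    for m in ECC_LINE.finditer(log_text):
        flags = {k: v == "yes" for k, v in m.groupdict().items() if k != "imc"}
        result[int(m.group("imc"))] = flags
    return result


sample = """\
2019-09-06 15:27:25 - find_mem_controller - Intel Haswell-E (IMC 0) (8086:2FA8) at 255-19-0
2019-09-06 15:27:25 - find_mem_controller - Intel Haswell-E (IMC 0) ECC mode: detect: no, correct: no, scrub: no, chipkill: no
2019-09-06 15:27:26 - find_mem_controller - Intel Haswell-E (IMC 1) (8086:2F68) at 255-22-0
2019-09-06 15:27:26 - find_mem_controller - Intel Haswell-E (IMC 1) ECC mode: detect: no, correct: no, scrub: no, chipkill: no
"""

modes = ecc_modes(sample)
for imc, flags in sorted(modes.items()):
    print(f"IMC {imc}: {flags}")
```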



but Linux says they are not found:

[ 2.802091] EDAC MC: Ver: 3.0.0
...
[ 36.071337] EDAC sbridge: Seeking for: PCI ID 8086:2fa8
[ 36.071341] EDAC sbridge: Seeking for: PCI ID 8086:2fa8
...
[ 36.071391] EDAC sbridge: Seeking for: PCI ID 8086:2f68
[ 36.071397] EDAC sbridge: Seeking for: PCI ID 8086:2f68
...
[ 36.071525] EDAC sbridge: CPU SrcID #0, Ha #0, Channel #0 has DIMMs, but ECC is disabled
[ 36.071555] EDAC sbridge: Couldn't find mci handler
[ 36.071570] EDAC sbridge: Couldn't find mci handler
[ 36.071586] EDAC sbridge: Failed to register device with error -19.
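Worth noting when reading that log: error 19 is ENODEV ("No such device") on Linux, and the line just before the failure says the driver saw DIMMs but found ECC disabled. A small sketch that pulls those facts out of dmesg text, assuming the message wording matches the excerpt above (other kernel versions may phrase it differently):

```python
import errno
import re


def edac_summary(dmesg_text):
    """Summarise the EDAC sbridge lines from a dmesg excerpt."""
    # Unique PCI IDs the driver looked for.
    pci_ids = sorted(set(re.findall(
        r"Seeking for: PCI ID ([0-9a-f]{4}:[0-9a-f]{4})", dmesg_text)))
    # Did the driver report DIMMs present but ECC turned off?
    ecc_disabled = "has DIMMs, but ECC is disabled" in dmesg_text
    # Translate the numeric error into its symbolic errno name.
    m = re.search(r"Failed to register device with error -(\d+)", dmesg_text)
    error = errno.errorcode.get(int(m.group(1))) if m else None
    return {"pci_ids": pci_ids, "ecc_disabled": ecc_disabled, "error": error}


sample = """\
[ 36.071337] EDAC sbridge: Seeking for: PCI ID 8086:2fa8
[ 36.071391] EDAC sbridge: Seeking for: PCI ID 8086:2f68
[ 36.071525] EDAC sbridge: CPU SrcID #0, Ha #0, Channel #0 has DIMMs, but ECC is disabled
[ 36.071586] EDAC sbridge: Failed to register device with error -19.
"""

summary = edac_summary(sample)
print(summary)
```

Read this way, the log suggests the driver did look for the same controller IDs memtest86 reported and gave up because ECC was off, which matches the memtest86 result rather than contradicting it.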


Perhaps Linux will only detect and register if the ECC traces are properly connected.

My research says connecting the memory traces is already tricky due to high-speed signaling. ECC is even more so, since the trace density increases and there is additional interference. The "extra" ECC traces, only used on server/enterprise boards, are unlikely to be routed since this is a gaming motherboard... or they may have been routed but not tested, thus the problems with ECC.

I'm likely going to give up on this motherboard. I have a previous generation C602 with ECC verified working so I will use that and upgrade later when DDR4 is cheaper.

@e97 - What do you mean about ZIP and Mac/Linux? Do you mean the file I uploaded? If yes, that is not zip, it's 7zip, sorry for not considering this.

You did step #1 wrong, and this can mess up any and all BIOS programming. #1 should be >> Erase, then blank check, then open file (read, to you, maybe), then write/verify.
If you do not erase and then make sure the chip is blank, some data left in place can mess up the written outcome. It may not break the BIOS, but it could leave data in there and mess up some random thing, so always erase and then, if possible, blank check.
If it was me, I would redo all tests, making sure to erase first and blank check too, if possible in whatever you are using (this is possible in Windows apps). And then make this a habit/SOP.
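If the tool in use has no blank-check button, the same check can be done by hand on a read taken right after the erase. A minimal sketch, assuming the chip is ordinary SPI NOR flash, where erased cells read back as 0xFF; the function names are made up for illustration:

```python
def is_blank(data: bytes) -> bool:
    """True when every byte is 0xFF, i.e. the read looks fully erased."""
    return data.count(0xFF) == len(data)


def first_programmed_offset(data: bytes):
    """Offset of the first non-0xFF byte, or None if the dump is blank."""
    for offset, value in enumerate(data):
        if value != 0xFF:
            return offset
    return None


# Tiny self-contained demonstration with synthetic data.
erased = b"\xff" * 16
leftover = b"\xff" * 10 + b"\x5a" + b"\xff" * 5
print(is_blank(erased), first_programmed_offset(erased))
print(is_blank(leftover), first_programmed_offset(leftover))
```

On a real dump, read the file with `open(path, "rb").read()` and pass the bytes in; a non-None offset means the erase did not fully take.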

Sorry, I don't know anything about this stuff, but the above info is something you should pass along to JINGSHA in case it will help them diagnose anything here.

I would assume the OS wouldn't matter, but you never know? Set up a Windows install and see if anything changes.

From what you mentioned, and how quickly they sent you the BIOS, I assumed they had already tested and confirmed it working on their end. Didn't they say that?

@Lost_N_Bios ah 7zip, yea no problem, all the same to the extractor tool : )

flashrom -w reads, erases, writes, verifies. forgot to mention the erase part.

While talking to their engineers, I ended up loading Windows. Luckily I had an old SSD with it and it started up fine. It looks like there are some configuration issues with the RAM, and the ECC lines may be connected after all, as it shows a 72-bit total data width. The two IMCs (integrated memory controllers) look to be running in dual channel with 2 DIMMs each, when they should be running quad channel with 2 DIMMs each, as I have 8 DIMMs populated. I sent them a bunch of debug info and a way to verify ECC (AIDA64 and https://forums.servethehome.com/index.ph…e-windows.4087/). They said they will get back to me in two days with a new BIOS... hopefully this time it will work!
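The same 72-bit versus 64-bit width check can be done on Linux from `dmidecode -t memory` output. A sketch under assumptions: the sample text below is made up in the usual dmidecode field layout (it is not output from this board), and a 72-bit total width only shows that the check bits are reported, not that the memory controller is actually using them.

```python
import re


def ecc_width_report(dmidecode_text):
    """Return (locator, total_bits, data_bits, extra_bits) per populated DIMM."""
    report = []
    for block in dmidecode_text.split("Memory Device")[1:]:
        total = re.search(r"Total Width: (\d+) bits", block)
        data = re.search(r"Data Width: (\d+) bits", block)
        locator = re.search(r"^\s*Locator: (.+)$", block, re.M)
        if not (total and data):
            continue  # empty slot or width reported as Unknown
        t, d = int(total.group(1)), int(data.group(1))
        name = locator.group(1).strip() if locator else "?"
        report.append((name, t, d, t - d))
    return report


# Hypothetical sample: one ECC-width module and one non-ECC-width module.
sample = """\
Memory Device
\tTotal Width: 72 bits
\tData Width: 64 bits
\tLocator: DIMM_A1
\tBank Locator: NODE 1

Memory Device
\tTotal Width: 64 bits
\tData Width: 64 bits
\tLocator: DIMM_B1
\tBank Locator: NODE 1
"""

report = ecc_width_report(sample)
for name, total, data, extra in report:
    print(f"{name}: total={total} data={data} check bits={extra}")
```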

@raun0 I think coreboot has some support for X58 chipsets already...

Great, great, and great!

Hopefully they sort this out for you! So, in windows, you could better tell, it was almost working?

Win has a lot of nice GUI system info tools: CPUz, AIDA64, etc...

Linux has similar info in text form: dmidecode, lspci, lsusb, etc...

I had a feeling it was an incorrect initialization issue, and it was confirmed by similar info being strange in two different OSes. The vendor is familiar with Windows, as are most motherboard mfgs; nearly all the tools are written for Win... it was easier to find the issue with Win tools so they could verify and also have something to test against.

Last time they tested it, I think they checked the RAM SPD info, which shows ECC, and mistook that for having it enabled and working. I sent them memtest86, which was helpful in confirming the same issue and is not dependent on an OS. Once we got Win debug info and tools it made much more sense, and I think they have a good plan to get it working, since now they can easily see if ECC is being utilized by the CPU and mainboard.

A good way to verify ECC is working properly is to use memtest86 Pro and the ECC injection features of the platform to inject errors and check if they are caught and corrected. The best way is an out-of-band hardware ECC verification tool, basically a DIMM module hooked up to a controller where you can tweak various parameters like voltages and speeds and introduce all kinds of timing issues and errors... but that's ridiculously expensive and complicated.
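To make "inject an error and check it is caught and corrected" concrete, here is a toy software illustration using a Hamming(7,4) code. It is not how memtest86 or the platform injects errors, and ECC DIMMs use a wider code (64 data bits plus 8 check bits); the small code is swapped in only because it shows the same inject, detect, correct cycle in a few lines.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # data bits; parity bits sit at 1, 2, 4


def encode(nibble):
    """Encode 4 data bits into a 7-bit codeword (list indices 1..7)."""
    code = [0] * 8                          # index 0 is unused
    for bit, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> bit) & 1
    for p in (1, 2, 4):
        for i in range(1, 8):
            if i != p and i & p:            # positions this parity bit covers
                code[p] ^= code[i]
    return code


def syndrome(code):
    """XOR of the positions of all set bits; 0 means no error detected."""
    s = 0
    for i in range(1, 8):
        if code[i]:
            s ^= i
    return s


def correct(code):
    """Return (corrected codeword, position that was flipped, or 0)."""
    s = syndrome(code)
    fixed = list(code)
    if s:
        fixed[s] ^= 1                       # the syndrome names the bad bit
    return fixed, s


def decode(code):
    """Extract the 4 data bits back into an integer."""
    return sum(code[pos] << bit for bit, pos in enumerate(DATA_POSITIONS))


word = encode(0b1011)
corrupted = list(word)
corrupted[6] ^= 1                           # inject a single-bit error
fixed, where = correct(corrupted)
print(where, decode(fixed) == 0b1011)
```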