[REQUEST] How to connect 2 M.2 SSDs to the same PCIe slot?

Thanks, but I already have one!
I can’t recall why, but I have one

Can you tell me how I upgrade the Intel ME firmware of the Z9 board? Same as with the BIOS region, by using the Intel ME tools?

@paranoid_android - Good, you have one because you knew you’d need it

Updating the ME FW can be done two ways: either by following this guide to clean the dump and transfer your settings into the new stock FW, then reflashing via FPT with the previously unlocked FD, or by programmer
[Guide] Clean Dumped Intel Engine (CS)ME/(CS)TXE Regions with Data Initialization

If doing the FPT method, once you are done with the guide and have a BIOS with a clean, updated ME, extract the ME region from the BIOS with UEFITool and name it ME.bin
Then use FPT >> FPTw -me -f ME.bin

Or, you can use the ME FW Update tool provided in the matching ME System Tools Package for that board's ME
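If it helps, the FPT route from an elevated command prompt looks roughly like this (file names are placeholders, and this assumes the FD already allows ME region writes):

:: flash the cleaned/updated ME region built per the guide above
FPTw.exe -me -f ME.bin
:: global reset so the new ME FW actually takes over
FPTw.exe -greset

:: or, the FW Update tool route from the same ME System Tools package
FWUpdLcl64.exe -F ME_Update.bin

Treat that as a sketch only and double-check the switches against the readme in the package you download.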

@Lost_N_BIOS , I will see how and when I'll have the opportunity to test flashing.
In the meantime the Z9 machine has to do some data crunching ;).

I have tested a bit with Prime95 lately, but aborted after 20 minutes because the fans were running really loud and HWiNFO showed that the package temperature was getting close to 70°C.
The same went for the memory. The single cores showed much lower temps, though.
I wasn't sure whether 70°C is OK. The CPUs are running with the factory coolers that were included in the Asus ESC-2000 barebone. I think they're rather small.
Do you think I could let the machine run for hours in that condition or should I look for more efficient cooling before executing long-run tests?

Another thing that's interesting is that the machine gets quite laggy when intensively processing data, like creating multiple archives at once or tasks like that.
But LatencyMon shows no services that cause lags. Could it be that the machine currently operates with one DIMM per CPU?
So it's running in single-channel mode and with the NUMA configuration. This must be the first time in ten or thirteen years that I have let a PC run in a single-channel configuration.
Could that cause so much lag or delay in response? I'll see if it changes when I add further DIMMs, but that won't be until next month or later.

70°C is normal for Prime95 if not using water; it will likely get hotter than that, especially with stock coolers. CPUs will be OK up to 85°C or so before I'd start worrying, but it's not ideal to run that hard/hot for any extended periods.

On the lags, what drives are you using, HDD or SSD? Are they connected to Intel ports, not Marvell? I'm not sure about DIMMs per CPU, I've never used a dual-CPU setup, but surely if that's a thing then there are BIOS settings to enable/disable/control this. If you find them missing but see them in AMIBCP then we can fix!
Does CPU-Z show single-channel mode? Do you have channel interleaving enabled? That would help if it's dual channel. It should be quad channel though, right, this is X79 we're discussing?
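If you want to double-check from inside Windows, something like this from a command prompt should list every populated DIMM slot (just a sketch, I have no dual-CPU board here to verify the exact output):

:: show which slots are populated, and the size/speed of each module
wmic memorychip get BankLabel,DeviceLocator,Capacity,Speed

With one stick per CPU you'd only see two lines there, which would match single-channel per node.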

Prime95 testing is good for stability testing of 24/7 (over)clocks (and undervolts), though it has no bearing on real-world temperatures unless you're running 100% optimised scientific workloads. A RealBench stress test for 4-12 hours would give you a better indication of stability and/or temperatures; I'd be shooting for 65°C max for longevity.
Sidenote: undervolting the 2600 Xeons by about 0.010 V, if still stable, gives you more MHz since turbo is also controlled by max TDP. A lot of workloads also benefit from setting affinity to fewer cores or even disabling cores, since the turbo bins for the 2690 v2 are 6/5/4/3/3/3/3/3/3/3, meaning with 3 cores in use they run at 3.6 GHz, with 4 cores at 3.5 GHz, etc. So having 2 of these babies with only 3-4 cores per CPU set as affinity can be faster for a lot of tasks.
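A quick way to play with the affinity idea without touching the BIOS is the start command (app.exe is just a placeholder here, and the mask counts logical processors, so with Hyper-Threading on, 0xF may only cover 2 physical cores):

:: launch a program pinned to the first 4 logical processors of NUMA node 0 (first CPU)
start /NODE 0 /AFFINITY 0xF app.exe
:: same idea on the second CPU
start /NODE 1 /AFFINITY 0xF app.exe

Rough sketch only, but it makes it easy to compare run times with different core counts.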

The Z9PE-D8-WS has a C602 chipset and has 4 memory channels per CPU, so 8 in total, from the manual:


If only 2 DIMMs are installed, UMA mode might be faster, since data doesn't have to be pulled from RAM through a single core and then redistributed by that core to the rest of the CPU, but is instead delivered to all cores (similar to bus mastering).
NUMA should be faster for crunching-heavy tasks when all DIMM slots are populated, though as a workstation it might feel a bit laggier since a lot of user actions will have to go through 2 cores to get to a DIMM.
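If you want to see how Windows actually maps the cores and memory to nodes, Sysinternals Coreinfo can show it (a sketch, run from the folder you unpacked it to; check its help output for the exact switches):

:: -s lists sockets, -n lists which logical processors belong to which NUMA node
coreinfo -s -n

With NUMA disabled in the BIOS it should report a single node spanning all 40 logical processors.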

Yes, my i7 machines have always been operating with quad-channel RAM. Since the Z9 has only 4 DIMM sockets per CPU, I am acquiring large DIMMs over time, and therefore I am faced with this situation of running the machine in a single-channel config.
I was suspecting that NUMA has a negative effect when running single-channel RAM.
I just find it rather impressive that the slow-down or lag effect of single-channel operation compared to quad is so immense.
This bottleneck problem will be solved within the next months.
But first I will disable the NUMA setting and see how that works out.

The undervolting and core affinity options are very interesting, too. I see that the 2690s go up to 3.3 GHz max if all cores are used.
I have tried some affinity settings for selected tasks but I have to invest more time on that topic and see how the tasks perform.

Meanwhile I'll download RealBench from Asus and see how the Z9 system behaves.

@Lost_N_BIOS , it's C600/C602, so it's like X79, right. The system will get additional DIMMs over time.
I use SSDs on the Z9 machine, and they are connected to the Intel ports.
The machine also processes data from network drives. But the lagging and slow-down effects are quite similar, no matter whether the SSDs or network drives are used.
I agree with what @JackSted mentioned about the negative effect of the NUMA setting in combination with single-channel RAM. Using single-channel at all is already bad enough, though.


The stock coolers will also be replaced by more efficient ones.
Next thing is to see how the Z9 will work with the NVMe SSD.

I've never used dual CPU, so I'd never even heard of NUMA; glad @JackSted was here to advise you. I hate buying memory, so I know how you feel. It always costs so much, and then the price goes down after you buy, but rarely drops while you wait to purchase.

That’s true! But this time the memory is rather cheap, at least in comparison.
It could be a bit cheaper per GB if I used 16GB DIMMs, but I decided to go with 32GB right away, as I can only populate one slot per channel per CPU.
So in the end the machine will have 256GB of RAM on 4 channels for each CPU.
Interestingly enough, DDR3-1866 memory which is non-ECC or unbuffered ECC is often more expensive than ECC today.
Maybe because a lot of companies renew their hardware, used server memory gets re-marketed in large numbers, while gamers and enthusiasts want to boost their old machines one last time and are willing to pay quite a lot for non-server memory(?).
It's quite similar with corporate-use motherboards, but especially this Z9-WS board sells for almost its original price or even above that (used!). The server boards of that series are available for far less money.
I took the opportunity to get this complete Asus barebone, which was unused, so I did not get the board that cheap, but at least as good as brand new, along with a 1300W power supply, fans and an optical drive.

About NUMA mode: NUMA memory access is also used on die designs with large numbers of cores, or multi-die packages like the current AMD Ryzen Threadripper: there, only 2 of the 4 dies have their own memory controllers.
So in the 32-core version, 16 cores can benefit from short latencies, while the other 16 always have to wait longer for their memory accesses. This can cause delays in processing and has to be compensated by optimizing the process/task scheduler of the OS. I'm no expert, but that is how I understood it.

I've never had 16GB DIMMs, let alone 32GB; usually too much $$ anytime I look. How much are those costing these days for 16GB or 32GB? I haven't looked lately.
Thanks for the info on NUMA, interesting having more than one memory controller on a single CPU. Are you using this system as a server, or for mainly desktop type usage?

These are my first >8GB dimms also. I wanted to make use of the advantage that Xeons can address more memory.
A new 32GB (1866 CL13 ECC, quad-rank) DIMM may cost around $500-600…
I can get used ones with 1 year warranty for around €140-150, and privately sold ones without warranty for around €110-120.

The video and image processing software (astro-image stacking) that I use benefits from multiple cores as well as from more memory.
If a raw video can be buffered in RAM (it has to fit in completely, as it seems), the processing gets much faster.
Some of these videos can exceed 20GB and even reach 30-40GB in size.

As I mentioned, I only became aware of NUMA recently in relation to the Ryzen Threadripper, which is made up of more than one multicore die (2 to 4).
Usually we associate one instance of memory controller to one CPU, or a die respectively.
But when you put multiple dies in one package (much alike a multi-CPU system in one socket), you can deal with more than one memory controller per CPU also.
Also larger Xeons do have more than one memory controller on-die, if I recall that correctly.
But for economic, electrical and thermal reasons, I guess you have to consider whether you let every die keep its own memory controller or strip some of them.
And of course that results in some dies/cores having to deal with longer latencies when accessing memory.
To be honest, I wasn't aware that any sort of NUMA setting was available on multi-socket systems like ones made of Ivy Bridge Xeons at all, nor that it applies across CPUs.
I can imagine that it can have its benefits if you have all DIMM channels/sockets populated, like @JackSted mentioned.

I wasn't intentionally out to build a system with 256GB of RAM, but it's very unlikely that I'd exchange the DIMMs of this Z9 system in the future, and I can only populate one socket per channel, so I decided to go for the large DIMMs.
When the machine is set up with 256GB, I have the opportunity to either make use of a quite large RAM disk or to try out the memory mirroring mode, which introduces some sort of RAID-like safety level to the memory management.
Available memory reduces to 128GB then, but it's mirrored for redundancy. I don't think I could make use of this unless I were using the machine for real data-server purposes.

In the end, the Z9-machine will become some sort of serving workstation.
I will transfer most of the data drives there and leave the main machine for gaming and every-day use.
The Z9-machine will be used for archive compression and to do the larger part of raw video processing and image stacking.
I am quite pleased to see that it literally eats up work units which require resampling/resizing of image data; a task that seems to benefit a lot from the cores and the larger memory.
The kinds of operations that are more memory-transfer-bound, though, are a bit faster on the i7 machine, which has 8×4GB DDR3-1866.
I expect this to change when the Z9 machine gets the other channels populated.
Further, it will remain some sort of experimenting platform. I am also planning to practice setting up VMs.
And I want to get back into the field of 3D modelling (construction and maybe rendering), so it's possible that I'll install a (used) corporate-use graphics card if I find one.

Next step is to transfer the M.2 to the Z9 and see how it performs, then test out the 2×M.2 controller card.
After that I'll apply the modified BIOS.
And in February I'll try to get an Asus Hyper M.2 ×16 card.
And so on…

A single 32GB DIMM for $500, no way, ever, damn that's a lot! If I ever purchased that, a year later I'd see them for $150
I looked up that astro-image stacking, do you use it for what I see in google, HDR images of the night sky, stars etc? If yes, you’ll have to show us some of your photos sometime!

I am not eager to pay $500 (or €500) for such a module at all; I'll take the used/refurbished ones from corporate sellers, where I get almost 128GB for that price, and hope for the best.

[OT]

(Hope Fernando looks the other way) Yep, what you found about stacking points in the right direction. It divides into two main fields: Deep Sky (DS) and Planetary.
Deep Sky is about faint objects like interstellar nebulae, while planetary is about, well, planets. The planets of our solar system, plus the sun and moon.
These two fields differ in terms of equipment, but have similarities: you operate with a lot of raw, uncompressed image data.
Mostly still images in 10, 12, 14, even 16 bpp color (or b/w) range for DS and 8, 10, 12 bpp videos for planetary.

I am currently engaged in the planetary field. If I find something that I think would be worth showing or sharing, yes, I could show it.
If you’d like to, you can take a look at this stack I made of the moon.
I may return to it later and add a little more color processing, but for a first impression of the topic it'll do, I guess


[/OT]

That sounds better, around maybe $125/32GB at that price, still a lot of $$ but much better! Thanks for sharing your image, sounds very cool how it’s all made.
Damn, that’s super cool and highly detailed! Do you take the images yourself with a telescope?

Thanks a lot. I didn't want to go into too much detail in this sort of thread, because not everyone looking for information here might be interested in reading about astro stuff :wink:
The images are generated out of high-dynamic-range raw videos which have been recorded by a planetary camera that's mounted to a telescope.
This generates a large amount of data, which can add up to 20 to 40 GB per video. So there's enough work for lots of cores and enough data for a lot of RAM :slight_smile:
Data traffic is also very heavy while recording; the camera can deliver ~2.5 Gbit/s through USB 3.0, provided there's a storage drive that responds fast enough.
The M.2 SSD performs better at this than the SATA SSDs.

About the DIMMs, I recall that I could purchase 32GB of DDR3-1866 CL11 memory (new, non-ECC) for my i7 machine for about €130 seven years ago.
Prices have changed, and maybe because DDR3 is becoming rare, the prices for that sort of desktop memory are going up.
These large ECC Dimms are relatively cheap to get now, at least used ones.
I don't recall if I wrote it before, but the Z9 machine with the two 10-core Xeons can process the astro videos rather quickly now. That's also because the data can be buffered completely in RAM.
The 32GB of my i7 machine is often not enough, so the software works from disk, which slows the process down.
What's interesting on the Z9 machine is that the 20 cores can work very fast and most of the processing benefits from the large RAM, but the single-channel mode causes the machine to lag during stacking or other heavy work.

That's a lot of data, fast! I can imagine how a faster SSD and more memory would always help speed that along. I bet it would take my system days or weeks to process one of those videos (dual core, 8GB memory, single regular SSD)

@JackSted and @Lost_N_BIOS - coming back to the original topic that started this thread way back:

I installed the IoCrest/NanoTech twin M.2 card, which uses the ASMedia 2824 PCIe switch chip, in the Z9PE-D8 WS board.
The system freezes here as well, just like the previously tested X79-E WS/Win7 system.

I have Win10 running on the Z9, so I suspect a general incompatibility between the ASMedia chip and the combination of the X79/C602 chipsets and/or Ivy Bridge-E/EP CPUs.
Starting the performance test from Samsung Magician is enough to freeze the system immediately, or rather, in less than a second after initiating the test.
This behaviour is similar to that on the X79/Win10 systems.
Interestingly, I was able to write a small packet of data to the M.2 without the freeze occurring. A few bytes or KB of a text file could be written, but with larger amounts of data the error occurs.
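To narrow down the threshold without Samsung Magician, I could use Microsoft's free DiskSpd tool to generate writes of a controlled size (rough sketch; the drive letter and sizes are placeholders):

:: 30 seconds of pure 1MB sequential writes to a 4GB test file on the M.2
diskspd.exe -c4G -d30 -w100 -b1M -t1 X:\testfile.dat

Starting with a small file/block size and increasing it should show roughly where the freeze kicks in.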

This was the reason for us to start thinking about finding/unlocking the bifurcation settings in the boards' BIOSes.
So the next step for me is to try out a real bifurcation card with the Z9 system.
The Z9 has the bifurcation settings unlocked by default, so I expect less trouble when running a card that relies on bifurcation.
To my surprise, the Asus Hyper M.2 ×16 cards have become rare and expensive on the market, so I will try out the rather similar product by ASRock - the Ultra Quad card.


It has an additional feature, however, and that's a PCIe 6-pin power connector. Could be useful when loading the card with four M.2 drives.

As suspected, the Z9 does not recognize the NVMe SSD by default. So after I get the Ultra Quad installed and the SSD running again, I'll finally try out the NVMe-modded BIOS by @JackSted.
The Z9 machine was processing all day for the last week, so I haven't found time yet to replace the BIOS.
Next thing missing is, well, a second M.2 SSD to see if all that really works

The Asus Hyper M.2 ×16 card is €80 atm, the ASRock Ultra Quad M.2 card €74.00 on the same site. I got mine for €50 from a store that went belly up last week; must have been because they wanted to offload stock ASAP.

Unless you're installing Windows onto the NVMe drive, it should be recognised in Windows as a drive nonetheless, even without the BIOS mod/driver.
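A quick way to confirm that from a command prompt (just a sketch, the output columns are minimal):

:: list every disk Windows can see, regardless of whether the BIOS has an NVMe module
wmic diskdrive get Model,Size,Status

If the drive shows up there with its model name, the BIOS NVMe mod is only needed for booting from it.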

From my guide:
If BSODs occur now and then when writing to PCIe, set BIOS > Advanced > System Agent Configuration > IOH Configuration > Gen3 Equalization Phase 2/3 WA + Equalization Phase 2/3 Supported + Gen3 Equalization Redoing WA.
Set all to enabled.
Changing this, and also adding 0.010/0.015 V vcore and changing the VCCSA voltage to the same 1.05 V as the stock VTT voltage, seems to have fixed my instability (BSOD) problems.
The voltage part might not be necessary for you, because I also have a decent overclock going, so the System Agent (IMC/PCIe controller) might not be stressed as much in your case.
The extra power connector on the ASRock one should not be needed, since the slot is good for at least 30W and one NVMe drive uses about 6W. (Although it might prevent my voltage kerfuffle above :? )

Good luck!

I was able to get the AsRock Ultra Quad for €67, but where I searched, the Asus is around €80 (or above) as you wrote.

The NVMe is currently running under Windows, as before, with no problems, since I swapped it back to the standard PCIe ×4 M.2 adaptor.
It appears as an NVMe controller and storage device in Device Manager, whether I mount it on the adaptor or on the dual (switched) M.2 card.

When I mount it on the dual M.2 card, its instances appear in the branch behind a PCI bridge (the instance of the ASMedia 2824) in Device Manager or HWiNFO. That's the only difference.
The standard M.2 adaptor does not appear as an instance/device there at all, of course. The same should go for the Asus Hyper M.2 or ASRock Ultra Quad?

Some guys selling this ASMedia card on the net stated that it upgraded their aged Mac Pros to use NVMe, and even gave them NVMe boot options. Maybe Apple supplied new firmware for those machines?
One hint is that one of the mentioned Mac Pros did not have PCIe Gen3 at all, but the dual M.2 card still provided full bandwidth for the attached NVMe drives.
I don't understand it completely, but it is suggested that the card provides one ×8 downstream and two ×8 upstream links. That would be so that the PCIe Gen3 ×4 bandwidth of one M.2 can be translated into a Gen2 ×8 signal for older boards/chipsets? Shrug
So I don't have a pure split/bifurcation function on the ASMedia card; it can also provide a switch function, like the PLX chips on some mainboards, where two ×8 endpoints are routed to one ×8 slot(?)
I can only guess that at least the X79/C602 gets irritated by this behaviour, as it is the same on two mainboards with two different CPUs (not very different, as they're both Ivy Bridge-E) and two different versions of Windows (7 and 10).


So when I have my second NVMe SSD and it runs on the Ultra Quad, I'll install the Win10 system there or mirror the current installation (that could cause boot-up BSOD problems, though). After I apply the BIOS mod, of course :slight_smile:

The aforementioned system freeze (not even a BSOD) occurs only when using the NVMe SSD on this card with the ASMedia 2824, not when using a regular adaptor.
But I can see what happens when I set the Gen3 Equalization Phase 2/3 WA settings you recommended.

Yup, looking at the traces running directly from the M.2 slots to the PCIe x16 connector, the ASRock ULTRA QUAD M.2 card is basically a dumb riser card, apart from some electrical stuff for a fan and the additional power connector, and it probably shouldn't show up as a device.
The people using this or a similar card in Mac Pros for system boot without an NVMe BIOS mod are probably using Samsung Pro NVMe drives, which carry an option ROM, like a RAID card or network card does.
For the Gen3 Equalization Phase and voltage BIOS settings, I'd go with the “If it ain't broke, don't fix it” motto.