[Problem] Dell R720xd iDRAC BIOS Recovery

@willard Hi. It seems I missed something here. What did you do to end up with “the structure wasn’t actually written to the emmc, but was written to the sdcard”?
Where did all these partitions on the SD card come from? I got a similar layout on my SD card after taking the MMC dump with dd.
Did you try to boot from this SD card by changing the “root” value of the “bootargs” variable in U-Boot?


That’s exactly what I mean :)


Finally, I was able to stop the boot. I had to put a resistor on I_emmc_debug, though, and then the idrac starts.

I have an R320 motherboard.

There is an unprotect all command in the idrac7 console, but that doesn’t seem to help. As for the discrepancy in the RC value, you pretty much hit the nail on the head. I don’t have the internal documentation for any of this and am just gathering information and results, hoping for an epiphany over time.

I’ve been inactive on this for the last week, in part because I have failures on two distinct motherboards (both failed with the same updates, but delivered by different methods). In any case, for the one I’m working on now, I’ve gleaned from other articles that I may be able to update the surrounding firmware packages, and because I can get into the bios firmware recovery screen I’m exhausting that avenue before going further.

I also just received another working r420 yesterday. I’m going to use that to observe a working system without updates, and then swap the motherboard out with the new one and work on recovering the one that won’t even let me into bios.


Finally, I have my JTAGulator, but I’m waiting for my Bus Pirate before trying to get into the emmc debug port, since I haven’t found any good information on that.

Looking at the link, I should clarify that I can rewrite the SPI bios all day long with minimal (clip related) issues. Ordering a good wire wrap 3M clip helped immeasurably, compared with the cheap black clip that came with my programmer. I had to press that one hard straight down to get it to bite, whereas the new 3M clip requires little or no jiggling to land on the right contacts to read and write the bios.

I’m convinced it’s the EMMC image that’s out of sync with the uboot in the bios chip. I do boot up and see that the settings (stored in the bios chip) are specifically for version 1.66.65 bios, and I’m reasonably sure the issue originally happened while trying to flash a newer LC integrated bios to the on board flash chip. As for the final question I hadn’t answered yet: no, I didn’t use AsProgrammer, but the ch341a gold programmer seems to be working well enough for the uboot images (of which I have about 6 now).

@filet187 I may not have clarified this in my posts, but I was buying those super cheap 2-3 dollar ch341 usb ttl adapters from aliexpress that have the 3v setting. But yeah, if you’re using a 5v rs232 connection you’d want a resistor to avoid frying anything. Before I connected anything I checked the voltage, and 3v seemed like the right setting to use (I have a few versions, and they all have a way to jumper the board side to 3 or 5 v).

In my case, on the r420, the dbg port was 3v. Besides, I’d rather fry a cheap rs232 board than fry the motherboard, so if I were in doubt I’d try the 3v adapter before assuming it was 5v. But good on you for using the resistor.

@ldv So to be honest, I need to be better at putting my test notes inside my pastebin to make that more obvious. Going off my memory (a dicey proposition), I thought I used the idrac7 console to manually write the emmc. Somehow I must have targeted the wrong device, because all I did to get it to dump those partitions from the pastebin was restart the server after the wipe locked up drac.

I have limited time at work to work on this, and I can’t sneak in the tests because those darn idrac fans sound like a jet engine. Now that I have my new server at home, I’ll set up a bench around these tests and recreate the experiments with better pastebin documentation.

I’ll probably spend the first day or two gathering observations from the working board, simply because it’s a pain to pull the cpu to swap the motherboard.


I’ll also want to leave the original board working as a target for the buspirate stuff because if I can get it to dump the emmc then I’d want to write both that and the idrac flash to the broken board and see if it works that way.

So I have the same issue as you do, @willard, with an r720.

Failure writing blocks [1:196609] RC=0xffffffac

I was trying to figure out how @ldv made a dd copy of a good working emmc, since there seem to be no dd type commands in the idrac uart shell. Is there another header for an emmc type uart shell?

I have a working r720 and another identical r720 with the idrac problems. I’m wondering if I can dd the working emmc onto an sd card and try to boot from the sd card on the non-working server, as was suggested to be possible.

@frameshift18 1. I have a second server with a working iDRAC, so I ran dd on it in a UART shell.
2. My apologies to @filet187: in the end I was not able to get a dd dump from the R730 server. Really sorry about that.
3. Look what I found recently:

1. The iDRAC’s NIC is not, and can’t be, turned on by u-boot, but u-boot can use the server’s NIC.
In my setup, all 4 ports were in a team. To check this theory I excluded the first port from the team and disabled it in the OS settings.

2. Next, I ran the following. (Please note that I didn’t specify the TFTP server and host IP addresses. That’s because once I had run this command with those parameters specified, they were stored in SPI automatically. You can also use setenv to set these parameters before running tftpboot.)

iDRAC7=> tftpboot firmimg.d7
Using sh_eth.0 device
TFTP from server 10.0.1.111; our IP address is 10.0.1.199
Filename ‘firmimg.d7’.
Load address: 0x81000000
Loading: 53 MB
Bytes transferred = 56019170 (356c8e2 hex)

You can see that the firmware was loaded into RAM at address 0x81000000. OK, let’s see if fwu can check it:

iDRAC7=> fwu check 0x81000000
Checking image header CRC … OK
Checking platform env ID… OK
Checking kernel image CRC … OK
Checking rootfs image CRC … OK
Checking u-boot image CRC … OK
Skipping u-boot update … NO
Checking Platform image CRC . OK
Done!

WOW! I’m so excited!
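Before trusting the image in RAM, it can be worth confirming that the whole file arrived. The sketch below is only an illustration in Python, run on the TFTP server side rather than on the iDRAC: it parses the "Bytes transferred" line from the console output above, checks that the decimal and hex values agree, and computes where the image ends in RAM. The helper names `parse_transfer` and `image_end` are made up for this example.

```python
import re

# Matches the u-boot line: "Bytes transferred = 56019170 (356c8e2 hex)"
TRANSFER_RE = re.compile(r"Bytes transferred = (\d+) \(([0-9a-fA-F]+) hex\)")


def parse_transfer(line: str) -> int:
    """Return the transferred byte count, checking decimal against hex."""
    match = TRANSFER_RE.search(line)
    if match is None:
        raise ValueError("no 'Bytes transferred' line found")
    decimal_value = int(match.group(1))
    hex_value = int(match.group(2), 16)
    if decimal_value != hex_value:
        raise ValueError("decimal and hex byte counts disagree")
    return decimal_value


def image_end(load_addr: int, size: int) -> int:
    """Address of the first byte after the loaded image."""
    return load_addr + size
```

For the transfer shown above, the size is 56019170 bytes and the image loaded at 0x81000000 ends at 0x8456c8e2. Comparing the parsed size with the size of firmimg.d7 on the TFTP server (for example with `os.path.getsize`) is a quick way to spot a truncated download.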

3. Some more from fwu help:
fwu mmc [‘sd|emmc’]
- list current mmc device, or set the target mmc device
OK, let’s try it with our SD card. First I select it with the following command:

fwu mmc sd

and run…


iDRAC7=> fwu update 0x81000000

*** Updating Partition 1
Checking image header CRC … OK
Checking platform env ID… OK
Checking kernel image CRC … OK
Checking rootfs image CRC … OK
Checking u-boot image CRC … OK
Skipping u-boot update … NO
Checking Platform image CRC . OK
Copying kernel image … OK
Copying rootfs … OK
Copying u-boot1 to flash… OK
Copying u-boot2 to flash… OK
Copying u-boot to MMC… OK
Copying platform image … OK
Done!

WOW! looks promising…

4. Let’s see if it works on eMMC.
Switch it back
fwu mmc emmc

and run…
iDRAC7=> fwu update 0x81000000

Updating Partition 1
Checking image header CRC … OK
Checking platform env ID… OK
Checking kernel image CRC … OK
Checking rootfs image CRC … OK
Checking u-boot image CRC … OK
Skipping u-boot update … NO
Checking Platform image CRC . OK
Copying kernel image … mmc write failed
fail to write block to device
Fail
fail to update kernel image
fail to update image ***
iDRAC7=>

So now I guess I have only three options:
1. This is an eMMC failure and it should be replaced.
2. Maybe one of the “SW_IDRAC_DBG” jumpers is used to remove write protection from the eMMC (like the switch on the side of an SD card)?
3. Or maybe somebody knows how to tell u-boot to load the kernel from the SD card?

Sometimes it’s hard to separate what’s conjecture from what’s empirically tested, and some people are also leveraging router experience and adapting it to the lights-out circuits on the dell motherboard. I’ve done a bit of both myself, but so far the only thing that consistently works on mine is the uboot and not the emmc (which I’m pretty sure I’ve tragically overwritten at this point).

Reading other forums and blogs, I also found that some people reference the serial debug cable, and some are using either Windows or Linux loaded on the server with the idrac tools, so it could be one of those who were using the dd command. Or if you can get the emmc image to load, that would give you a functional enough version of embedded Linux that you could probably use dd. In my case all of that is corrupted or erased, so I just have the uboot, which really is just a bootstrapper from what I can see.

My best bet right now is to use the other port marked emmc debug, but I poked around on it with an oscilloscope and one pin had a crazy looking stepped sine. I’m wondering if that isn’t the jtag io, which from what I can discern is some crazy high speed bit stuff.

I’m not about to figure that out on my own, and for whatever reason the domestic source for the jtag didn’t have the board version I wanted, so I ended up ordering it from overseas (along with several cable assemblies). I’m still waiting for half of that hardware to show up.

I do still plan to see what options I have with a known working emmc once I get some time to focus on one thing for a few hours.

One of the first things I’m going to do in the home lab is consolidate my notes so far into a pastebin journal, then going forward each test I do I’ll document both the intent of the experiment, along with logs and observations. That way later I can hopefully gather it all together and if I ever find something that works I can go back and retrace my steps to recreate it.

My ultimate goal would be to then reflash the wrong bios just to re-corrupt it, and then repeat whatever ‘fix’ I come up with to recover it with the eventual goal of streamlining it down to a non invasive process that doesn’t require me to solder jumpers onto debug ports.

1. tftpboot firmimg.d7
2. go 0x81000000
3. bootm

(0x81000000 is the address where tftpboot loads the firmware)

Enjoy!
But now I’m 100% convinced that the eMMC is bad, and that this is the root of the problem.




[SH7757 ~]$ dd if=/dev/mmcblk1 of=/dev/mmcblk0
sh_mmcif: Cmd(d’18) err
mmcblk0: retrying using single block read
Mar 20 11:34:20sh_mmcif: Cmd(d’17) err
(none) kernel: mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
sh_mmcif: Cmd(d’end_request: I/O error, dev mmcblk0, sector 32
18) err
Mar 20 11:34:20 (none) kernel:sh_mmcif: Cmd(d’17) err
mmcblk0: retryimmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
ng using single end_request: I/O error, dev mmcblk0, sector 33
block read
Mar 20 11:34:20 (none) kernsh_mmcif: Cmd(d’17) err
el: sh_mmcif: Cmmmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
d(d’17) err
Maend_request: I/O error, dev mmcblk0, sector 34
r 20 11:34:20 (none) kernel: mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
Mar 20 11:34:20 (none) kernel: end_request: I/O error, dev mmcblk0, sector 32
Mar 20 11:34:20 (none) kernel: sh_sh_mmcif: Cmd(d’17) err
mmcif: Cmd(d’17) err
Mar 20 11:34:20 (mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
none) kernel: mmend_request: I/O error, dev mmcblk0, sector 35
cblk0: error -84 sending read/write command, response 0x0, card status 0xd40
Mar 20 11:34:20 (none) kernel: end_request: I/O error, dev mmcblk0, sector 33
Mar 20 11:34:20 (none) kesh_mmcif: Cmd(d’17) err
rnel: sh_mmcif: Cmd(d’17) err
Mar 20 1mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
1:34:20 (none) kend_request: I/O error, dev mmcblk0, sector 36
ernel: mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
Mar 20 11:34:20 (none) kernel: end_request: I/O error, dev mmcblk0, sector 34
Mar 20 11:34:20 (none) kernel: sh_mmcif: Cmd(d’17) err
Mar 20 11:34:20 (none) kernel: mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
Mar 20 11:34:20 (none) kernel: end_request: I/O error, dev mmcblk0, sector 35
Mar 20 11:34:20 (none) kernel: sh_mmcif: Cmd(d’17) err
sh_mmcif: Cmd(d’mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
17) err
Mar 20end_request: I/O error, dev mmcblk0, sector 37
11:34:20 (none) kernel: mmcblk0sh_mmcif: Cmd(d’17) err
mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40

Mar 20 11:34:2end_request: I/O error, dev mmcblk0, sector 38
0 (none) kernel: end_request: I/O error, dev mmcblk0, sector 36
Mar 20 11:34:20 (none) kernel: sh_mmcif: Cmd(d’17) err
Mar 20 11:34:20 (none) kernel: mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
Mar 20 11:34:20 (none) kernel: end_request: I/O error, dev mmcblk0, sector 37
Mar 20 11:34:20 (none) kernelsh_mmcif: Cmd(d’17) err
: sh_mmcif: Cmd(mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
d’17) err
Mar end_request: I/O error, dev mmcblk0, sector 39
20 11:34:20 (none) kernel: mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
Mar 20 11:34:20 (none) kernel: end_request: I/O error, dev mmcblk0, sector 38
Mar 20 11:34:20 (none) kernel: sh_mmcif: Cmd(d’17) err
Mar 20 11:34:20 (none) kernel: mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
Mar 20 11:34:20 (none) kernel: end_request: I/O error, dev mmcblk0, sector 39
dd: writing ‘/dev/mmcblk0’: Input/output error
33+0 records in
32+0 records out
16384 bytes (16.0KB) copied, 0.291646 seconds, 54.9KB/s
[SH7757 ~]$
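The console output above is hard to read because the kernel messages and the syslog copies of them are interleaved mid-line. The "I/O error, dev ..., sector N" fragments do survive intact, though, so a small script can pull out which sectors failed. This is only a sketch in Python for use on a saved copy of the log; the function name `failing_sectors` is made up for the example.

```python
import re

# The device name and sector number survive the interleaving in the log,
# so match just that fragment instead of whole lines.
SECTOR_RE = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def failing_sectors(log_text: str) -> dict:
    """Map each device name to a sorted list of unique failing sectors."""
    found = {}
    for device, sector in SECTOR_RE.findall(log_text):
        found.setdefault(device, set()).add(int(sector))
    return {device: sorted(sectors) for device, sectors in found.items()}
```

Run over the log above, this reports sectors 32 through 39 on mmcblk0. Assuming 512-byte sectors, sector 32 starts at byte 16384, which matches the 16384 bytes dd reports as copied before it gave up.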


So to elaborate on my previous post, I hooked up the com port on my new dell server off ebay (pastebin XqFcZ1GR). It does boot up to a small linux distro. I haven’t done a lot yet; next I want to get the network online so I can use tftp/ftp and otherwise try to mess with that server without (hopefully) corrupting it too.

What you and I have been struggling with is that we’re already in a broken uboot environment, and we don’t get all that extra emmc functionality. With this new test bed I have the same shell that the others with a ‘partially broken’ idrac have.

Still, this is encouraging to me, and there’s a whole new area for me to explore now. I wish stuff didn’t take so long to ship from china.


Wait, you got idrac to boot without emmc??? Am I reading this right?

Yes you are! )) But because the OS is loaded from RAM, it works only until the next reboot. And be careful! Yesterday I bricked it again while trying to upgrade the FW. Don’t ask me why I decided to take out the 32 GB SD card and put in a 1 GB SD card instead. The SD card should be the same size as the eMMC or larger. So when I ran fwu mmc sd and then fwu update, it somehow erased the SPI flash, and now there’s no u-boot anymore. My guess is that when you run fwu update, it creates the partition layout on the SD card, then copies U-boot.bin to it and then to SPI. Maybe I’m wrong, but otherwise I don’t understand why there are u-boot partitions on the SD/eMMC. If you run util flashinfo you can see all these partitions and their content.
So I need to reprogram the SPI memory using an external chip programmer. At least this time I know how to bring the iDRAC up ) And this time I’m going to try to replace the eMMC chip too.

That’s amazing!!! If I understand correctly, since the idrac is its own CPU/OS, once we get it loaded it will stay in the background as long as the system is plugged in? So that means no more crazy fan noise, no more 15-minute-long boot/reboot times! We can even shut down the main system (Windows), and idrac will keep working as long as the power cable is plugged in. If this is indeed the case then this is HUGE! It makes the system usable again!

So you’ve made some solid progress @ldv. I haven’t done the dd approach. Frankly, I wouldn’t know where to dd it to, and I don’t want to mess with my ‘good’ dell more than I need to. However, I think I’ll be able to accomplish the same thing. I used: tftp -p -r mmcblk0.bin 192.168.0.243 -l /dev/mmcblk0 to write the idrac block device to my laptop (.243) running the free SolarWinds tftp server.

I thought I’d also point out that I found the working boot log full of good information.

Notably this:
Dec 31 18:00:11 (none) kernel: Creating 9 MTD partitions on “m25p80”:
Dec 31 18:00:11 (none) kernel: 0x000000000000-0x000000080000 : "u-boot1"
Dec 31 18:00:11 (none) kernel: 0x000000080000-0x000000100000 : "u-boot2"
Dec 31 18:00:11 (none) kernel: 0x000000100000-0x000000110000 : "env1"
Dec 31 18:00:11 (none) kernel: 0x000000110000-0x000000120000 : "env2"
Dec 31 18:00:11 (none) kernel: 0x000000120000-0x000000130000 : "fru"
Dec 31 18:00:11 (none) kernel: 0x000000130000-0x000000140000 : "res1"
Dec 31 18:00:11 (none) kernel: 0x000000140000-0x0000001c0000 : "tracebuf"
Dec 31 18:00:11 (none) kernel: 0x0000001c0000-0x000000340000 : "lcl"
Dec 31 18:00:11 (none) kernel: 0x000000340000-0x000000400000 : "res2"

For those of us using a chip programmer to recover our idrac on a regular basis, it’s very nice to know the boundaries of each section of that chip. I took my original dump file, loaded it into hexedit, used the select region command (ctrl-e) to select each region (110000-11ffff for example), and then saved that section in its own .bin file. Now if I want to write back a ‘good’ uboot, I just need to concatenate the ‘good’ uboot(s) with my original fru and res .bin files (the fru contains the cn and dell IDs, and the res has the mac addresses, unique data I’d like to keep on the right motherboard).

I haven’t actually burned that back yet (having just pieced it together today), but if I don’t hit some kind of crc value that I have to try to recalculate, then I’m hoping to be able to take the good uboots and put them on other ‘bad’ eproms without wiping out those boards’ IDs (or worse, ending up with multiple servers with the same mac address).
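The hexedit steps described above can also be scripted. The sketch below is a rough Python version under a few assumptions: the dump is exactly 4 MiB (0x400000 bytes, the end of "res2" in the boot log), the offsets are the ones printed in the "Creating 9 MTD partitions" log, and only "u-boot1" and "u-boot2" are taken from the donor image. The function names are made up for the example, and nothing here recalculates a crc if one turns out to be needed.

```python
# Partition boundaries copied from the "Creating 9 MTD partitions" boot log.
# Each tuple is (name, start offset, end offset), end exclusive.
MTD_PARTITIONS = [
    ("u-boot1", 0x000000, 0x080000),
    ("u-boot2", 0x080000, 0x100000),
    ("env1", 0x100000, 0x110000),
    ("env2", 0x110000, 0x120000),
    ("fru", 0x120000, 0x130000),
    ("res1", 0x130000, 0x140000),
    ("tracebuf", 0x140000, 0x1C0000),
    ("lcl", 0x1C0000, 0x340000),
    ("res2", 0x340000, 0x400000),
]
FLASH_SIZE = 0x400000  # assumption: the SPI dump is exactly 4 MiB


def split_dump(dump: bytes) -> dict:
    """Cut a full SPI dump into one bytes object per MTD partition."""
    if len(dump) != FLASH_SIZE:
        raise ValueError(f"expected {FLASH_SIZE} bytes, got {len(dump)}")
    return {name: dump[start:end] for name, start, end in MTD_PARTITIONS}


def merge_dumps(original: bytes, donor: bytes,
                take_from_donor=("u-boot1", "u-boot2")) -> bytes:
    """Build an image with the donor's u-boot and the original's unique data."""
    original_parts = split_dump(original)
    donor_parts = split_dump(donor)
    merged = b"".join(
        donor_parts[name] if name in take_from_donor else original_parts[name]
        for name, _start, _end in MTD_PARTITIONS
    )
    if len(merged) != FLASH_SIZE:
        raise ValueError("merged image has the wrong size")
    return merged
```

Typical use would be to read both dumps with `open(path, "rb").read()`, call `merge_dumps(bad_board_dump, good_board_dump)`, and write the result to a new .bin for the programmer, keeping the untouched original dump somewhere safe.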

Once I can get a good image I’ll load up my bad board and see if I can get your tftp trick to work. I’m pretty excited about that too.



So I think I have everything I want from the good board. I was able to open the .bin image in WinImage, and it shows all the same partitions I saw in the emmc. That gives me hope I can write it back block by block and end up with the same image.
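For anyone without WinImage, the partition list can also be read straight from the first sector of the dump. The sketch below is in Python and assumes the eMMC image starts with a classic MBR partition table and 512-byte sectors, which has not been confirmed in this thread. It only reads the four primary entries and does not follow extended partitions, so it may list fewer partitions than the image really holds. The function name `read_mbr_partitions` is made up for the example.

```python
import struct

SECTOR_SIZE = 512  # assumption: 512-byte sectors


def read_mbr_partitions(first_sector: bytes) -> list:
    """List the primary partitions described by a classic MBR sector."""
    if len(first_sector) < SECTOR_SIZE:
        raise ValueError("need at least the first 512 bytes of the image")
    if first_sector[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature, this may not be an MBR image")
    partitions = []
    for index in range(4):
        # Four 16-byte entries start at offset 446.
        entry = first_sector[446 + 16 * index:446 + 16 * (index + 1)]
        # status, CHS start, type, CHS end, LBA start, sector count
        _status, _chs_first, part_type, _chs_last, lba_start, sectors = (
            struct.unpack("<B3sB3sII", entry)
        )
        if part_type == 0:
            continue  # unused slot
        partitions.append({
            "index": index + 1,
            "type": part_type,
            "start_byte": lba_start * SECTOR_SIZE,
            "size_bytes": sectors * SECTOR_SIZE,
        })
    return partitions
```

Calling it on the first 512 bytes of mmcblk0.bin, read with `open("mmcblk0.bin", "rb").read(512)`, gives the byte ranges needed to carve out or rewrite one partition at a time.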

I’ll spend tomorrow soldering headers onto the jtag port/console port of the ‘bad’ board then go through the tedious motherboard swap. Once I get the bad one in there I’ll start by doing the tftp boot thing you’ve shown us.

After I wrap my head around that I think I’ll spend some time trying to figure out the emmc jtag and see if I can find a way to write the .bin back to the chip without using idrac. Maybe if the bad area isn’t actually relevant I can get a functional emmc. I’m thinking maybe the dell upgrade just fails and leaves an incomplete image, but if I can write around the bad areas maybe it’ll end up being in one of those areas that’s either redundant, or mostly blank.
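The "write around the bad areas" idea can be sketched as a block-by-block copy that records failures instead of stopping at the first one, which is where plain dd gives up. This is only an illustration in Python, assuming a Linux shell with Python is available on the side doing the writing; the function name `copy_skipping_bad_blocks` and the 512-byte block size are assumptions for the example.

```python
def copy_skipping_bad_blocks(src, dst, block_size=512):
    """Copy src to dst block by block, returning indexes of failed blocks.

    src and dst are binary file objects. A block whose write raises
    OSError is recorded and skipped, and the copy carries on with the
    next block.
    """
    bad_blocks = []
    index = 0
    while True:
        block = src.read(block_size)
        if not block:
            break  # end of the source image
        try:
            dst.seek(index * block_size)
            dst.write(block)
        except OSError:
            bad_blocks.append(index)
        index += 1
    return bad_blocks
```

On a real block device the target would need to be opened unbuffered (for example `open("/dev/mmcblk0", "r+b", buffering=0)`), otherwise write errors may only show up later when the cache is flushed. The returned list can then be compared with the partition boundaries to see whether the bad blocks land anywhere that matters.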

Alternately I’m wondering if there isn’t a way to manipulate the settings in the idrac eprom to boot from the sdcard so that it’s persistent between boots. I notice there’s also a section in there for an nfs boot, if I could get it to consistently boot the image from nfs that’d be fine too since I have a couple nas devices that support nfs.

“Alternately I’m wondering if there isn’t a way to manipulate the settings in the idrac eprom to boot from the sdcard so that it’s persistent between boots.” @willard
It’s definitely possible. Actually, that’s what I’m going to try first, before attempting to replace the eMMC chip.
I think it can be done by setting the environment variables. What I figured out is that these variables can be not only of the “key=value” type but also of the “key={script}” type.
So you can try running the following (excuse me if the syntax is wrong; since I bricked my iDRAC again I can’t check it at the moment, so I’m typing it from memory):
1. To set SD as the boot device permanently
setenv bootcmd 'sd_boot'
saveenv

2. To boot from tftp permanently
setenv bootcmd 'tftpboot firmimg.d7; go 0x81000000; bootm'
saveenv

I finally installed my dysfunctional board. I want to write back the idrac7 firmware, because the idrac8 that’s on there’s probably not a great place to start (that was an attempt to load the latest firmware).

On this board it has always been really hard to write to the spi eeprom, so I’m struggling with that. If I can get the uboot back to a reasonable version, then I’ll see if I can get the sdcard written and modify the boot as you describe. I’m not sure if this chip just hugs the board tighter and it’s harder to get a good clamp on all 8 pins, or if the chip is somehow degrading over time (or maybe there’s oxidation on the pins or something).

I’m debating whether I want to try to pull that chip and put a new eprom on, but then I’m introducing another variable to the problem.

Looks like I’m not alone with the idrac problem. I also have a 720XD with a dead idrac and/or lifecycle controller. Exactly the same error/state as in the 1st post. I hope your machine will be fixed, and then I’ll be able to copy and try your method :)

Having just pulled a R720 out of the trash I find it has a busted iDRAC7 too. Runs great, sounds like a banshee, takes 8 minutes to boot, and requires user input(F1) to exit POST but it runs. I’m a hardware guy and really want to fix my embedded iDRAC too. Haven’t spent a penny on it yet and don’t intend to, lol. Reached out to some embedded guys on LinkedIn, a few may have been on the Avocent/Emerson/Vertiv team who actually designed it. Hopefully, one will respond here with something amazingly helpful.

Personally, I went with a half solution. I put a teradici tera2 workstation card in the dell 720XD along with an old Nvidia Quadro NV295. The quadro is now connected to the teradici tera2 PCoIP card, and I can press F1 remotely like in the idrac vnc… I know it’s only a half solution, but it’s better than nothing and I don’t need to listen to the machine :)