[Problem] Dell R720xd iDRAC BIOS Recovery

So you’ve made some solid progress @ldv. I haven’t tried the dd approach; frankly I wouldn’t know where to dd it to, and I don’t want to mess with my ‘good’ Dell more than I need to. However, I think I’ll be able to accomplish the same thing. I used: tftp -p -r mmcblk0.bin 192.168.0.243 -l /dev/mmcblk0 to write the idrac block device to my laptop (.243), which runs the free SolarWinds tftp server.

I also thought I’d point out that the working boot log is full of good information.

Notably this:
Dec 31 18:00:11 (none) kernel: Creating 9 MTD partitions on "m25p80":
Dec 31 18:00:11 (none) kernel: 0x000000000000-0x000000080000 : "u-boot1"
Dec 31 18:00:11 (none) kernel: 0x000000080000-0x000000100000 : "u-boot2"
Dec 31 18:00:11 (none) kernel: 0x000000100000-0x000000110000 : "env1"
Dec 31 18:00:11 (none) kernel: 0x000000110000-0x000000120000 : "env2"
Dec 31 18:00:11 (none) kernel: 0x000000120000-0x000000130000 : "fru"
Dec 31 18:00:11 (none) kernel: 0x000000130000-0x000000140000 : "res1"
Dec 31 18:00:11 (none) kernel: 0x000000140000-0x0000001c0000 : "tracebuf"
Dec 31 18:00:11 (none) kernel: 0x0000001c0000-0x000000340000 : "lcl"
Dec 31 18:00:11 (none) kernel: 0x000000340000-0x000000400000 : "res2"

For those of us using a chip programmer to recover our idrac on a regular basis, it’s very nice to know the boundaries of each section of that chip. I took my original dump file, loaded it into hexedit, used the select region command (ctrl-e) to select each region (110000-11ffff for example), and then saved each section in its own .bin file. Now if I want to write back a ‘good’ uboot, I just need to concatenate the ‘good’ uboot(s) with my original fru and res .bin files (the fru contains the cn and Dell IDs, and the res has the mac addresses, unique data I’d like to keep on the right motherboard).
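The split-and-concatenate workflow described above can be sketched in a few lines of Python. This is only a sketch under assumptions: the offsets are copied from the boot log in this post, the dump is assumed to be a full 4 MiB read of the chip, and the names `split_dump` and `transplant` are made up for illustration. By default it takes only the two u-boot regions from the donor image and keeps every other region (env, fru, res, tracebuf, lcl) from the board's own dump, which is one possible reading of the plan, not a tested recipe.

```python
# Partition map copied from the "Creating 9 MTD partitions" boot log.
# Each entry is (name, start offset, end offset); the end is exclusive.
PARTITIONS = [
    ("u-boot1",  0x000000, 0x080000),
    ("u-boot2",  0x080000, 0x100000),
    ("env1",     0x100000, 0x110000),
    ("env2",     0x110000, 0x120000),
    ("fru",      0x120000, 0x130000),
    ("res1",     0x130000, 0x140000),
    ("tracebuf", 0x140000, 0x1C0000),
    ("lcl",      0x1C0000, 0x340000),
    ("res2",     0x340000, 0x400000),
]
CHIP_SIZE = PARTITIONS[-1][2]  # 0x400000 = 4 MiB


def split_dump(image):
    """Cut a full chip dump into one bytes object per partition."""
    if len(image) != CHIP_SIZE:
        raise ValueError(f"expected {CHIP_SIZE} bytes, got {len(image)}")
    return {name: image[start:end] for name, start, end in PARTITIONS}


def transplant(original, donor, take_from_donor=("u-boot1", "u-boot2")):
    """Rebuild a full image: named partitions come from the donor dump,
    everything else (board IDs, MAC addresses, ...) from the original."""
    own = split_dump(original)
    good = split_dump(donor)
    return b"".join(
        good[name] if name in take_from_donor else own[name]
        for name, _start, _end in PARTITIONS
    )


# Synthetic stand-ins for two real dumps, so the sketch runs without hardware.
original = b"\xAA" * CHIP_SIZE
donor = b"\x55" * CHIP_SIZE
merged = transplant(original, donor)

for name, start, end in PARTITIONS:
    source = "donor" if merged[start:end] == donor[start:end] else "original"
    print(f"{name:8s} 0x{start:06x}-0x{end - 1:06x} {end - start:8d} bytes from {source}")
```

With real files you would read both dumps with `open(path, "rb").read()` and write `merged` back out before handing it to the programmer. Nothing here recalculates checksums, so any CRC stored inside a region is carried over unchanged.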

I haven’t actually burned that back yet (having just pieced it together today), but as long as I don’t hit some kind of crc value that I have to recalculate, I’m hoping to be able to take the good uboots and put them on other ‘bad’ EEPROMs without wiping out those boards’ IDs (or worse, ending up with multiple servers with the same mac address).

Once I can get a good image I’ll load up my bad board and see if I can get your tftp trick to work. I’m pretty excited about that too.

idrac-tftp-copy.PNG


So I think I have everything I want from the good board. I was able to open the .bin image in winimage and it shows all the same partitions I saw in the emmc. It gives me hope I can write that back block by block and have the same image.

I’ll spend tomorrow soldering headers onto the jtag port/console port of the ‘bad’ board then go through the tedious motherboard swap. Once I get the bad one in there I’ll start by doing the tftp boot thing you’ve shown us.

After I wrap my head around that I think I’ll spend some time trying to figure out the emmc jtag and see if I can find a way to write the .bin back to the chip without using idrac. Maybe if the bad area isn’t actually relevant I can get a functional emmc. I’m thinking maybe the dell upgrade just fails and leaves an incomplete image, but if I can write around the bad areas maybe it’ll end up being in one of those areas that’s either redundant, or mostly blank.

Alternately I’m wondering if there isn’t a way to manipulate the settings in the idrac eprom to boot from the sdcard so that it’s persistent between boots. I notice there’s also a section in there for an nfs boot, if I could get it to consistently boot the image from nfs that’d be fine too since I have a couple nas devices that support nfs.

“Alternately I’m wondering if there isn’t a way to manipulate the settings in the idrac eprom to boot from the sdcard so that it’s persistent between boots.” @willard
It’s definitely possible. Actually, that’s what I’m going to try first, before attempting to replace the eMMC chip.
I think it can be done by setting the environment variables. What I figured out is that these variables can be not only of the “key=value” type but also of the “key={script}” type.
So you can try running the following (excuse me if the syntax is wrong; I bricked my iDRAC again, so I can’t check it at the moment and I’m typing this from memory):
1. To set SD as boot device permanently
set bootcmd 'sd_boot'
saveenv

2. To boot from tftp permanently
set bootcmd 'tftpboot -f firmimg.d7; go addr 0x81000000; bootm;'
saveenv
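For anyone who wants to look at what saveenv actually stored (for example in the env1/env2 regions of a chip dump), here is a minimal Python sketch. It assumes the stock U-Boot redundant-environment layout: a 4-byte CRC32, a 1-byte flag, then NUL-terminated key=value strings ending with an empty string. It also assumes the environment fills the whole 64 KiB partition. Neither assumption has been confirmed for Dell's build, and `parse_env` and `build_env` are illustrative names, not U-Boot tools.

```python
import struct
import zlib


def parse_env(blob, redundant=True):
    """Return (crc_ok, variables) for a raw environment partition dump."""
    header = 5 if redundant else 4      # CRC32, plus a flag byte if redundant
    stored_crc = blob[:4]
    data = blob[header:]
    crc = zlib.crc32(data) & 0xFFFFFFFF
    # The byte order of the stored CRC depends on the target CPU, so accept both.
    crc_ok = stored_crc in (struct.pack("<I", crc), struct.pack(">I", crc))

    variables = {}
    for entry in data.split(b"\x00"):
        if not entry:
            break                       # an empty string ends the variable list
        key, _sep, value = entry.decode("ascii", errors="replace").partition("=")
        variables[key] = value
    return crc_ok, variables


def build_env(variables, size=0x10000, redundant=True):
    """Build a synthetic environment blob (only used to exercise parse_env)."""
    header = 5 if redundant else 4
    body = b"".join(f"{k}={v}".encode("ascii") + b"\x00" for k, v in variables.items())
    data = (body + b"\x00").ljust(size - header, b"\xff")
    crc = struct.pack("<I", zlib.crc32(data) & 0xFFFFFFFF)
    flag = b"\x01" if redundant else b""
    return crc + flag + data


blob = build_env({"bootdelay": "3", "bootcmd": "sd_boot"})
crc_ok, env = parse_env(blob)
print("CRC matches:", crc_ok)
for key, value in env.items():
    print(f"{key}={value}")
```

If the CRC does not match on a real dump, that may only mean the size or layout assumption is wrong for this firmware, not that the environment is corrupt.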

I finally installed my dysfunctional board. I want to write back the idrac7 firmware, because the idrac8 that’s on there now is probably not a great place to start (that was an attempt to load the latest firmware).

It has always been really hard to write to the spi eeprom on this board, so I’m struggling with that. If I can get the uboot back to a reasonable version then I’ll see if I can get the sdcard written and modify the boot as you describe. I’m not sure if this chip just hugs the board tighter so that it’s harder to get a good clamp on all 8 pins, or if the chip is somehow degrading over time (or maybe there’s oxidation on the pins or something).

I’m debating whether I want to try to pull that chip and put a new eprom on, but then I’m introducing another variable to the problem.

Looks like I am not alone with this idrac problem. I also have a 720XD with a dead idrac and/or lifecycle controller, with exactly the same error/state as in the 1st post. I hope your machine will be fixed, and then I will be able to copy and try your method :slight_smile:

Having just pulled an R720 out of the trash, I find it has a busted iDRAC7 too. It runs great, sounds like a banshee, takes 8 minutes to boot, and requires user input (F1) to exit POST, but it runs. I’m a hardware guy and really want to fix my embedded iDRAC too. Haven’t spent a penny on it yet and don’t intend to, lol. I reached out to some embedded guys on LinkedIn, a few of whom may have been on the Avocent/Emerson/Vertiv team who actually designed it. Hopefully, one will respond here with something amazingly helpful.

Personally, I went with a half solution. I put a Teradici Tera2 workstation card into the Dell 720XD along with an old Nvidia Quadro NV295. The Quadro is connected to the Teradici Tera2 PCoIP card, so I can press F1 remotely like in the idrac vnc… I know it’s only a half solution, but it’s better than nothing and I don’t need to listen to the machine :slight_smile:

Or, you could buy a tiny Arduino board to emulate a keyboard and send the ‘F1’ keystroke once every five minutes. Back in the day this method was used to guess the four-digit PIN that could be set on older Mac BIOSes. Example: www.sparkfun.com/tutorials/337

Hi all. I read the full discussion. I might be in the wrong topic, but the repair process I need should be the same. I have a Google TSA T4, the yellow Google server, which is a Dell R720xd.
The issue is that it is locked by a "BOOT password": the boot process stops at "boot password protected" (this is not the password for entering the BIOS). In other words, it boots correctly but halts at a password prompt that has to be answered before it will continue booting to media.
As long as I cannot bypass that "BOOT password", the server will go nowhere. Google’s 3rd-party BIOS deactivates the password switch on the motherboard (no password clear is possible), and Google does not give out the password (to protect customer data, even though there is no hard drive…).

I just need it to boot normally so I can update it to a regular Dell R720 BIOS.
How can I "SPI flash" the BIOS? Where are the components located (SPI, UART, BIOS…) and what tools do I need?
Any help appreciated!

To be more precise about my issue:
I need to “force flash” a new BIOS onto the motherboard. The motherboard is stuck at “System password”, not at “BIOS access”. I may have given only a half-good explanation by calling it a “BOOT password”. https://www.dell.com/community/Laptops-G…gs/td-p/4599673

The issue is that I am unable to remove or reset the password: Google’s 3rd-party BIOS disables the physical “CLEAR Password” switch on the motherboard. I just want the motherboard to boot normally; I don’t even need access to the BIOS… By booting normally I would be able to boot a USB DOS stick to update the BIOS.
So if there is no normal “boot”, then “force flash” it is!

Thank you!!

I am that guy on Reddit. Here is the pin definition of J_DBG_EMMC. I can’t remember clearly whether the VCC and GND at the top-right corner are labeled in the right order, so please make sure the GND pin is shorted to another GND before soldering. I will upload a “good” emmc image of the idrac soon.

I have 4 R720 motherboards with the same problem. The one in the picture is wrecked; you can see that some solder pads were damaged. It is suspicious that all 4 boards died in the same way, and all of them act as if the eMMC is damaged and can’t be written to. I don’t think it is a flaw in the eMMC chip itself. Instead, my guess is that it’s an issue with Dell’s design.

We can read this chip on the board with an SD card reader. However, I am still looking for a way to cut the connection between the SoC and the eMMC chip. It is very annoying: when trying to write data to the eMMC, the SoC interferes heavily. Someone suggested taking the quartz oscillator at the right edge of the eMMC offline. I am rather busy these days, so if it works please let me know.

DELL-EMMC-Debug-Lable.jpg

Thanks to the help here, I was able to recover iDRAC on my Dell r620 and update it to the latest version.


Initial State
I purchased a r620 that had iDRAC version 1.66.65 installed and all update attempts failed with the following error message: RED007: Unable to verify Update Package signature.

After trying to update using multiple methods (tftp, racadm, lifecycle controller) I got to a state where it looked like the update was going to take, however when iDRAC reset, it failed to initialize with the fans running at full throttle and the iDRAC status light off.


Connecting to iDRAC UART
I soldered a 4 pin header to J_IDRAC_UART, then connected it to the Raspberry Pi GPIO on GND, TX, and RX. (Any UART controller should work, I just had a RPi available)
Note: If using a Raspberry Pi’s serial port, you may need to disable the Raspberry Pi uart console

After connecting to the serial port with “screen /dev/serial0 115200” and turning on the system, I noticed multiple errors relating to SQUASHFS

Dec 31 18:00:53 (none) kernel: SQUASHFS error: zlib_inflate error, data probably corrupt
Dec 31 18:00:53 (none) kernel: SQUASHFS error: squashfs_read_data failed to read block 0x18337f9
Dec 31 18:00:53 (none) kernel: SQUASHFS error: Unable to read fragment cache entry [18337f9]
 

U-Boot was loading the iDRAC image, however, iDRAC would not start due to corruption.


Interrupting U-Boot
To recover the firmware, I had to force U-Boot to interrupt the boot process. I did this by bridging the second pair of pads in SW_IDRAC_DBG.
Note: You have to be fairly fast with this and press any key when prompted to interrupt the boot process. To assist with this, you can bridge the first pair of pads, which forces iDRAC to reboot, then move the wire to the second pair.

Hit any key to stop autoboot:  0
WDT2: Disable in abortboot()
OSWDT: Disable in abortboot()
CPLD_BMCRDY: Enable BMC_MIN_RDY in abortboot(). Prevent BIOS reset.
 
NOTE: After stopping u-boot in this development mode. You may need to
warm/cold reset the server when booting iDRAC manually as BIOS
may have already viewed iDRAC as unresponsive.
 
iDRAC7=>
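Since the key press has to land inside a short window, the "Hit any key to stop autoboot" prompt shown in the log above can also be watched for by a script. The following Python sketch is an assumption-laden illustration, not what was used in this recovery: `interrupt_autoboot` is a made-up helper that works on any object with `read(n)` and `write(data)` methods, and the `FakePort` class only exists so the sketch runs without hardware. The pads on SW_IDRAC_DBG still have to be bridged by hand.

```python
import time

PROMPT = b"Hit any key to stop autoboot"


def interrupt_autoboot(port, timeout_s=120.0, key=b" "):
    """Watch `port` for the autoboot prompt and send a key as soon as it shows up.

    `port` is any object offering read(n) -> bytes and write(bytes).
    Returns True if the prompt was seen and keys were sent, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    window = b""
    while time.monotonic() < deadline:
        chunk = port.read(64)
        if not chunk:
            continue
        # Keep a short rolling window so a prompt split across reads is still found.
        window = (window + chunk)[-256:]
        if PROMPT in window:
            for _ in range(5):          # send a few keys in case one is missed
                port.write(key)
            return True
    return False


class FakePort:
    """Stand-in for a serial port, replaying canned console output."""

    def __init__(self, data):
        self._data = data
        self.sent = b""

    def read(self, n):
        chunk, self._data = self._data[:n], self._data[n:]
        return chunk

    def write(self, data):
        self.sent += data
        return len(data)


demo = FakePort(b"U-Boot banner...\r\n" + PROMPT + b":  3")
print("prompt caught:", interrupt_autoboot(demo, timeout_s=1.0))
```

On a Raspberry Pi the real port object could come from the third-party pyserial package, for example `serial.Serial("/dev/serial0", 115200, timeout=0.1)`, which provides the same `read` and `write` methods.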
 

After getting to the iDRAC7 console, I attempted to recover the firmware using tftp as @ldv did, but I could not get U-Boot to communicate over my network. Instead, I recovered the firmware using a SD card.


Preparing the SD Card
1. Format the SD card as FAT, then copy firmimg.d7 from the iDRAC update exe to the card. I formatted the card with a 256MB FAT partition, but this may not be necessary.
2. Insert the SD card into the server.
3. Check if uboot can read the SD card

iDRAC7=> mmc list
sh_mmcif: 0
sh-sdhi: 1
iDRAC7=> mmc rescan 1
iDRAC7=>
 

4. If you receive the error 'Card did not respond to voltage select!', try rebooting iDRAC with the SD Card already inserted.


Recovering the Firmware
I was now able to recover the firmware from the SD card using the following command:

iDRAC7=> util recover -emmc -from sd -f firmimg.d7
 

After the partitions are updated, iDRAC should reboot. iDRAC hung for me at 'Monolithic/DRB' and I had to remove the SD Card and hold the iDRAC reset button to get past this point. iDRAC was then working correctly and I could boot the system without issues.


Other notes:
-I recovered the firmware to 2.10.10.10, then to the latest iDRAC version using this method. You may be able to jump straight to the latest version instead.
-After updating iDRAC, I was still unable to update it using the traditional methods. Traditional updates continued to fail with RED007: Unable to verify Update Package signature.

iDRAC.jpg

iDRAC recovery.txt (12.8 KB)



I have the same problem with my R320. I was so excited when I read about your progress here and wanted to try loading an image via tftp. Unfortunately, I went straight for the permanent approach using:

set bootcmd 'tftpboot -f firmimg.d7; go addr 0x81000000; bootm;'

Now I cannot seem to interrupt U-Boot anymore via the SW2 pins. It downloads the image via tftp, prints "## Starting application at 0x000..." (not 0x81000000), and then immediately restarts U-Boot as if it panicked. It does this forever. How can I restore the environment or switch to the backup env? I don't have an SD card reader.
I tried tricking it by serving an invalid file, but it just keeps retrying forever.

EDIT: I managed to escape the boot loop after finding the relevant clue in the U-Boot documentation: if bootdelay=0, you're going to have a hard time interrupting u-boot. Fortunately, you can glitch yourself into u-boot anyway using CTRL-C. Just be sure to hold it down while rebooting, and with a bit of luck you either magically enter u-boot or you corrupt the memory of uboot1, so that it gets overwritten with the content of uboot2.

Unfortunately, tftp firmimg.d7; go 0x81000000 doesn't work either. It will just infinitely boot loop without actually booting.

idrac_fail.txt (6.11 KB)

I’m facing a similar issue: iDRAC won’t boot completely, stuck in this:

syslogd: /mmc1/idraclogs: Read-only file system
Dec 31 18:21:28 idrac8 syslogd 1.4.2: restart.
syslogd: /mmc1/idraclogs: Read-only file system
Dec 31 18:27:09 idrac8 syslogd 1.4.2: restart.
syslogd: /mmc1/idraclogs: Read-only file system
Dec 31 18:29:04 idrac8 syslogd 1.4.2: restart.

I connected the serial console successfully, but cannot break out of the U-Boot boot process. I tried shorting the second pair of pins and tried CTRL+C, without any luck. Spamming CTRL+C actually results in a boot loop occasionally, but no U-Boot prompt… Does anyone have any other ideas on how to solve this?



Isn’t that error due to Dell replacing the signing certificate in 2.21.21.21 (I think it was)? I had that error when updating, until I checked through the releases and updated to the previous important update first; then I could update.

Thanks to everyone who persisted with this and managed to get to a point where the iDrac can sometimes be recovered. I have read all of the replies on this forum and was really encouraged when @doomguy successfully recovered his iDrac. I followed his instructions to the letter on an R320 but I couldn’t get the firmware to write to the emmc. I had exactly the same errors as others:

iDRAC7=> util recover -emmc -from sd -f firmimg.d7

UTIL RECOVER:Transport:sd TargetMMC:EMMC File:firmimg.d7
Loading file "firmimg.d7" from mmc device 1:1 (xxb1)
62953181 bytes read
UTIL RECOVER:SD load passed from EXT2 fs.
UTIL RECOVER:Transport time [sec:mil]: 01:08:542

Clear OS images in partition/s.
Clear kernelN, rootfsN, ubootN
'EMMC' blk size=[0x200][512] Erase to 0xffffffff
Mem buf size=[0x06000000][100663296] Total bytes=[0x08000000][134217728]
blocks[0x1:0x40001][1:262145]
mmc write failed000] Buf[0x88000000]…fill buffer…erase/write
util flash:
Failure writing blocks [1:196609] RC=0xffffffac
UTIL PASS

I tried versions 1.4, 1.5, 1.6, 2.1 and 2.6 and they all failed with the same error. Did anyone manage to solve this last problem?
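The numbers in that log can be cross-checked with a little arithmetic. The sketch below only recomputes values printed in the log; reading the bracketed pairs as [first block : end block] is my assumption about the log format. Under that reading, the failing range [1:196609] is exactly the first buffer-sized chunk, which would mean the very first write to the eMMC failed rather than one partway through.

```python
BLOCK_SIZE = 0x200          # 'EMMC' blk size from the log (512 bytes)
BUFFER_BYTES = 0x06000000   # "Mem buf size" from the log
TOTAL_BYTES = 0x08000000    # "Total bytes" from the log
FIRST_BLOCK = 0x1           # start of the blocks[...] range in the log


def to_signed32(value):
    """Interpret an unsigned 32-bit value as a signed one."""
    return value - (1 << 32) if value & 0x80000000 else value


blocks_per_chunk = BUFFER_BYTES // BLOCK_SIZE
total_blocks = TOTAL_BYTES // BLOCK_SIZE

whole_range = (FIRST_BLOCK, FIRST_BLOCK + total_blocks)
first_chunk = (FIRST_BLOCK, FIRST_BLOCK + blocks_per_chunk)
return_code = to_signed32(0xFFFFFFAC)

print("whole range :", whole_range)
print("first chunk :", first_chunk)
print("return code :", return_code)
```

What the return code means inside this U-Boot build is not stated anywhere in the thread, so the sketch only shows its signed value.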

I’ll keep trying as I have nothing to lose.

I have now come to the conclusion, as many others have, that the eMMC gets corrupted when the firmware update fails. How or why, I don’t know. The only way to prove this would be to replace it, but I don’t have the skills or equipment to do this.

What I was able to find, which may help others, is that the uboot recover method used by @doomguy does work in some cases: I tried it on a fully working iDRAC without issue.




There is a good chance you’ll be able to recover this as you’re getting past the uboot loader which is where mine fails. To get a recovery prompt I find it’s best to short pin 1 to ground and then immediately short pin 2 to ground and hold it there until you see the "press key to interrupt boot". Once you get there you should be able to re-flash your iDRAC using the method above (util recover -emmc -from sd)



For what it’s worth…
Updating the 12th gen fw using the tftp, racadm, lifecycle controller and Centos bootable methods seems to fail quite often.
I have used the Windows SUU (SERVER UPDATE UTILITY, now called DELL EMC OPENMANAGE SERVER UPDATE UTILITY) method since the 8th generation, primarily because this is what was taught in the Dell training school I attended many years ago.
It seems like the new cool methods such as iDrac and LCC updates get turned off at some point and you are left in the dark.
The important thing that was stressed with the 9th gen was keeping the Bios and BMC versions in sync.
It also makes sense, as the BMC morphed into iDrac/LCC over the generations, to keep the Bios/iDrac/LCC versions in sync going forward.
The Windows SUU updates the iDrac/LCC first and then updates the Bios last at boot, hence they remain in sync.
The newer 12th gen appears to use the LCC to do the Bios update, hence the LCC MUST be updated first.
The SUU has a file called ‘DellSoftwareBundleReport xx.xx.xx.html’ that lists the firmware by machine for that release. One could use this list to determine what versions of the various fw components are comparable.
You can find the SUU in the ‘Drivers and Downloads’ section under ‘Systems Management’

I have used this method on R210, R210ii, R310, R410, T410, R510, R610, R710, T320, R420, T420, T620, R620

Install Windows OS eval on a HD - Read the ‘Release Notes’ for the Supported OS.
For example SUU 19.07 will run on Server 2012 and newer
Install cd/dvd virt driver - such as MagicDisk
Mount SUU ISO for your generation, Run Setup on the mounted SUU ISO
WAIT until the inventory comes up - May take 10 minutes.
Install updates.
The updates will install in the correct order.
Wait for prompt to reboot.

Hope this helps…