[Problem] Dell R720xd iDRAC BIOS Recovery

So I went down the SW2 pins with my volt meter (just to have something to contribute) and I noticed that only sw2.1 sw2.2 and sw2.3 have 3.3v on one leg.

I just did that as soon as I plugged in the power, but before I turned it on (since the idrac is already powered at that point).

If you can find a similar situation with your shown switch header then I’d start by holding a jumper on switch position 2 first (assuming 1, 2 and 3 are the same ones that have 3.3 volts on one side).

If that didn’t work I’d try the other two. If it’s just a completely different set of switch positions, then maybe that will narrow it down. It’s sort of a pain because I bent a stiff wire in a U shape and held it in place until the boot was interrupted, then I pulled the U jumper off by hand (so it was pressure holding it in place)…

It works well enough until you find the right pair. Now I’ll probably solder two headers on there that I can short more easily with a jumper instead. Or just a toggle switch. Whatever takes the least effort at the time.

Well this is weird: my iDRAC SD card wasn’t readable, so I popped it back in my laptop, and it looks like the structure wasn’t actually written to the emmc but was written to the sdcard.

Now I’m not sure to be happy or sad. If I can copy a bootable image into that sdcard structure I wonder if it would boot. I don’t quite understand though why it would try to recover itself to the sdcard, maybe that is a failsafe when the emmc is bad.

Maybe I misunderstood the yellow flashing light mode, and it seems important I don’t put the sdcard in until after it’s in its polling state.

idrac-recovery-sdcard.PNG



Or now I see that again, maybe I can read the sdcard into a buffer area and try to write that back to the emmc manually. Or actually, if I can get that tftp process worked out I should be able to send the .d7 file over the network and maybe it’ll write to the sdcard…

@willard - I found this, a little different than your post at 51, https://pastebin.com/pDnCkn0A
[0:196608] RC=0xffffffac << This caught my eye, it’s a 1:196609 on the link above. What does the RC=0xffffffac mean to you? Read Count, total file size = 0xffffffac (5MB)? And the failed write is only at 0-19660x?
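One other possible reading of that value, offered as an assumption since nobody here has Dell’s internal documentation: RC may be a return code rather than a read count. Interpreted as a signed 32-bit integer, 0xffffffac is -84, the same number that shows up as err=-84 in the mmcinfo output and as "error -84" in the kernel messages elsewhere in this thread. On Linux, errno 84 is EILSEQ, which the MMC layer generally uses for CRC or bad-response errors. A minimal sketch of the conversion (the helper name is made up for illustration):

```python
def to_signed32(value: int) -> int:
    """Interpret an unsigned 32-bit value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        return value - 0x100000000
    return value


# RC value copied from the "Failure writing blocks" line in the log.
rc = to_signed32(0xFFFFFFAC)
print(rc)  # -84
```

If that reading is right, the number says nothing about file size; it would only say the write was rejected at the MMC level.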

Have you connected this chip to ASProgrammer 1.40? In that, if any area of the chip is write protected or locked etc, you can unlock it, I describe/discuss this here
Bricked Asus Z170-AR (2)

@willard Can you show which pins are paired to stop uboot boot?
I also have a broken Idrac :frowning:

My boot idrac


U-Boot 2009.08-00066-g951a018 (Apr 30 2014 - 13:44:13) Avocent (0.0.3) EVB, Build: jenkins-idrac-yocto-release-483
 
CPU: SH-4A
BOARD: R0P7757LC00xxRL (C0 step) board
BOOT: Secure, HRK generated
DRAM: 240MB
(240MB of 256MB total DRAM is available on U-Boot)
ENV: Using primary env area.
In: serial
Out: serial
Err: serial
WDT2: Booted Lower Vector, 'uboot1'
sh_mmcif: 0, sh-sdhi: 1
Net: sh_eth.0, sh_g_eth.0
INFO: 00:003 Start-up -to- util_idrac_main()
INFO: 00:004 U-Boot 2009.08-00066-g951a018 (Apr 30 2014 - 13:44:13) Avocent (0.0.3) EVB
INFO: 00:008 U-Boot checkin date(05-10-2013) Version(1.0.183)
INFO: 00:006 iDRAC PPID <NULL>
INFO: 00:003 SPI NOR init 4096 KiB MX25L3206E bus=0 cs=0, speed=1000000, mode=3
INFO: 00:007 SH-4A Product: Major Ver=0x31 Minor Ver=0x14 C4 Little endian
Family=0x10 Major Ver=0x30 Minor Ver=0x0b
PASS: 00:016 Dedicated monolithic mgmt NIC disabled
INFO: 00:129 BCM54610 OUI=0x3fffff Model=0x3f Revision=0x0f PhyAddr=1
INFO: 00:030 SD CARD: Device: sh-sdhi Manufacturer ID: 1b OEM: 534d
Name: 00000 Tran Speed: 25000000 Rd Block Len: 512
SD version 2.0 High Capacity: No Capacity: 2021654528
timeout
mmc read failed
** Can't read Driver Desriptor Block **
 
 

My nand seems to be damaged …


iDRAC8=> mmcinfo
timeout
mmc read failed, err=-84
** Can't read Driver Desriptor Block **
timeout
mmc read failed, err=-84
Device: sh_mmcif
Manufacturer ID: 6
OEM: 0
Name: 16▒▒▒
Tran Speed: 20000000
Rd Block Len: 512
MMC version 3.0
High Capacity: No
Capacity: 1073741824
Bus Width: 1-bit
 

 


Can someone throw a nand dump to try to run from the sd card?

@filet187 Which server model do you have? It varies from model to model, but in general you should find the area outlined with the rectangle and named as something that should probably include the “SW” in its name.
On my server (Dell PowerEdge R620) it was “SW_IDRAC_DBG”. This area contains a group of contact pads that seem to be designed for a DIP switch. Suppose the pads on one side are marked A1, A2, A3… and on the other side B1, B2, B3…; you can start by jumpering A1-B1, A2-B2, etc. Eventually you should find the right pair. In my case A1-B1 hard-resets the iDRAC, which is extremely useful if you have to do all your experiments on a running machine without power cycling it. Jumpering the A2-B2 pair interrupts the iDRAC boot process.
I managed to figure it out after reading Willard’s explanation in previous posts. So you can thank him :wink:

@filet187 I have an R730 server, which has an iDRAC8 controller. I can try to get a "dd" dump from the MMC. Is that what you mean by "nand dump"?

@willard Hi. It seems I missed something here: what did you do to get to the point where “the structure wasn’t actually written to the emmc, but was written to the sdcard”?
Where did all these partitions on the SD card come from? I got a similar layout on my SD card after taking the MMC dump with dd.
Did you try to boot from this SD card by changing the “root” value of the “bootargs” variable in U-Boot?


That’s exactly what I mean :slight_smile:


Finally, I was able to stop the boot. But I had to put a resistor on I_emmc_debug, and then the idrac starts.

I have motherboard R320.

There is an unprotect all command in the idrac7 console but that doesn’t seem to help. As for the discrepancy in the RC value you pretty much hit the nail on the head. I don’t have the internal documentation for any of this and am just gathering information and results hoping for an epiphany over time.

I’ve been inactive on this for the last week in part because there are two distinct motherboards I have failures on (both failed with the same updates but in different methods of delivery). In any case the one I’m working on now I’ve gleaned from other articles that I may be able to update the surrounding firmware packages and because I can get into the bios firmware recovery screen I’m exhausting that avenue before going farther.

I also just received another working r420 yesterday. I’m going to use that to observe a working system without updates, and then swap the motherboard out with the new one and work on recovering the one that won’t even let me into bios.


Finally, I have my JTAGulator, but I’m waiting for my Bus Pirate before trying to get into the emmc debug port, since I haven’t found any good information on that.

In looking at the link I should clarify that I can rewrite the uspi bios all day long with minimal (clip related) issues. It helped immeasurably by ordering a good wire wrap 3M clip, rather than using the cheap black clip that came with my programmer. I had to press that hard straight down to get it to bite, where this new 3m clip requires little or no jiggling to get it in the right contacts to read and write the bios.

I’m convinced it’s the EMMC image that’s out of sync with the uboot in the bios chip. I do boot up and see the settings (stored in the bios chip) are specifically for version 1.66.65 bios and I’m reasonably sure the issue originally happened trying to flash a newer LC integrated bios to the on board flash chip. And the final question I didn’t answer yet, now I didn’t use a ASProgrammer but the ch341a gold programmer seems to be working well enough for the uboot images (of which I have about 6 now).

@filet187 I may not have clarified this in my posts, but I was buying those super cheap 2-3 dollar ch341 usb ttl adapters from aliexpress that have the 3v settings. But yeah, if you’re using a 5v rs232 connection you’d want a resistor to avoid frying anything. Before I connected anything I checked the voltage, and 3v seemed like the right setting to use (I have a few versions; they all have a way to jumper the board side to 3 or 5 v).

In my case on the r420 the dbg port was 3v. Besides I’d rather fry a cheap rs232 board than fry the motherboard so if I were in doubt I’d try the 3v adapter before assuming it was 5v, but good on you for using the resistor.

@ldv so to be honest I need to be better at putting my test notes inside my pastebin to make that more obvious. Going off my memory (a dicey proposition) I thought I used the idrac7 console to manually write the emmc. Somehow I must have targeted the wrong device, because all I did to get it to dump those partitions from the pastebin was restart the server after the wipe locked up drac.

I have limited time at work to work on this, and I can’t sneak in the tests because those darn idrac fans sound like a jet engine. Now I have my new server at home I’ll set up a bench around these tests and recreate the experiments with better pastebin documentation.

I’ll probably spend the first day or two gathering observations from the working board, simply because it’s a pain to pull the cpu to swap the motherboard.


I’ll also want to leave the original board working as a target for the buspirate stuff because if I can get it to dump the emmc then I’d want to write both that and the idrac flash to the broken board and see if it works that way.

So I have the same issue as you do @willard with a r720.

Failure writing blocks [1:196609] RC=0xffffffac

I was trying to figure out how @ldv made a dd copy of a good working emmc, since there seem to be no dd-type commands in the idrac uart shell. Is there another header for an emmc-type uart shell?

I have a working r720 and another identical r720 with the idrac problems. I’m wondering if I can dd the working emmc onto an SD card and try to boot from it on the non-working server, as was suggested to be possible.

@frameshift18 1. I have a second server with working iDRAC so I ran dd on it in a UART shell.
2. My apologies to @filet187: in the end I was not able to get a dd dump from the R730 server. Really sorry for that.
3. Look what I found recently:

1. The iDRAC’s NIC is not turned on by u-boot and can’t be, but u-boot can use the server’s NIC.
In my setup, all 4 ports were in a team. To check this theory I excluded the first port from the team and disabled it in the OS settings.

Annotation.png



2. Next, I ran the following (note that I didn’t specify the TFTP and host IP addresses. That’s because once I had run this command with those parameters specified, they were stored in SPI automatically. You can also use setenv to set these parameters before running tftpboot):

iDRAC7=> tftpboot firmimg.d7
Using sh_eth.0 device
TFTP from server 10.0.1.111; our IP address is 10.0.1.199
Filename ‘firmimg.d7’.
Load address: 0x81000000
Loading: 53 MB
Bytes transferred = 56019170 (356c8e2 hex)
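As a quick sanity check on the transfer log above, the decimal and hex byte counts can be compared and the RAM range the image occupies can be worked out from the load address. This is only a sketch using the numbers printed by tftpboot; the function name is made up for illustration.

```python
def transfer_summary(byte_count: int, hex_text: str, load_addr: int):
    """Cross-check the two byte counts and return (size in MiB, last address used)."""
    if int(hex_text, 16) != byte_count:
        raise ValueError("decimal and hex byte counts disagree")
    size_mib = byte_count / (1024 * 1024)
    last_addr = load_addr + byte_count - 1
    return size_mib, last_addr


# Values copied from the tftpboot output: "Bytes transferred = 56019170 (356c8e2 hex)"
size_mib, last_addr = transfer_summary(56019170, "356c8e2", 0x81000000)
print(f"{size_mib:.1f} MiB, last byte at 0x{last_addr:08x}")
```

Comparing the byte count against the size of firmimg.d7 on the TFTP server is a cheap way to confirm the whole file arrived before handing the address to fwu.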

You can see that the firmware was loaded into RAM at address 0x81000000. Ok, let’s see if fwu can check it:

iDRAC7=> fwu check 0x81000000
Checking image header CRC … OK
Checking platform env ID… OK
Checking kernel image CRC … OK
Checking rootfs image CRC … OK
Checking u-boot image CRC … OK
Skipping u-boot update … NO
Checking Platform image CRC . OK
Done!

WOW! I’m so excited!

3. Some more from fwu help:
fwu mmc [‘sd|emmc’]
- list current mmc device, or set the target mmc device
Ok, lets try it with our SD card, so first I choose it with following command

fwu mmc sd

and run…


iDRAC7=> fwu update 0x81000000

*** Updating Partition 1
Checking image header CRC … OK
Checking platform env ID… OK
Checking kernel image CRC … OK
Checking rootfs image CRC … OK
Checking u-boot image CRC … OK
Skipping u-boot update … NO
Checking Platform image CRC . OK
Copying kernel image … OK
Copying rootfs … OK
Copying u-boot1 to flash… OK
Copying u-boot2 to flash… OK
Copying u-boot to MMC… OK
Copying platform image … OK
Done!

WOW! looks promising…

4. Let’s see if it works on eMMC.
Switch it back
fwu mmc emmc

and run…
iDRAC7=> fwu update 0x81000000

Updating Partition 1
Checking image header CRC … OK
Checking platform env ID… OK
Checking kernel image CRC … OK
Checking rootfs image CRC … OK
Checking u-boot image CRC … OK
Skipping u-boot update … NO
Checking Platform image CRC . OK
Copying kernel image … mmc write failed
fail to write block to device
Fail
fail to update kernel image
fail to update image ***
iDRAC7=>

So now I guess I have only three options:
1. This is an eMMC failure and it should be replaced.
2. Maybe one of the “SW_IDRAC_DBG” jumpers is used to remove write protection from eMMC(like a switch on a side of the SD card)?
3. Or maybe anybody knows how to tell u-boot to load kernel from the SD card?
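To compare the SD and eMMC runs above without eyeballing them, the fwu console output can be split into (step, status) pairs and the first failing step picked out. This is a rough sketch that assumes the line shapes seen in this thread (step text, then dots or an ellipsis, then a status); it is not based on any Dell documentation, and treating "NO" after "Skipping u-boot update" as harmless is also an assumption.

```python
import re

# "step <dots or ellipsis> status", e.g. "Checking kernel image CRC ... OK".
# Tuned only to the fwu lines pasted in this thread.
LINE_RE = re.compile(r"^(?P<step>.+?)\s*(?:…|\.{1,3})\s*(?P<status>\S.*)$")

# Assumption: "NO" (seen after "Skipping u-boot update") is informational.
NON_FAILURE = {"OK", "NO"}


def parse_fwu_log(text):
    """Return a list of (step, status) tuples for lines that look like fwu steps."""
    results = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            results.append((match.group("step"), match.group("status")))
    return results


def first_failure(text):
    """Return the first (step, status) whose status is not OK/NO, or None."""
    for step, status in parse_fwu_log(text):
        if status not in NON_FAILURE:
            return step, status
    return None


EMMC_RUN = """\
Checking image header CRC … OK
Checking platform env ID… OK
Skipping u-boot update … NO
Checking Platform image CRC . OK
Copying kernel image … mmc write failed
fail to write block to device
Fail
"""

print(first_failure(EMMC_RUN))
```

On the eMMC run this points at the very first write ("Copying kernel image"), while every check before it passes, which fits the idea that the image is fine and the eMMC is refusing writes.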

Sometimes it’s hard to separate what’s conjecture from what’s empirically tested, and some people are leveraging router experience and adapting it to the lighthouse circuits on the dell motherboard. I’ve done a bit of both myself, but so far the only thing that consistently works on mine is the uboot and not the emmc (which I’m pretty sure I’ve tragically overwritten at this point).

I also found reading other forums and blogs, that some people reference the serial debug cable, and some are actually using either Windows or Linux actually loaded on the server with idrac tools, so it could be one of those who were using the dd command. Or if you can get the emmc image to load, that’d have a functional enough version of embedded Linux you could probably use dd. In my case all that’s corrupted or erased, so I just have the uboot which really is just a boot strapper from what I can see.

My best bet right now is to use the other port marked emmc debug, but I poked around on it with an oscilloscope and it had one pin that had a crazy looking stepped sine. I’m wondering if that isn’t the jtag io which from what I can discern is some crazy high speed bit stuff.

I’m not about to figure it out on my own, and for whatever reason the domestic source for the jtag didn’t have the version of the board I wanted, so I ended up ordering it from overseas (along with several cable assemblies), and I’m still waiting for half that hardware to show up.

I do still plan to see what options I have with a known working emmc once I get some time to focus on one thing for a few hours.

One of the first things I’m going to do in the home lab is consolidate my notes so far into a pastebin journal, then going forward each test I do I’ll document both the intent of the experiment, along with logs and observations. That way later I can hopefully gather it all together and if I ever find something that works I can go back and retrace my steps to recreate it.

My ultimate goal would be to then reflash the wrong bios just to re-corrupt it, and then repeat whatever ‘fix’ I come up with to recover it with the eventual goal of streamlining it down to a non invasive process that doesn’t require me to solder jumpers onto debug ports.

1. tftpboot firmimg.d7
2. go 0x81000000
3. bootm

(0x81000000 is the address where tftpboot load the fw)

Enjoy!
But now I’m 100% convinced that the EMMC is bad, and this is the root of the problem




[SH7757 ~]$ dd if=/dev/mmcblk1 of=/dev/mmcblk0
sh_mmcif: Cmd(d’18) err
mmcblk0: retrying using single block read
Mar 20 11:34:20sh_mmcif: Cmd(d’17) err
(none) kernel: mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
sh_mmcif: Cmd(d’end_request: I/O error, dev mmcblk0, sector 32
18) err
Mar 20 11:34:20 (none) kernel:sh_mmcif: Cmd(d’17) err
mmcblk0: retryimmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
ng using single end_request: I/O error, dev mmcblk0, sector 33
block read
Mar 20 11:34:20 (none) kernsh_mmcif: Cmd(d’17) err
el: sh_mmcif: Cmmmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
d(d’17) err
Maend_request: I/O error, dev mmcblk0, sector 34
r 20 11:34:20 (none) kernel: mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
Mar 20 11:34:20 (none) kernel: end_request: I/O error, dev mmcblk0, sector 32
Mar 20 11:34:20 (none) kernel: sh_sh_mmcif: Cmd(d’17) err
mmcif: Cmd(d’17) err
Mar 20 11:34:20 (mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
none) kernel: mmend_request: I/O error, dev mmcblk0, sector 35
cblk0: error -84 sending read/write command, response 0x0, card status 0xd40
Mar 20 11:34:20 (none) kernel: end_request: I/O error, dev mmcblk0, sector 33
Mar 20 11:34:20 (none) kesh_mmcif: Cmd(d’17) err
rnel: sh_mmcif: Cmd(d’17) err
Mar 20 1mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
1:34:20 (none) kend_request: I/O error, dev mmcblk0, sector 36
ernel: mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
Mar 20 11:34:20 (none) kernel: end_request: I/O error, dev mmcblk0, sector 34
Mar 20 11:34:20 (none) kernel: sh_mmcif: Cmd(d’17) err
Mar 20 11:34:20 (none) kernel: mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
Mar 20 11:34:20 (none) kernel: end_request: I/O error, dev mmcblk0, sector 35
Mar 20 11:34:20 (none) kernel: sh_mmcif: Cmd(d’17) err
sh_mmcif: Cmd(d’mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
17) err
Mar 20end_request: I/O error, dev mmcblk0, sector 37
11:34:20 (none) kernel: mmcblk0sh_mmcif: Cmd(d’17) err
mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40

Mar 20 11:34:2end_request: I/O error, dev mmcblk0, sector 38
0 (none) kernel: end_request: I/O error, dev mmcblk0, sector 36
Mar 20 11:34:20 (none) kernel: sh_mmcif: Cmd(d’17) err
Mar 20 11:34:20 (none) kernel: mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
Mar 20 11:34:20 (none) kernel: end_request: I/O error, dev mmcblk0, sector 37
Mar 20 11:34:20 (none) kernelsh_mmcif: Cmd(d’17) err
: sh_mmcif: Cmd(mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
d’17) err
Mar end_request: I/O error, dev mmcblk0, sector 39
20 11:34:20 (none) kernel: mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
Mar 20 11:34:20 (none) kernel: end_request: I/O error, dev mmcblk0, sector 38
Mar 20 11:34:20 (none) kernel: sh_mmcif: Cmd(d’17) err
Mar 20 11:34:20 (none) kernel: mmcblk0: error -84 sending read/write command, response 0x0, card status 0xd40
Mar 20 11:34:20 (none) kernel: end_request: I/O error, dev mmcblk0, sector 39
dd: writing ‘/dev/mmcblk0’: Input/output error
33+0 records in
32+0 records out
16384 bytes (16.0KB) copied, 0.291646 seconds, 54.9KB/s
[SH7757 ~]$
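The console output above is hard to read because kernel messages and syslog lines are interleaved, but the "sector N" parts survive intact. A small sketch (helper names are made up for illustration) that pulls out the failing sectors and relates them to the dd summary:

```python
import re

SECTOR_RE = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def failing_sectors(log, device="mmcblk0"):
    """Return the sorted, de-duplicated sector numbers reported for one device."""
    return sorted({int(num) for dev, num in SECTOR_RE.findall(log) if dev == device})


# A few lines copied from the dd output, including the interleaved ones.
SAMPLE = """\
sh_mmcif: Cmd(d’end_request: I/O error, dev mmcblk0, sector 32
ng using single end_request: I/O error, dev mmcblk0, sector 33
Maend_request: I/O error, dev mmcblk0, sector 34
Mar 20 11:34:20 (none) kernel: end_request: I/O error, dev mmcblk0, sector 32
"""

sectors = failing_sectors(SAMPLE)
print(sectors)

# dd reported "32+0 records out" and "16384 bytes"; with 512-byte records that is
# sectors 0..31, so the first failing sector lines up with the end of what dd wrote.
BLOCK_SIZE = 512
print(sectors[0] * BLOCK_SIZE)
```

Run over the full log this gives sectors 32 through 39, so the first 16 KB were accepted and everything after that was rejected with error -84.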

IMG_2105.jpeg

So to elaborate on my previous post, I hooked up the com port on my new dell server off ebay (pastebin XqFcZ1GR). It does boot up to a small linux distro. I haven’t done a lot yet, now I want to get the network online so I can use tftp/ftp and otherwise try to mess with that server without (hopefully) corrupting it too.

What you and I have been struggling with is we’re already in a broken uboot environment, and we don’t get all that extra emmc functionality. With this new test bed I have that same shell the others with a ‘partially broken’ idrac have.

Still this is encouraging to me, and there’s a whole new area for me to explore now. I wish stuff didn’t take so long to ship from china.


Wait, you got idrac to boot without emmc??? Am I reading this right?

Yes you are! )) But because the OS is loaded from RAM, it works only until the next reboot. But be careful! Yesterday I bricked it again while trying to upgrade the FW. Don’t ask me why I decided to take out the 32GB SD card and put in a 1GB SD card instead. The SD card should be the same size as the EMMC or larger. So when I ran fwu mmc sd and then fwu update, it somehow erased the SPI flash and now there’s no u-boot anymore. My guess is that when you run fwu update, it creates the partition layout on the SD card, then copies U-boot.bin to it and then to SPI. Maybe I’m wrong, but otherwise I don’t understand why there are u-boot partitions on the SD/EMMC. If you run util flashinfo you can see all these partitions and their content.
So I need to reprogram the SPI memory using an external chip programmer. At least this time I know how to bring the iDRAC back to work) And this time I’m going to try to replace the eMMC chip too.
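Before pointing fwu at an SD card, the size rule mentioned above can be checked from the "Capacity" values U-Boot prints. The sketch below uses the two capacities from the logs earlier in this thread (they come from a different poster’s board, so treat them purely as example numbers); the function name and the 1 GB card value are hypothetical.

```python
GIB = 1024 ** 3


def sd_can_hold_emmc_layout(sd_bytes, emmc_bytes):
    """True if the SD card is at least as large as the eMMC it stands in for."""
    return sd_bytes >= emmc_bytes


emmc = 1073741824         # "Capacity" from the sh_mmcif mmcinfo output in this thread
sd = 2021654528           # "Capacity" from the sh-sdhi SD CARD boot line in this thread
small_sd = 1_000_000_000  # hypothetical nominal "1 GB" card, for illustration only

print(f"eMMC {emmc / GIB:.2f} GiB, SD {sd / GIB:.2f} GiB ->",
      sd_can_hold_emmc_layout(sd, emmc))
print("hypothetical 1 GB card ->", sd_can_hold_emmc_layout(small_sd, emmc))
```

Whether an undersized card is really what led to the SPI flash being erased is still a guess; the check only rules out the obvious mismatch.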

That’s amazing!!! If I understand correctly, since the idrac is its own CPU/OS, once we get it loaded it will stay in the background as long as the system is plugged in? So that means no more crazy fan noise, no more 15-minute-long boot/reboot times! We can even shut down the main system (Windows), and idrac will keep working as long as the power cable is plugged in. If this is indeed the case then this is HUGE! It makes the system usable again!