[Problem] Dell R720xd iDRAC BIOS Recovery

A SOIC8 test clip cable is for BIOS chips that are soldered on, not for DIP BIOS chips. Once you have one, you won’t need to desolder chips.

This thread has been interesting, but on my R420 what I’m finding is that the contents of the iDRAC SPI chip suggest the real meat and potatoes isn’t in the 25Q03213 chip at all (that’s the chip on my board).
I was able to read it with the gold CH341A, and it verified fine (after pressing down firmly a few times and using a magnifying glass to make sure both sides had pin contact).

I copied an unpatched chip, then my new patched chip. Right off the bat, the 0-FFFFF portion of both of those is identical.

In the next section, starting at 0x100000, all the configuration was identical except for 4 MAC addresses.
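A minimal sketch of that kind of comparison, assuming both reads were saved as raw binary files (any file names passed on the command line are the reader's own). It lists the byte ranges where two dumps differ, so an identical first megabyte and a handful of differing bytes after 0x100000 would show up directly.

```python
def diff_ranges(a: bytes, b: bytes, gap: int = 16):
    """Return (start, end) offsets, end exclusive, where a and b differ.

    Differences closer than `gap` bytes apart are merged into one range.
    """
    n = min(len(a), len(b))
    ranges = []
    for i in range(n):
        if a[i] != b[i]:
            if ranges and i - ranges[-1][1] < gap:
                ranges[-1][1] = i + 1      # extend the previous range
            else:
                ranges.append([i, i + 1])  # start a new range
    if len(a) != len(b):
        # Report the tail that exists in only one of the dumps
        ranges.append([n, max(len(a), len(b))])
    return [tuple(r) for r in ranges]


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        with open(sys.argv[1], "rb") as f1, open(sys.argv[2], "rb") as f2:
            for start, end in diff_ranges(f1.read(), f2.read()):
                print(f"0x{start:06x}-0x{end - 1:06x}")
```

Merging nearby differences keeps a changed MAC address from being reported as six separate one-byte ranges.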

These settings in particular make me wonder if the actual firmware isn’t stored on /dev/mmcblk0p2 (the sd card inside the server)?
hostname=idrac7
bootnfs=tftp 82000000 $(bootfile); run nfsargs; bootm
runtime_args=mode=vhard reset_cause=ac nmi_buf=0x83000000
fwudev=evb
bootargs=root=/dev/mmcblk0p2 rootwait rw rootfstype=squashfs mem=239616k console=ttyS2,115200 <NULL> mac1=00:00:00:00:00:01 mac2=00:00:00:00:00:02 mode=vhard reset_cause=ac nmi_buf=0x83000000 quiet
fwu_kernel_start=1
fwu_rootfs_start=8001
fwu_platform_start=120006
fwu_uboot_start=3fc01
fwu_kernel_size=135d
fwu_rootfs_size=19330
fwummc=emmc
ver=U-Boot 2009.08 (Jun 03 2013 - 03:47:19) Avocent (0.0.3) EVB

Has anyone actually unbricked the idrac with the bios only flash approach? My next guess is to image the sdcard and go to those offsets and see if I can find the same bytes in the firmimg.d7 file (and if so then I’ll try copying the sdcard out of the working dell).
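One way to try that offset comparison is sketched below. It assumes the `fwu_*` values are hexadecimal counts of 512-byte blocks. That fits the `blk size=[0x200][512]` the recovery utility prints, but it is not confirmed for these variables. The function names are made up for illustration.

```python
BLOCK = 512  # assumed block size in bytes


def region(start_hex: str, size_hex: str):
    """Convert hex block start/size strings into (byte_offset, byte_length)."""
    return int(start_hex, 16) * BLOCK, int(size_hex, 16) * BLOCK


def find_sample(image: bytes, offset: int, haystack: bytes, sample_len: int = 4096) -> int:
    """Search `haystack` for the first `sample_len` bytes at `offset` in `image`.

    Returns the position in `haystack`, or -1 if the sample is not present
    (or if `offset` lies beyond the end of `image`).
    """
    sample = image[offset:offset + sample_len]
    if not sample:
        return -1
    return haystack.find(sample)
```

With `fwu_kernel_start=1` and `fwu_kernel_size=135d`, `region("1", "135d")` gives byte offset 512 and a length of 2,537,984 bytes. A sample that is all 0xFF (erased flash) would match almost anywhere, so it is worth checking the sample bytes before trusting a hit.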

How about the jtag port, have any of you had luck getting into that or the SPI_DBG port instead of the regular rs232 redirect?

This appears to be the emmc, as it doesn’t change even if I pull out all the sd cards.

That said, I broke down and bought a new motherboard from eBay, copied its chip, then updated it completely, and recopied its chip.

I then took that and flashed it back to my broken server that wouldn’t enter recovery mode and at least now I have the flashing yellow light.

I soldered the tty pins and connected with putty, and I was happy to see this:
######################################################################
** No bootable iDRAC image is found
(System Health & ID LED is flashing amber at ~1/2 second rate).
To Recover iDRAC via an SD card.
1) Format SD using FAT on a Windows box or EXT3 using Linux.
2) Copy ‘firmimg.d7’ to root path.
3) Insert SD card.
4) System Health & ID button solid amber durning recover.
5) Both boot paths are flashed.
######################################################################

Polling for SD card state change

However I can’t find any firmimg.d7 that will work.
I’ve tried 1.31.30, 1.35.35, 1.40.40, 1.65.65, 1.66.65, and 2.10.10.10 and they all pretty much do something like this:
Found ‘/firmimg.d7’ in ‘FAT’ file system


UTIL RECOVER:Transport:sd TargetMMC:EMMC File:/firmimg.d7
reading /firmimg.d7

95344358 bytes read
UTIL RECOVER:SD load passed from FAT fs.
UTIL RECOVER:Transport time [sec:mil]: 16:525

Clear OS images in partition/s.
Clear kernelN, rootfsN, ubootN
’EMMC’ blk size=[0x200][512] Erase to 0xffffffff
Mem buf size=[0x06000000][100663296] Total bytes=[0x08000000][134217728]
blocks[0x1:0x40001][1:262145]
Blocks [0x1:0x30000] Buf[0x88000000]…fill buffer…erase/write
mmc write failed
util flash:
Failure writing blocks [1:196609] RC=0xffffffac
/nRecover returned failure, RC=0xffffffff
Firmimg.d7 might be bad.
Polling for SD card state change

The upgrade path for the ebay server indicated the original idrac upgrade was from 1.40.40 to 1.65.65, so I expected 1.40.40 to work. When it finally got to this state I was able to get into F2 (but not do anything), at least I could see the settings were 1.65.65, but alas that doesn’t work any better.

Now I’m back to being stumped. I suppose I’ll try flashing the post updated chip and then see if it takes 1.65.65…

Hi,
Last time I wrote here, I said that I was able to program the iDRAC BIOS. After that its indicator started to blink amber. At that time I didn’t know that this indicates the “polling for MMC card” status. Only after I connected a UART adapter did I find out what’s going on behind the scenes. At first I was glad and thought I was one step from the solution, but in the end all my experiments ended up exactly like yours, no matter which firmimg.d7 version I tried.
What confuses me is the line that says “Firmimg.d7 might be bad”, which comes right after "Failure writing blocks [1:196609] RC=0xffffffac
/nRecover returned failure, RC=0xffffffff"
This raises a question in my head: either “firmimg.d7” has to be some sort of “recovery version” which is not part of a regular firmware package and has to be downloaded from who knows where, or this is an eMMC chip failure.

Polling for SD card state change
SD card state change, previous=0x700 new=0x780


######################################################################
** No bootable iDRAC image is found
(System Health & ID LED is flashing amber at ~1/2 second rate).
To Recover iDRAC via an SD card.
1) Format SD using FAT on a Windows box or EXT3 using Linux.
2) Copy ‘firmimg.d7’ to root path.
3) Insert SD card.
4) System Health & ID button solid amber durning recover.
5) Both boot paths are flashed.
######################################################################


Found ‘/firmimg.d7’ in ‘FAT’ file system


UTIL RECOVER:Transport:sd TargetMMC:EMMC File:/firmimg.d7
reading /firmimg.d7

62953181 bytes read
UTIL RECOVER:SD load passed from FAT fs.
UTIL RECOVER:Transport time [sec:mil]: 10:987

Clear OS images in partition/s.
Clear kernelN, rootfsN, ubootN
’EMMC’ blk size=[0x200][512] Erase to 0xffffffff
Mem buf size=[0x06000000][100663296] Total bytes=[0x08000000][134217728]
blocks[0x1:0x40001][1:262145]
Blocks [0x1:0x30000] Buf[0x88000000]…fill buffer…erase/write
mmc write failed
util flash:
Failure writing blocks [1:196609] RC=0xffffffac
/nRecover returned failure, RC=0xffffffff
Firmimg.d7 might be bad.
Polling for SD card state change

I would have thought so too, but what are the odds that both our healthy systems would fail in exactly the same spot? Plus, googling it, I found another person who recently posted that same error to pastebin. I confirmed the CRC on the download from Dell, so I know my firmware isn’t corrupt. It does seem the eMMC fails to write, though.
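For anyone who wants to repeat the download check, here is a small sketch that hashes a file in chunks so a 100 MB image does not have to fit in memory at once. Which digest to compare against depends on what the download page lists. The `sha256` default is only an assumption, and the helper name is made up.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Calling `file_digest("firmimg.d7", "md5")` would give an MD5 to compare against a published value. A matching checksum only rules out a corrupt download, not a problem on the eMMC side.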

I reflashed the post-updated chip to my other server and it’s still doing the same thing. I did see something in the BIOS, though, that I probably need to consider.

iDRAC Settings version…1.65.65.04
iDRAC Firmware Version…2.61.60 (Build 8)

Maybe I should try giving it 2.61.60.

This particular server killed itself when I connected the lifecycle controller to the internet and told it to just download all the updates from ftp.dell.com.

Which bugs me to no end, but that aside, it clearly downloaded a 2.61.60 build that I haven’t come across yet…I’m not sure why I didn’t see that when I was at work but I guess I’ll try that tomorrow. But maybe the problem is that the settings don’t match the bios.

In that case I’ll make sure to update the other server to 2.61.60 and then dump the idrac and write that…they sure make it hard to back out of this situation.

I also gave up my work when I reached the "Failure writing blocks" roadblock. Pretty sure the EMMC is bad. How realistic/difficult would replacing the EMMC be?

Well, that might be a problem of compatibility between the settings version and the firmware version. But why should it be so? I mean, shouldn’t the flash utility just erase and reset everything? If I understand this correctly, both firmware and settings are located in the eMMC chip. There is another option which is much more interesting to me: the ability to interrupt the U-Boot process. If I understand what the guys there say, it is possible to load the iDRAC OS into RAM. Maybe after that it will be available for a firmware update.
Take a look here https://amp.reddit.com/r/homelab/comment…idrac_recovery/

It was strange to me that this is a drac8 download, but it does say for idrac7 (The firmware was iDRAC8/7 with Lifecycle Controller Version 2.61.60.60), and the filename was firmimg.d7 but the outcome was the same:

Found ‘/firmimg.d7’ in ‘FAT’ file system


UTIL RECOVER:Transport:sd TargetMMC:EMMC File:/firmimg.d7
reading /firmimg.d7

109270246 bytes read
UTIL RECOVER:SD load passed from FAT fs.
UTIL RECOVER:Transport time [sec:mil]: 18:918

Clear OS images in partition/s.
Clear kernelN, rootfsN, ubootN
’EMMC’ blk size=[0x200][512] Erase to 0xffffffff
Mem buf size=[0x06000000][100663296] Total bytes=[0x08000000][134217728]
blocks[0x1:0x40001][1:262145]

Blocks [0x1:0x30000] Buf[0x88000000]…fill buffer…erase/write
mmc write failed

util flash:
Failure writing blocks [1:196609] RC=0xffffffac
/nRecover returned failure, RC=0xffffffff
Firmimg.d7 might be bad.

I think the failure at block 1 (if I’m even reading that right) is the part that gets me. I wonder if there isn’t a write enable issue that has the emmc locked or something like that.
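The numbers in that log can at least be decoded with plain arithmetic. The sketch below uses only values printed by the recovery utility. Reading `[1:196609]` as "start block 1, one buffer's worth of blocks" is an interpretation, not something the log states.

```python
BLOCK = 0x200  # block size printed by the recovery utility


def to_signed32(value: int) -> int:
    """Interpret a 32-bit unsigned value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


buf_blocks = 0x06000000 // BLOCK    # blocks that fit in "Mem buf size"
total_blocks = 0x08000000 // BLOCK  # blocks covered by "Total bytes"

print(hex(buf_blocks), buf_blocks)      # 0x30000 196608
print(hex(total_blocks), total_blocks)  # 0x40000 262144
print(to_signed32(0xFFFFFFAC))          # -84
print(to_signed32(0xFFFFFFFF))          # -1
```

So 196609 is 1 + 196608: the failure is reported for the first buffer-sized chunk starting at block 1, which does not by itself say that block 1 was the one that failed. If the return codes follow Linux errno numbering (an assumption), 84 is EILSEQ, which Linux MMC code uses for CRC-type data errors.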

I downloaded the datasheet. It only uses 30 pins that I can see. 5 of those are core ground, and 5 are core voltage. I would presume if you could interrupt either voltage or ground to the chip it would render it inert. Then you could conceivably patch a new one onto those pins (but supply ground and voltage).

If I ever did that I’d do it with a socket so I would only have to do it once. The only real value would be to break chips in a way you can easily remove them and put them into another circuit to mount them in another filesystem for direct access.

Honestly though, I doubt I’ll get to it. You could also probably get a high speed carbide bit and just grind the old chip off (wear goggles but I’m mostly kidding).

hynix1.PNG


hynix2.PNG

I think I’ve read that before and tend to discard the idea due to past failures along a similar line. I didn’t have my tty console at the time, so maybe I’ll try it again, but I suspect that changing the variables directly, inserting them into the flash image and writing it (as I’ve tried) doesn’t work because it changes the CRC.
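If the CRC really is what trips up hand-edited variables, it can be recomputed. The sketch below assumes the stock U-Boot environment layout: a 32-bit CRC32 stored ahead of NUL-separated `name=value` strings, padded to a fixed size. The header length (4 bytes, or 5 when a redundancy flag byte is present), the little-endian byte order, the flag value and the environment size are all assumptions that would need checking against a real dump.

```python
import struct
import zlib


def env_crc_ok(blob: bytes, header_len: int = 4) -> bool:
    """Check the stored CRC32 against the data that follows the header."""
    stored = struct.unpack_from("<I", blob, 0)[0]
    return stored == (zlib.crc32(blob[header_len:]) & 0xFFFFFFFF)


def rebuild_env(variables: dict, env_size: int, header_len: int = 4) -> bytes:
    """Serialize variables and prepend a freshly computed CRC32."""
    body = b"".join(f"{k}={v}".encode("ascii") + b"\0" for k, v in variables.items())
    body += b"\0"  # an empty string terminates the variable list
    data_len = env_size - header_len
    if len(body) > data_len:
        raise ValueError("environment does not fit in env_size")
    data = body.ljust(data_len, b"\0")
    crc = zlib.crc32(data) & 0xFFFFFFFF
    # With header_len=5 the extra byte is a redundancy flag; 1 is an assumed value
    header = struct.pack("<I", crc).ljust(header_len, b"\x01")
    return header + data
```

Running `env_crc_ok` on the region read from 0x100000 of a known-good dump, with both header lengths and a few candidate sizes, would be a cheap way to find out which assumptions hold before writing anything back.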

As for the J47 jumper, my jumpers in that area all seem to be around J25, so although there may be a similar function, finding it (if it even exists on a 420) is another matter.

I think there are two candidates I’ll mess with: by the eMMC there’s a switch diagram, though it’s not really labelled, and there’s a J25 header.

EMMC-SW and J25.jpg



I’m also intrigued by the sw2 block, and the JTAG ports. If I get some spare cash I’d like to buy a jtagulator and see if I can get into those.

SW2-CPLD_JTAG_BMC_JTAG.jpg

So the sw header turns out to be interesting. The two by the emmc didn’t do anything, nor did sw1. sw2 however did do this:

U-Boot 2009.08-00088-g121cddc (Nov 17 2014 - 05:50:46) Avocent (0.0.3) EVB, Build: jenkins-idrac-yocto-release-505

CPU: SH-4A
BOARD: R0P7757LC00xxRL (C0 step) board
BOOT: Secure, HRK not generated
DRAM: 240MB
(240MB of 256MB total DRAM is available on U-Boot)
ENV: Using primary env area.
In: serial
Out: serial
Err: serial
PCIe: Bridge loaded with 0x18000 bytes
WDT2: Booted Lower Vector, 'uboot1’
sh_mmcif: 0, sh-sdhi: 1
Net: sh_eth.0, sh_g_eth.0
INFO: 00:002 Start-up -to- util_idrac_main()
INFO: 00:004 U-Boot 2009.08-00088-g121cddc (Nov 17 2014 - 05:50:46) Avocent (0.0.3) EVB
INFO: 00:009 U-Boot checkin date(05-10-2013) Version(1.0.183)
INFO: 00:005 iDRAC PPID <NULL>
INFO: 00:003 SPI NOR init 4096 KiB AT25DF321A bus=0 cs=0, speed=1000000, mode=3
INFO: 00:008 SH-4A Product: Major Ver=0x31 Minor Ver=0x14 C4 Little endian
Family=0x10 Major Ver=0x30 Minor Ver=0x0b
PASS: 00:015 Dedicated monolithic mgmt NIC enabled (vendor mode override)
INFO: 00:131 BCM54610 OUI=0x00d897 Model=0x26 Revision=0x0a PhyAddr=1
INFO: 00:167 SD CARD: Device: sh-sdhi Manufacturer ID: 12 OEM: 3455
Name: SD
INFO: 00:472 EMMC: Device: sh_mmcif Manufacturer ID: 90 OEM: 14a
Name: HYNIX Tran Speed: 25000000 Rd Block Len: 512
MMC version 4.0 High Capacity: Yes Capacity: 0
INFO: 00:019 CPLD: Major Ver=0x1 Minor Ver=0x0 Maint Ver=0x4
Planar: Type=0x09 Rev=0x4 Rework=0x0 Scratch/PathRetry=0x00
PASS: 00:013 Coin cell detected good, AD=0x39c low water=0x2c1
PASS: 00:008 PCIe C0 Ver=0.15 MCTP en, CRC=0x8e9b6875 @0x8efbf954 cnt=0x18000
INFO: 00:007 Init PCIe mailbox(PCIe 0xFFEE0150=0x40010000)
INFO: 00:006 mode=vhard
Erasing SPI flash at 0x100000…Writing to SPI flash…done
Erasing SPI flash at 0x110000…Writing to SPI flash…done
PASS: 01:236 Booted Lower Vector, ‘uboot1’ wdt2cnt=0
INFO: 00:005 wdt0cnt=0
PASS: 00:003 Clear CH1/CH2, clear 4K shared memory@0xffcaa000 on AC power up
PASS: 00:007 SMR0 no sermux env, default 0xd4
INFO: 00:005 GRACR=0x3c HISEL=0x00 SIRQCR5_D=0x03 SIRQCR6_D=0x01 LADMSK0=0xff2
MRSTCR0=0xfedffe7f MRSTCR1=0xfff3ff0f MRSTCR2=0x7f80feff
BARMAP=0x1 BCR=0x85000000 NCER=0x01fc NCMCR=0x0006 NCCSR=0x0303
PASS: 00:021 etherc0=90:B1:1C:54:43:89
getherc0=90:B1:1C:54:43:8A
INFO: 00:009 Fan logic for monolithic planar type 9
fan1 - def 0000 1fff 3d fan2 - def 0000 1fff 3d
fan3 - def 0000 1fff 3d fan4 - def 0000 1fff 3d
fan5 - def 0000 1fff 3d fan6 - def 0000 1fff 3d
fan7 - def 0000 1fff 3d fan8 - def 0000 1fff 3d
INFO: 00:076 Env and backup CRC’ed ok
*** no text signature found ***
INFO: 00:649 Sync eMMC/SPI NOR/Alternate u-boot images
PASS: 00:259 Current u-boot1 1.0.183 verified with ‘ubootN’
Trailer Struct - Missing start token, exp=0xc0de1111 rec=0x0
U-boot2 in sync with u-boot1 1.0.183
FAIL: 00:219 Verify OS Images N: Kernel crc exp=0xca4431c0 rec=0x26c73dbc
blk_start=0x1 blk_size=0x1360 ENV bcnt=0x26b937
PASS: 00:257 Boot device=emmc Boot partition5/N-1
Boot Path Retry:P1/N=3 P5/N-1=0 swapping to P5/N-1
INFO: 00:012 Sync eMMC/SPI NOR/Alternate u-boot images
PASS: 00:258 Current u-boot1 1.0.183 verified with ‘ubootN1’
Trailer Struct - Missing start token, exp=0xc0de1111 rec=0x0
U-boot2 in sync with u-boot1 1.0.183
FAIL: 00:437 Verify OS Images N-1: Kernel crc exp=0x8c86b2b rec=0x3059b451
blk_start=0x40002 blk_size=0x1360 ENV bcnt=0x26d5d5
FAIL: 00:013 Boot device=emmc Boot partition5/N-1
Boot Path Retry:P1/N=3 P5/N-1=3 P5/N-1 now expired
INFO: 03:408 Dedicated NIC has no link, switch to use NCSI
INFO: 00:005 No ‘userexe’ defined
INFO: 00:000 07:752
Hit any key to stop autoboot: 3
WDT2: Disable in abortboot()
OSWDT: Disable in abortboot()
CPLD_BMCRDY: Enable BMC_MIN_RDY in abortboot(). Prevent BIOS reset.

NOTE: After stopping u-boot in this development mode. You may need to
warm/cold reset the server when booting iDRAC manually as BIOS
may have already viewed iDRAC as unresponsive.



RECOVER:Max retries occured for both N/N-1 paths, OR forced recover.

iDRAC7=>

Now I guess I need to figure out what I can do with that prompt.

I’m so excited. This is like, amazing. Even if it’s still bricked, how cool.

iDRAC7=> mmcinfo
Device: sh_mmcif
Manufacturer ID: 90
OEM: 14a
Name: HYNIX
Tran Speed: 25000000
Rd Block Len: 512
MMC version 4.0
High Capacity: Yes
Capacity: 0
Bus Width: 4-bit

Then I bumbled around and figured this out:
iDRAC7=> protect off all


Then more bumbling and I figured this out:
iDRAC7=> util recover -emmc -from sd -f firmimg.d7


UTIL RECOVER:Transport:sd TargetMMC:EMMC File:firmimg.d7
reading firmimg.d7

109270246 bytes read
UTIL RECOVER:SD load passed from FAT fs.
UTIL RECOVER:Transport time [sec:mil]: 18:917

Clear OS images in partition/s.
Clear kernelN, rootfsN, ubootN
’EMMC’ blk size=[0x200][512] Erase to 0xffffffff
Mem buf size=[0x06000000][100663296] Total bytes=[0x08000000][134217728]
blocks[0x1:0x40001][1:262145]

Blocks [0x1:0x30000] Buf[0x88000000]…fill buffer…erase/write
mmc write failed

util flash:
Failure writing blocks [1:196609] RC=0xffffffac
UTIL PASS
iDRAC7=>

I’m not clear on where to go from there though…So I unplugged it and plugged it back in, but it put me right back to this:
Found ‘/firmimg.d7’ in ‘FAT’ file system


UTIL RECOVER:Transport:sd TargetMMC:EMMC File:/firmimg.d7
reading /firmimg.d7

109270246 bytes read
UTIL RECOVER:SD load passed from FAT fs.
UTIL RECOVER:Transport time [sec:mil]: 18:918

Clear OS images in partition/s.
Clear kernelN, rootfsN, ubootN
’EMMC’ blk size=[0x200][512] Erase to 0xffffffff
Mem buf size=[0x06000000][100663296] Total bytes=[0x08000000][134217728]
blocks[0x1:0x40001][1:262145]
Blocks [0x1:0x30000] Buf[0x88000000]…fill buffer…erase/write
mmc write failed
util flash:
Failure writing blocks [1:196609] RC=0xffffffac
/nRecover returned failure, RC=0xffffffff
Firmimg.d7 might be bad.

I’ll keep dinking with it and see if I can get it to boot into memory like you suggested. I’ll probably even solder a switch onto sw2 since I can see I’ll be doing that for awhile and the paperclip is a bit awkward.

So yeah, I wasted way too much time today messing with this. I do like that interactive shell, and at least I can turn down the fans now, but I reread that reddit thread and I think I’ve come to the same dead end.

This comment at the very end:
Read eMMC via J_EMMC_DBG on board. — it works. I reversed the pinout of this debug interface and I dumped a working iDRAC image [provide later]. However, When I try to format or erase emmc, I got a failure. I changed power supply to the emmc and still no luck. Tried to utilize MB PSU, but I cant find a way to offline the CPU of iDRAC.

He got in through the emmc_dbg and, similar to my util erase attempts, he was still unable to format or erase the eMMC, which doesn’t seem right.

I would like to know how he used the emmc_dbg to dump the running firmimg file.

I can’t figure out where the “J25” is on my R620 ((
There is a “sw_idrac_dbg” which looks similar to your “sw2”.
But it is not clear to me: should I put jumpers on all of its contact pairs, or just on one of them?

As for the eMMC dump, there might be another way… I have a second R620 server now, with a properly working iDRAC7. Today I connected to it over the UART interface. Once booted, it gave me a Linux shell. Then I inserted an SD card and found that it was enumerated as /dev/mmcblk1 and the on-board eMMC as /dev/mmcblk0 (I forgot to take a screenshot). So I ran dd if=/dev/mmcblk0 of=/dev/mmcblk1 and after a few minutes the process completed successfully. I’d like to believe that this is actually a whole dump of the eMMC. I can share it with you if you need it. If the dump is correct and you succeed in interrupting U-Boot, theoretically you should be able to tell it to continue loading not from the eMMC but from the SD card. And maybe (and this would be just what we need) you can then call the dd command and clone the SD card to the eMMC.
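A quick sanity check for a dump like that is to look at its first sector. The sketch below assumes the eMMC carries a DOS-style MBR (the U-Boot output quoted elsewhere in this thread mentions an MBR signature and primary/extended/logical partitions, which points that way, but it is not confirmed). It parses only the four primary entries.

```python
import struct


def parse_mbr(sector0: bytes):
    """Return the four primary MBR entries, or None if the 0x55AA signature is missing."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        return None
    entries = []
    for i in range(4):
        raw = sector0[446 + 16 * i: 446 + 16 * (i + 1)]
        # Entry layout: status, 3 CHS bytes, type, 3 CHS bytes, LBA start, block count
        status, ptype, start, count = struct.unpack("<B3xB3xII", raw)
        entries.append({"status": status, "type": ptype, "start": start, "blocks": count})
    return entries
```

Feeding it the first 512 bytes of the image read back from the SD card and comparing the start values with the `fwu_*_start` variables would show whether the copy really begins at block 0 of the eMMC.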

One more interesting point:

INFO: 00:309 SD CARD: Device: sh-sdhi Manufacturer ID: 41 OEM: 3432
Name: SD1GB Tran Speed: 25000000 Rd Block Len: 512
SD version 2.0 High Capacity: No Capacity: 1023934464
INFO: 00:486 EMMC: Device: sh_mmcif Manufacturer ID: 90 OEM: 14a
Name: HYNIX Tran Speed: 25000000 Rd Block Len: 512
MMC version 4.0 High Capacity: Yes Capacity: 0

Either MMC capacity cannot be determined correctly, or it just tells us that MMC is bad. I’m going to check this log on a working server…

On a working iDRAC, the MMC capacity is also 0. So this is not a problem.

f16t3828p66534n2_ZOpczwUI.jpg

So for troubleshooting the sw2 I figured as stupid as Dell sometimes seems to me, I can’t believe they’d have a kill switch on the board labelled sw, so on that basis it should be safe to (carefully) jumper them one at a time.

The first thing I noticed was jumpering the first pair caused the device to reboot.

The second thing I tried was the second pair, which gave me the interrupt-boot option, at which point I stopped trying the rest because that’s pretty much what I was looking for.

I’m not convinced the emmc is bad. Even though the protect off all doesn’t unlock the emmc, I think it’s more likely something else. I think it’s just too strange that searching for “Failure writing blocks [0:196608] RC=0xffffffac” actually returns hits from another person who’s had the exact same response, and out of those of us outside of an NDA trying to unbrick our own stuff, how big can that sample base really be?

Also even though I don’t know for sure if [0:196608] means failed to write to part or all of that range…if it does why would this work:
iDRAC7=> mmc write 1 0x8100000 0 0x30000

MMC write: dev # 1, block # 0, count 196608 … 196608 blocks written: OK

(doesn’t that essentially write [0:196608], albeit with random stuff from whatever happens to occupy 0x8100000)
To have two of us with the same error in the same spot? I dunno, I just don’t think it’s the emmc at this point.
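For reference, the numbers in that `mmc write` command work out as follows. The sketch only does arithmetic on values quoted in this thread. The device-name mapping is copied from the `sh_mmcif: 0, sh-sdhi: 1` line of the boot log, and whether it applies to the device argument of `mmc write` is an assumption.

```python
BLOCK = 512

# Taken from the boot log line "sh_mmcif: 0, sh-sdhi: 1"
devices = {0: "sh_mmcif (eMMC)", 1: "sh-sdhi (SD card)"}


def describe_write(dev: int, start_block: int, count: int) -> str:
    """Summarize which device and block range a write covers."""
    end_block = start_block + count - 1
    size_mib = count * BLOCK / (1 << 20)
    name = devices.get(dev, "unknown")
    return f"dev {dev} [{name}]: blocks {start_block}..{end_block}, {size_mib:g} MiB"


print(describe_write(1, 0, 0x30000))
```

If that mapping holds, the successful test write went to device 1, the SD card, rather than to the eMMC, so repeating it against device 0 would be the more telling experiment. The count of 0x30000 blocks is the same 96 MiB as the recovery utility's memory buffer.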

I do think I probably managed to erase or muddle the image so much that the hopes of dd are gone, though at first I think that probably would have helped (had I known then what I know now).

To be fair, I had two of these and I bought another motherboard for $70, so nowadays how much time is this really worth (it even came with an enterprise drac).

That said, I just can’t seem to put this down.

Whether it goes straight to iDRAC7=> or whether you have to hit a key seems to depend on how valid your SPI data is.

I’ve flashed every copy I can find and in some cases there’s no response at all (totally dead), in other cases I get this maddening orange flashing but no recovery mode. On a couple of the firmwares (the ones with the newer LC data loaded) I think the issue is it’s trying to hand off to the wrong version of the lifecycle controller.

…I’m cutting out some rambling stuff at the end and I’ll just conclude with yeah if you can extract anything from the emmc at the very least if I can figure out how to load it into memory I can try executing it (or even writing it) from there.

So this was interesting, though it brought no improvement on the firmimg.d7 step…

https://pastebin.com/f1VL3Vb1

The relevant bit being (for the benefit of people who won’t follow external links):
INFO: 00:076 Env and backup CRC’ed ok
*** no MBR signature found
create default partitions
No.  Type  Status  Start    Size               System
-----------------------------------------------------------
1    P     80      1        8000     16 MB     Linux
2    P     00      8001     37c00    111 MB    Empty
3    P     00      3fc01    400      512 KB    Empty
4    E     00      40001    1bffff   895 MB    Extended
5    L     00      40002    8000     16 MB     Linux
6    L     00      48003    37c00    111 MB    Empty
7    L     00      7fc04    400      512 KB    Empty
8    L     00      80005    a0000    320 MB    Empty
9    L     00      120006   2000     4 MB      Empty
10   L     00      122007   2000     4 MB      Empty
11   L     00      124008   2000     4 MB      Empty
12   L     00      126009   2c1001   1 GB      Linux
13   L     00      3e700b   17d79    47 MB     Linux
14   L     00      3fed8b   3001     6 MB      Linux
15   L     00      401d8d   12c001   600 MB    Linux

1 Unknown Image
2 Unknown Image
3 Unknown Image
4 Unknown Image
5 Unknown Image
6 Unknown Image
7 Unknown Image
8 Unknown Image
9 Unknown Image
10 Unknown Image
11 Unknown Image
12 Unknown Image
13 Unknown Image
14 Unknown Image
15 Unknown Image

partition size changed
create default partitions ***

Now if I can just figure out how to write a known image into one of those partitions I think I’ve got something.
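To know where an image would have to land, the start and size columns of that listing can be turned into byte ranges. The sketch assumes both columns are hexadecimal counts of 512-byte blocks, which agrees with the sizes printed next to them (0x8000 blocks is 16 MB). Only the first six rows of the listing are copied in.

```python
BLOCK = 512

# (partition number, start, size) in hex blocks, copied from the listing above
rows = [
    (1, 0x1, 0x8000),
    (2, 0x8001, 0x37c00),
    (3, 0x3fc01, 0x400),
    (5, 0x40002, 0x8000),
    (6, 0x48003, 0x37c00),
    (7, 0x7fc04, 0x400),
]


def byte_span(start: int, size: int):
    """Return (first_byte, last_byte) occupied by a partition."""
    return start * BLOCK, (start + size) * BLOCK - 1


for num, start, size in rows:
    first, last = byte_span(start, size)
    mib = size * BLOCK / (1 << 20)
    print(f"p{num}: bytes 0x{first:08x}-0x{last:08x} ({mib:g} MiB)")
```

Partitions 1, 2 and 3 start at the same block numbers as `fwu_kernel_start`, `fwu_rootfs_start` and `fwu_uboot_start` in the environment, and partition 5 starts at the `blk_start=0x40002` that the boot log reports for the N-1 kernel, so 1/2/3 and 5/6/7 look like the two boot paths.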

I’m starting to think there’s a design defect in the eMMC chips or the iDRAC firmware. Maybe they have a limited number of writes, and Dell’s firmware wrote to them too much? That’s the only explanation I can think of, as googling around I’m seeing many people with similar issues recently. For some people the problems occurred when they tried to do an upgrade, but in my case the system was sitting in a closet for years without being touched and just one day started having these problems after a reboot.

Interesting threads from reddit, all within the last few months:

https://www.reddit.com/r/homelab/comment…covery/e3y1g4e/ <— interesting thread but even more interesting comment: “if U-Boot is working you can push a kernel into RAM and boot from RAM” ← how???

https://www.reddit.com/r/homelab/comment…oidrac/ebr90og/ <— looks like someone also with a bad EMMC

https://www.reddit.com/r/homelab/comment…720_idrac_help/ <— looks like another bad EMMC, attempting to resolder. lots of info on partition structure

https://www.reddit.com/r/homelab/comment…ory_and_future/

https://www.reddit.com/r/homelab/comment…ur_firmware_on/

I was going to go down the path of this server not being used much (it was a lab server, and I can vouch for its entire time in the cabinet)… but it comes back down to this: when I write to it directly I can’t get it to reproduce the write error, so I’m still not convinced. I watched a great video today on using JTAG to dump a router memory chip. It sounded suspiciously like that other article I was referred to. I’m going to try to get into the eMMC directly via the JTAG headers and see if I can save the bin that way. If I can get the hang of reading and writing the chip directly (and assuming that doesn’t start giving me write errors) I’ll try copying one of the other good R420s over.

The real takeaway I got from this thread is that once you’re broken, you’re pretty much broken unless the SD card recovery works. But to prevent breaking it, there’s a very specific and tedious set of interim upgrades that you do in a specific order, and then it doesn’t damage the server. I ended up bricking two out of four servers. I could go on another long diatribe here but I’ll spare everyone. #NotHappytho

As to how to push the kernel into RAM and boot, I haven’t figured it out yet; I’m working on that too.

I’ll go ahead and read through all those now though I probably read them already, maybe I’ll see something new now I’ve taken a day or so off.

I asked the guy on the second reddit link if he’d give me the ‘detailed tftp instructions’ he sent the other guy. Regardless I guess I’ll try that tomorrow. If nothing else it’ll eliminate the possibility of some kind of bug with the sdcard procedure (or error in technique). Maybe it’s reading the fat with a different block size than windows 10 wants to use or something. If I get anything useful I’ll post it here.

I also ordered a bus blaster after watching Joe Grand do this: https://www.youtube.com/watch?v=IadnBUJAvks to a router. I bought his Jtagulator already, so hopefully between the two of them maybe I can get into the jtag port and copy the chip that way.

As you may have seen in my pastebin, I have that same size 0 thing:
INFO: 00:472 EMMC: Device: sh_mmcif Manufacturer ID: 90 OEM: 14a
Name: HYNIX Tran Speed: 25000000 Rd Block Len: 512
MMC version 4.0 High Capacity: Yes Capacity: 0

It would seem to me then I should get errors writing to it but the small targeted writes in my link (before I crashed the system and restarted) didn’t give me errors.

I guess tomorrow I’ll try writing specific values and then test to make sure it’s actually writing the data, and not just doing a bad job reporting errors.
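A sketch of how such a write-and-verify test could be prepared and checked offline. It builds a pattern in which every 512-byte block is different, so a block that silently failed to write, or landed in the wrong place, stands out in the read-back. How the pattern gets into RAM on the iDRAC and how the read-back is captured is left open. The function names and the seed are made up.

```python
import hashlib


def make_pattern(num_blocks: int, block_size: int = 512, seed: bytes = b"idrac-test") -> bytes:
    """Build a deterministic pattern where each block is derived from its index."""
    out = bytearray()
    for i in range(num_blocks):
        tag = hashlib.sha256(seed + i.to_bytes(4, "little")).digest()  # 32 bytes
        # Repeat the 32-byte tag to fill the block, then trim to the exact size
        out += (tag * (block_size // len(tag) + 1))[:block_size]
    return bytes(out)


def bad_blocks(expected: bytes, readback: bytes, block_size: int = 512):
    """Return indexes of blocks whose read-back content differs from the pattern."""
    count = len(expected) // block_size
    return [
        i for i in range(count)
        if expected[i * block_size:(i + 1) * block_size]
        != readback[i * block_size:(i + 1) * block_size]
    ]
```

An all-zero or all-0xFF test file cannot distinguish "written correctly" from "never written and already erased", which is why the pattern is derived from the block index.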

Well, JTAG could be interesting. It would be nice if you could share your experience then.
Meanwhile, I succeeded in interrupting the iDRAC boot on my R620 (thanks to your advice), and now I’m waiting for a “window” to do it on my production server.
As I mentioned before, I took an MMC dump from a working server using the dd command. If you run printenv in the U-Boot console, among all the variables it should show you something like this:

bootargs=root=/dev/mmcblk0p2 rootwait rw rootfstype=squashfs mem=239616k console=ttyS2,115200 <NULL> …

I ran the following commands to change /dev/mmcblk0p2 to /dev/mmcblk1p2 (which is the partition name on the SD card after dd has made a bit-by-bit copy):
set bootargs "root=/dev/mmcblk1p2 rootwait rw rootfstype=squashfs mem=239616k console=ttyS2,115200 <NULL> …

saveenv

Now if you run the boot command it should start booting from the SD card. I’m not 100% sure, because I tried it on a working server. But when I pulled the SD card out and ran the boot command, the console was flooded with errors (it looked like a loop). And when I pushed the card back into the reader, the boot process continued properly.
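Since the full `bootargs` line is long and easy to mistype on a serial console, the edited string can be prepared on a PC first. This is a small sketch, assuming the value is an ordinary space-separated kernel command line. The helper name is made up.

```python
import re


def retarget_root(bootargs: str, new_root: str) -> str:
    """Replace the root= argument, leaving every other argument untouched."""
    new_args, n = re.subn(
        r"(^|\s)root=\S+",
        lambda m: f"{m.group(1)}root={new_root}",
        bootargs,
        count=1,
    )
    if n == 0:
        raise ValueError("no root= argument found")
    return new_args
```

The pattern requires `root=` at the start of an argument, so `rootwait` and `rootfstype=squashfs` are left alone.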

An interesting point: when you run the saveenv command, it says that the settings were written to SPI flash, but after a reboot the bootargs value returns to the default. At the same time, I changed the IP for the TFTP server this way and it was stored permanently.

[15:42:51:400] Saving Environment to SPI Flash…
[15:42:51:414] Erasing SPI flash at 0x100000…Writing to SPI flash…done
[15:42:52:042] Erasing SPI flash at 0x110000…Writing to SPI flash…done


For those who want to try replacing the eMMC, I would recommend getting some old junk PCB with FBGA ICs on it and practicing hot-air desoldering and soldering first.
Also, from everything I have read on this topic, the MMC physical interface should be 100% compatible with an SD card.
As I understand it, there are MMC cards fitted into a standard SD card housing, but they use an 8-bit data interface.
In our case the MMC chip uses a 4-bit interface, exactly like a regular SD card.

[14:49:38:936] iDRAC7=> mmcinfo 1
[14:49:40:635] Device: sh-sdhi
[14:49:40:636] Manufacturer ID: 3
[14:49:40:649] OEM: 5344
[14:49:40:649] Name: SM32G
[14:49:40:649] Tran Speed: 25000000
[14:49:40:649] Rd Block Len: 512
[14:49:40:649] SD version 2.0
[14:49:40:649] High Capacity: Yes
[14:49:40:649] Capacity: 1850212352
[14:49:40:649] Bus Width: 4-bit
[14:49:40:689] iDRAC7=> mmcinfo 0
[14:50:28:513] Device: sh_mmcif
[14:50:28:516] Manufacturer ID: 90
[14:50:28:529] OEM: 14a
[14:50:28:529] Name: HYNIX
[14:50:28:529] Tran Speed: 25000000
[14:50:28:529] Rd Block Len: 512
[14:50:28:529] MMC version 4.0
[14:50:28:529] High Capacity: Yes
[14:50:28:529] Capacity: 0
[14:50:28:529] Bus Width: 4-bit

I have seen some posts where people were able to solder an eMMC to an MMC card adapter using thin wires. It definitely demands a good magnifier, or maybe even a microscope.
So maybe the iDRAC’s eMMC can be replaced by a regular SD card connected to the PCB by wires.