[Guide] How to Modify a Polaris (Radeon RX Series) GPU

Since you mentioned “mobile drivers” in your previous message, I assume you meant the OEM drivers posted on Dell’s website. Those are outdated and I never used them.

I have always used the official Adrenalin (20.4.1/2) drivers, which I guess are the same drivers you would install for a desktop GPU.

The problem came with 21.6.1 (if I’m not wrong about the version), when the FX9830P graphics driver was moved to legacy and was no longer included in new Adrenalin versions.
So with the official driver I can install only one of the GPUs, because the official installer cannot install two different driver versions. AMD does have this feature in the driver autodetect utility, but it is broken: it installs the APU driver and then installs the RX460 driver into the same path, overwriting the APU driver, which leaves everything broken. I even reported this, and after some back and forth the support team kept insisting I use the Dell drivers from 2018!!!
The Amernime drivers can use different kernel drivers at the same time. I managed to run them; apps start slowly and there appear to be some bugs, but there were no OpenGL improvements. I used the latest version, 23.4.1.

During the hardware repair/upgrade I set the memory voltage to 1.44V and used the same timings I use now. At 2000MHz I was getting millions of EDC errors, increasing at an insane rate of around 30000/s, but a one-minute FurMark test never crashed. After changing the voltage to 1.62V I get 0 errors at 2000MHz and only a few above 2050MHz, unless the VRM hits OCP, in which case there are e.g. 300000 errors at once and the system halts instantly.

I looked at the files; they are almost identical and have the same voltageobject info as the first one you sent.
According to the driver source, the voltage type being set appears to be VDDC. But then come the unsigned16 index/data pairs, where the data bytes are:
72 (0.8375)
1C (1.375)
23 (1.33125)
3 (1.53125)
7E (0.7625)
20 (1.35)
4 (1.525)

I would say these are SVI2 voltage IDs, and most of these values appear to be memory related. The type indicating VDDC might be there simply because the controller primarily manages VDDC; however, the two low values could be setting VDDC or VDDCI as well. It’s hard to tell by just guessing, and I could be wrong.
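For anyone who wants to re-check the numbers in brackets, here is a minimal sketch of the conversion. It assumes the data bytes really are 8-bit SVI2 VIDs that step down 6.25 mV per code from 1.55 V, which is the only mapping consistent with the values listed above; the function names are made up for this example.

```python
# Assumption: each data byte is an SVI2 VID, decoded as 1.55 V minus 6.25 mV per step.
# Integer microvolts are used internally to avoid floating-point drift.
SVI2_MAX_UV = 1_550_000   # 1.55 V, the value VID 0 maps to
SVI2_STEP_UV = 6_250      # 6.25 mV per VID step


def svi2_vid_to_volts(vid: int) -> float:
    """Decode one VID byte to volts; results at or below 0 V are returned as 0.0."""
    if not 0 <= vid <= 0xFF:
        raise ValueError(f"VID must fit in one byte, got {vid!r}")
    microvolts = SVI2_MAX_UV - vid * SVI2_STEP_UV
    return max(microvolts, 0) / 1_000_000


def volts_to_svi2_vid(volts: float) -> int:
    """Find the nearest VID for a target voltage (hypothetical helper for editing)."""
    vid = round((SVI2_MAX_UV - round(volts * 1_000_000)) / SVI2_STEP_UV)
    if not 0 <= vid <= 0xFF:
        raise ValueError(f"{volts} V is outside the range this formula can encode")
    return vid


# The seven data bytes quoted from the voltage object.
TABLE_BYTES = [0x72, 0x1C, 0x23, 0x03, 0x7E, 0x20, 0x04]

for vid in TABLE_BYTES:
    print(f"0x{vid:02X} -> {svi2_vid_to_volts(vid):.5f} V")
```

Going the other way, `volts_to_svi2_vid(1.35)` gives the byte to look for if you are hunting for a particular voltage in a dump. None of this confirms which rail a given entry drives; only a DMM can do that.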

@karmic_koala the only entries there that look like they could be memory related are 23, 3, 20, and 4. They would be VDD/VDDQ voltages. Entries 1C, 23 and 20 probably relate to K4G80325FB-HC25 when the spec requires it to be rated for 7000, while entries 3 and 4 are probably for when K4G80325FB-HC25 is to run at its rated 8000. Entries 72 and 7E are probably essentially idle (desktop) voltages. I’m just guessing here while looking at some datasheets, but that’s what seems to make sense. Hopefully that clears a few things up on your end. Here is a vBIOS dumped from the 470 I have so you can compare them. Both cards are as close to being twins as you can get: one is an 8GB Sapphire Nitro RX470, the other is the same card in 480 flavour. The biggest difference is that the 470 uses an NCP81022 voltage controller.
RX470 Nitro+ OC BIOS.rar (104.4 KB)

As for the drivers, yeah, I was talking about the OEM ones. Whatever that other GPU is, it will cause issues, as it’ll be based on the VLIW architecture (I think that’s what it was called).

It has an almost identical I2C object, with only value 23 missing. That would be a sign this is driver/protocol specific rather than some proprietary controller data, since two different controllers operate with the same data set.

72 and 7E are way too low even for idle GDDR5 operation. They must be something else.

@karmic_koala I don’t think 0.83V is too low for idle clocks. Without a second display GDDR5 only runs at 300MHz on the desktop, so you won’t need a lot of voltage for that at all. Thinking about it, entry 72 could also be the minimum allowed voltage state for the GPU; I never looked specifically, but I do know there is a lower limit you can’t ordinarily go below. 7E might be for VDDIO. The one thing I can say for certain is that memory voltage isn’t controllable in any software I’ve seen. Even Wattman gets this wrong: there is a memory voltage option, but adjusting it actually only increases VDDC. Igor from Igor’s Lab might know whether there is a vBIOS entry for memory voltage on 4/500 series cards.

I’ve seen reports that it isn’t enough. Another guy repairing his card had about 1V there because a previous owner had replaced resistors with mismatched ones.
He was getting code 43; after raising the voltage to 1.35V the card started properly.

This table might be it, but without DMM confirmation it’s difficult to tell which values are used.
You could try changing value 3 to 0, which should set 1.55V (the maximum within the protocol specification), and see if you can get a better OC.

@karmic_koala unless the Samsung memory is high-binned stuff undervolted to 1.35V (minor side note: if it is, this would possibly explain why EDCs occur on every Samsung-equipped Polaris card I’ve tested, even at stock), it should be running entry 3 or 4 by default anyway… unless it’s another mistake made in the Polaris vBIOS. There are a lot of those, almost as if everything was rushed and hacked together on the programming side of things. I should be able to switch back to the 480 later; I just have to put the finishing touches on the 470 before calling it a day with that card, unless some sort of miracle breakthrough happens… which, lo and behold, as I type this I think of something: have we tried searching the vBIOS itself for anything to do with Samsung? I mean searching for the Samsung part number, either as regular text or in hex. You’d think that information is in there, and it might give some insights.
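A rough sketch of that search, in case it saves someone opening a hex editor. It only looks for the part number as plain ASCII and as UTF-16LE bytes; the function names and the fake ROM bytes are invented for the example, and how a real vBIOS stores the string (full part number, a shortened prefix, or not at all) is not something this assumes.

```python
def find_all(data: bytes, needle: bytes) -> list[int]:
    """Return every offset at which needle occurs in data (overlaps included)."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


def search_part_number(data: bytes, part: str) -> dict[str, list[int]]:
    """Look for a part number stored as ASCII text or as UTF-16LE text."""
    return {
        "ascii": find_all(data, part.encode("ascii")),
        "utf16le": find_all(data, part.encode("utf-16-le")),
    }


# Made-up stand-in for a ROM dump, just to show the call; a real dump would be
# loaded with something like pathlib.Path("dump.rom").read_bytes() (hypothetical path).
fake_rom = b"\x55\xaa" + b"\x00" * 14 + b"K4G80325FB" + b"\x00" * 6

hits = search_part_number(fake_rom, "K4G80325FB")
for encoding, offsets in hits.items():
    print(encoding, [hex(o) for o in offsets])
```

Searching for a shorter prefix such as `K4G80325` as well is probably worthwhile, since the speed-grade suffix may not be stored alongside it.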

Just in case anyone is wondering: yes, even on the formerly beat-up 480 I got, the Samsung memory clocks exactly the same as on every other Samsung card I’ve tried (various 470s, 480s, and 580s): 2050MHz. So there is definitely some kind of hard wall, whether it’s a voltage limitation or Polaris and Samsung really hating each other that much.

Without measuring the voltage it’s all just speculation. Have you ever run into a hard crash while overclocking memory, or do you only get EDC errors?
What about the other card where you got 2200MHz stable?

Looking for part strings most likely will not find anything other than the VRAM info table.

@karmic_koala other cards I got 2200+ stable on weren’t Samsung. What I thought cracked the barrier on one of the Samsung cards turned out to be the driver doing something really weird, a DDU and driver reinstall later the EDCs were back. I’ve only got very occasional hard crashes when working with GPUs, always where I was pushing what I knew the limits were. Anyway that RX480 I got is now dead, well, probably dead. I was running a benchmark with the card totally stock and suddenly the entire system shut down and wouldn’t even power on until after a CMOS clear. Debug display got stuck on VGA, removed the card, couldn’t see anything physically wrong so took it to another system and still no display so yup, this card is quite dead. Considering the state it was in I’d put money on an issue with the power delivery.

Yeah, the drivers misbehave when applying clocks; sometimes performance drops significantly, sometimes only a bit, and it stays that way until you restart the system (on a laptop there is the PX powerdown feature, so it is enough to restart the app that uses the GPU).

Try to use another card as display and hotplug the failing one after system boot to see if it’s detected.

@karmic_koala yeah, I noticed the drivers do weird things, but only with cards using Samsung. With other cards you can change clocks and voltages, just hit apply, and things work as they should. You can accomplish the same with Samsung cards, but I found only if you change DAT_DLY, DQS_DLY, and OEN_DLY from 7 to 9. God knows what’s up with that. I would check whether the card is detected, but I really don’t think it’s even worth that much effort; the fans won’t spin even though zero RPM is disabled. The only sign of life is that the Sapphire logo still lights up.

Not on my system. It was the same with Samsung as it was with Hynix. The first change of power settings is usually fine; change it once more and performance already drops, and only resetting the GPU restores it… I guess it’s just like a guy on guru3d wrote: “amd drivers are a dog with five legs”.

@karmic_koala Wattman is generally a buggy POS and definitely something to avoid, but there isn’t much decent OC software around these days. Afterburner wouldn’t be so bad if it weren’t still so clunky to use; it literally has one skin that’s fairly streamlined, and even that isn’t ideal, with too many “pull-out” windows.

Buggy: yes. Crashes occasionally: yes. But the same problem existed with other tools as well, meaning it is actually a driver fault.
I used Afterburner until I figured out how to use Wattman on a mobile system. I didn’t like that it creates additional registry overrides, so I stopped using it. OverdriveNTool at least has a simpler interface where I don’t need to move a bunch of sliders and knobs.

@karmic_koala Yeah, OverdriveNTool is pretty good, but it too gets some things wrong, like the memory voltage option, which just increases VDDC like Wattman. I’d still like to get my hands on an RX590 to play around with, but I don’t know if it’s worth doing… maybe have it replace the 470 in a backup system so it isn’t just collecting dust. I’ve kinda moved on from Polaris now. They’re fun cards to play with, but their lifetime is coming to an end: another 2 years or so before they are completely obsolete, which takes them to a 7-8 year lifetime, so they did really well. It’ll never happen, but I’d still like to see Polaris respun by AMD on a 5/7nm process, with the memory upgraded to GDDR5X or preferably GDDR6, and offered as a budget RT-less gaming option. I bet they would sell like hotcakes; I’m convinced that with relatively minor design changes Polaris still has a lot to offer.

Thank you for your tutorial. I have a question: will modifying the VDDCI voltage with PBE take effect? My graphics card has K4G80325FC memory, and after using PBE’s automatic timings the memory can be overclocked to 2160MHz without on-screen errors. At that clock the bandwidth reached 221-223GB/s in the OCL test, and the Time Spy score increased from 4500 to 4700 (core 1400MHz @ 1080mV). Next I am going to try the tighter timings in the post and see how much performance I can squeeze out of Polaris.
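To put a measured figure like 221-223GB/s in context, here is a small sketch of the theoretical peak it can be compared against. It assumes a 256-bit memory bus (as on the RX 470/480) and that the reported memory clock is the one GDDR5 moves four transfers per pin on, so 2000MHz corresponds to 8Gbps per pin; the function name is made up, and the benchmark may count GB or GiB slightly differently.

```python
def gddr5_peak_bandwidth_gbs(mem_clock_mhz: float, bus_width_bits: int = 256) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second).

    Assumptions: mem_clock_mhz is the memory clock as shown by OC tools, and
    GDDR5 moves 4 bits per pin per cycle of that clock.
    """
    transfers_per_clock = 4
    bytes_per_transfer = bus_width_bits / 8
    return mem_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


stock_peak = gddr5_peak_bandwidth_gbs(2000)   # stock 8Gbps memory
oc_peak = gddr5_peak_bandwidth_gbs(2160)      # the overclock from the post above
measured = 222.0                              # midpoint of the reported 221-223GB/s

print(f"peak at 2000MHz: {stock_peak:.2f} GB/s")
print(f"peak at 2160MHz: {oc_peak:.2f} GB/s")
print(f"measured / peak at 2160MHz: {measured / oc_peak:.1%}")
```

The ratio of measured to peak is the more useful number when comparing timing sets, since a higher clock with loose timings or constant error retries can score a lower ratio than a slower but clean configuration.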

@ket
Do you know if the hotspot temperature is actually used on Polaris?
Today I changed the cooling materials on the VRAM and MOSFETs/inductors, passed several short FurMark/Valley tests at 1200/2000MHz (20 minutes each), and then tried 1300/2000MHz with the core at 1.1V (during the bench it never exceeded 1.13V).
If I have a blower on the bottom side directed at the VRM, the benchmark keeps going; when I turn it off, the system hangs within 1-2 minutes. At that point all the temperatures were below 70C.
So I’m not sure what’s overheating.
Also, with the core at 1200/0.95V, the SODIMM, CPU and GPU can reach 75C and keep running.

@wzyjsk55555 221-223GB/s isn’t actually very good; with my timings you’ll get that kind of bandwidth barely over stock frequencies. You need to download RTSS (RivaTuner Statistics Server) and HWiNFO64. Using those tools you can measure in real time whether you are getting memory EDC (Error Detection Code) errors. Generally speaking, even one EDC is too many if you want absolute stability and don’t want the GPU wasting cycles processing those EDCs.
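If you log the error counter instead of watching it live, a few lines are enough to turn it into an errors-per-second figure and a pass/fail check. This is only a sketch: it assumes you already have (time in seconds, cumulative error count) pairs from whatever monitoring tool you use, the function names are made up, and the sample numbers are invented for illustration.

```python
def edc_rates(samples: list[tuple[float, int]]) -> list[float]:
    """Errors per second for each interval between consecutive samples.

    samples: (timestamp in seconds, cumulative error count), oldest first.
    If the counter goes backwards it is assumed to have been reset, and the
    new value is counted as errors since the reset.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        new_errors = c1 - c0 if c1 >= c0 else c1
        rates.append(new_errors / dt)
    return rates


def is_error_free(samples: list[tuple[float, int]]) -> bool:
    """True only if no interval shows a single new error."""
    return all(rate == 0 for rate in edc_rates(samples))


# Invented example runs: one clean, one climbing steadily.
clean_run = [(0.0, 0), (1.0, 0), (2.0, 0)]
noisy_run = [(0.0, 0), (1.0, 30_000), (2.0, 60_000)]

print(edc_rates(noisy_run), is_error_free(noisy_run))
print(edc_rates(clean_run), is_error_free(clean_run))
```

The strict zero threshold follows the "even one EDC is too many" rule above; loosen it only if you are deliberately trading stability for clocks.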

@karmic_koala I don’t think the hotspot temp is used on Polaris. It’s actually set lower than the maximum temp value, but I wouldn’t be surprised if this is another error in the vBIOS, so I’ve always set the max temp value to the same as the hotspot value, 105C.

Not even the max worked here. It was set to 95C or so; the GPU thermal diode reached 99C and the system then powered itself off. Maybe it is set to 100C in the EC or elsewhere.

I know what causes false overheat halts

This is definitely it.

Last night I tried gaming with the core at 1400MHz 1.22V and memory at 1900MHz. Every attempt to load the game with memory at 2000MHz failed. FurMark can load, but it only uses about 100MB of memory, which is why it can work even above 2000MHz.
The game uses about 3GB of memory and there simply isn’t enough power to start it.

I tested the 1300/2000 profile in game (DECK16, 24 players, highest graphics settings) and the performance was worse than at 1300/1900.

Since 1300/2000 loads up but 1400/2000 does not, perhaps going from 1300 to 1400 on the core could have an impact on performance once the VRAM bandwidth is high and stable enough.

I played a bit less than an hour at 1400/1900 with 1.220V set. It mostly ran at around 1.24V, max 1.26V; max current reached 97A / 117W, with 0 memory errors, no crashes, and all temps below 70C except the SODIMM, which reached 75C.

I tried TheVic1600’s timings, but I only got 226GB/s of bandwidth at 2160MHz. Do you have any recommended timings for K4G80325FC?