Missing amdgpu firmware for AMD Renoir APUs

Hello all, I am sharing this in case it helps someone else, and to see whether anyone has the same issues, as I am considering filing a report on the issue tracker. I may gloss over some items, but feel free to ask for clarification and I’ll try to answer.

Observed Behavior:
When using Qubes on an AMD Ryzen based laptop with a Renoir APU, I am unable to:

  • Use HDMI to connect to an external monitor
  • Adjust the brightness of the display (the hotkeys bring up the on-screen brightness widget and its value changes, but the actual brightness does not; manually writing values to the backlight control files under /sys does not work either)

Then I tried updating to kernel-latest in dom0, which pulled in a 5.11.x kernel. After that the machine would no longer boot properly and seemed to hang before reaching the LUKS password prompt.

Current Conclusion:
After testing (I will omit the specifics unless someone wants to know), I determined that in my case it was not actually hanging on boot. When the boot process handed the display over from the EFI framebuffer to the next stage, it was not able to load the graphics properly, but the system was still running.

I was able to determine that the amdgpu firmware for the Renoir APUs was missing from the initramfs.

Current Workaround:
I was able to get things to work by doing the following.

  • Downloaded an updated version of linux-firmware package that contained firmware for the renoir apu (linux-firmware-20210208). I used the 20210208 version since it was the closest release version to the release date for kernel 5.11.x as shown by kernel’s git tags.
  • Extracted the renoir related firmware (under amdgpu directory, files that start with renoir_ )
  • Copied those files to /lib/firmware/amdgpu directory in dom0 and made sure they had same permissions as others in that directory. (This is a test machine that will be wiped so accepted the risk of moving files into dom0).
  • Regenerated the initramfs image for the kernel 5.11.x version I installed (kernel-latest) using dracut.
  • Replaced the initramfs for kernel 5.11.x in /boot and /boot/efi/EFI/qubes
  • Made sure the machine was set to boot that 5.11.x kernel and restored the default kernel options (I had changed back to older kernel to be able to boot back into qubes and had some other setting for debugging) and then rebooted.
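
The file-handling part of these steps can be sketched roughly as below. This is only a sketch under assumptions that are not in the post: the function name install_renoir_firmware is made up, mode 0644 is a guess at “same permissions as the others”, and the initramfs is assumed to be named /boot/initramfs-<version>.img.

```shell
#!/bin/sh
# Hypothetical helper: copy only the renoir_* firmware files from an unpacked
# linux-firmware tree into a firmware directory.
install_renoir_firmware() {
    src_dir=$1     # e.g. linux-firmware-20210208/amdgpu
    dest_dir=$2    # e.g. /lib/firmware/amdgpu (needs root in dom0)
    for f in "$src_dir"/renoir_*; do
        [ -e "$f" ] || continue          # the pattern matched nothing
        # 0644 is an assumption; compare with ls -l on the existing files.
        install -m 0644 "$f" "$dest_dir"/
    done
}

# Possible use in dom0, as root, with KVER set to the 5.11.x version string
# (the initramfs file name is an assumption, check /boot first):
#   install_renoir_firmware linux-firmware-20210208/amdgpu /lib/firmware/amdgpu
#   dracut --force /boot/initramfs-"$KVER".img "$KVER"
#   cp /boot/initramfs-"$KVER".img /boot/efi/EFI/qubes/
```

Only the renoir_* files are touched, which keeps the change to dom0 as small as the workaround describes.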

After that I was able to boot, got the graphical luks prompt and could log into qubes as normal. I then was able to adjust my screen brightness using the computer hotkeys and connect an external monitor with HDMI.

My Hardware:
HP Envy x360, model 15m-eu0013dx I think, has AMD Ryzen 5 processor (5500U) with the integrated AMD graphics (renoir).
I am running the latest updated 4.0.x release of Qubes (other than the changes noted above), using UEFI, and I had to remove the noexitboot and mapbs options from the BOOTX64.cfg file to be able to install (to get the installer to come up).

Longer Term Fix:
I need to find some time to collect the details and open an issue with Qubes. If they pull in a sufficiently new linux-firmware package (or otherwise add the renoir_* files) when they build their version for Qubes, this will work without these manual steps.

Disclaimers:
I did not do systematic testing; I just did what I needed to get my use case working. You may run into other issues, and there could be side effects I have not seen.

I also tried other kernel versions manually between 5.8 and 5.11, and the versions before 5.11 seemed to report a kernel bug on boot. Since 5.11 worked once I did this, I have not gone back to check whether they would work now. These might be two separate issues (although it could affect people who run the default kernel rather than kernel-latest, if Qubes updates to one of those versions).

A possibly related post:
In this post there is talk of support for AMD VFIO, and the issue seemed similar. I had noticed the “Failed to start Setup virtual console” error as well and chased it for a bit until I realized it was not the issue.

This seems to make the case for an update of the linux-firmware dom0 package (which is already more recent than the official fc25 one). It looks like the repo for this would be QubesOS/qubes-linux-firmware on GitHub. For 4.0 this would maybe have to be done on a branch; at least the CI definitions on master seem to target 4.1 only.

I think you should create the issue and add the additional info as you go along.

A good number of us are using or trying to use Qubes on Ryzen systems. My own experience last year with my Thinkpad T14 AMD was that custom building a 4.1 image with kernel 5.8 was necessary to get it working at all. Some of the biggest issues were addressed by updating the UEFI/BIOS and installing the xorg-x11-drv-amdgpu package.

@tasket I ended up just making an issue after I posted this yesterday, since I had a few minutes.
The issue is “Request including amdgpu firmware for AMD Renoir APUs in linux-firmware package available from qubes repos” (issue #6703 in QubesOS/qubes-issues on GitHub).

Thanks for the info about your experience. I have been lucky in that the renoir firmware coupled with the 5.11.x kernel on 4.0.x appears to be working so far.

@yann I saw the same thing about the linux-firmware package: the master branch actually pulls a new enough version from upstream to include renoir, just not the 4 branch yet. I have not yet tried that version of the files to see whether it behaves the same, though (I had pulled a newer version for my workaround).

@frinkahedron I went ahead and upgraded the linux-firmware package to the 20200316-106 version currently in the 4.1 repo. That does not appear to be sufficient, but with no serial console I am just working in the dark (see Getting a serial console on USB-only Qubes laptop). How were you able to get the diagnostics in the first place?

Hi @yann, below is some info on what I did to get debugging info, but before that just wanted to check if you tried/verified the following:

  • Did you switch to kernel version 5.11.x or higher for your testing (kernel-latest at the time I tried it)? I had to use that newer kernel and supply the firmware it needs to fix my issue.
  • Did you generate a new initramfs for the kernel you were using? Installing the linux-firmware package regenerates an initramfs, but by default only for the kernel you are currently running, so if the desired kernel is a different version you have to generate one for it manually (and copy it to the right location to replace the old one, etc.).
  • Did you verify that the generated initramfs contains the files as intended? (I used the lsinitrd command to list the contents.)
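
The last check can be sketched as a small filter over the lsinitrd listing. The helper name has_renoir_firmware and the initramfs path in the usage comment are assumptions, not something taken from the post.

```shell
#!/bin/sh
# Hypothetical helper: read an lsinitrd listing on stdin and succeed only if
# at least one amdgpu/renoir_* firmware file appears in it.
has_renoir_firmware() {
    grep -q 'amdgpu/renoir_'
}

# Possible use in dom0, with KVER set to the kernel version you intend to boot
# (the image path is an assumption, check /boot for the real file name):
#   lsinitrd /boot/initramfs-"$KVER".img | has_renoir_firmware \
#       && echo "renoir firmware present" \
#       || echo "renoir firmware MISSING"
```

Running the same check against the copy under /boot/efi/EFI/qubes confirms that the image the machine actually boots was replaced too.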

Back to your actual question, though. The following describes what I did while trying to debug. I am skipping some steps and might have forgotten something, so let me know if you have specific questions about how I did anything or if anything does not make sense.

First Set of Attempts
I edited the /boot/efi/EFI/qubes/xen.cfg file and changed the parameters for the desired kernel version.

  • Removed rhgb and quiet parameters to get more verbose output on screen
  • Added the rd.shell option so that it would drop to a shell if the root file system could not be brought up.
  • Added rd.debug so that debug output would be written to the systemd journal if possible.
  • Added rd.break=cmdline, which made the initramfs drop to a shell before bringing up the root file system. There are other break points (see dracut’s man pages); I tried different ones, but ultimately they did not help me much.
  • Added rd.cmdline=ask, which gave me a prompt where I could try other options without having to keep booting the installer and dropping to a shell to edit xen.cfg.
  • I think I also set log_buf_len=1M to increase the printk buffer.
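
As a rough illustration of these edits, the helper below rewrites one kernel command line the same way: it drops rhgb and quiet and appends some of the debug options. The function name debug_cmdline is made up, and the sample value in the usage comment is a placeholder rather than a line from a real xen.cfg.

```shell
#!/bin/sh
# Hypothetical helper: take a kernel command line as one string, drop the
# options that hide boot output, and append the debug options listed above.
debug_cmdline() {
    out=
    for word in $1; do                  # deliberately unquoted: split on spaces
        case $word in
            rhgb|quiet) ;;              # removed to get verbose output
            *) out="${out:+$out }$word" ;;
        esac
    done
    # rd.break=cmdline and rd.cmdline=ask can be appended the same way.
    printf '%s rd.shell rd.debug log_buf_len=1M\n' "$out"
}

# Example (placeholder values):
#   debug_cmdline 'vmlinuz-X root=/dev/mapper/qubes_dom0-root rhgb quiet'
```

The result still has to be pasted into the kernel= line of the matching section in xen.cfg by hand; the sketch does not edit the file itself.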

Some of these may not have done much, and ultimately this was not how I got the data I needed to persist, but perhaps it would help in your case. It did let me drop to a shell and, later, confirm that the initramfs image that booted lacked the needed renoir firmware.

The problem was that I still seemingly hung at the same point, and since the hang came before the system was unlocked, the log info was never persisted.

Ultimately Getting full output of boot to persist
So next I made an assumption that I may not actually be hung, that it may just be the display freezes since it does not initialize properly. So what I did was let it boot to where it seems to hang, waited about 30 seconds and then typed in my luks password as if I was looking at the luks password screen and then hit enter.

Then I waited a couple of minutes to let the system boot (assuming it was booting), and once I felt it would normally have reached the Qubes login, I did a hard power-off.

Next I booted the installer, dropped to a shell and edited the xen.cfg file to use the old working kernel so that it would boot normally with video.

Once I logged into Qubes, I opened a dom0 shell and viewed the systemd journal. The output of the previous boot was there, since I had successfully unlocked the root file system in that attempt, which meant the log data got written to it. I was able to locate the right boot by checking for the updated kernel version string.

That output is what let me figure out that the missing firmware was my issue, as some lines in it pointed to that.
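
A sketch of how such lines could be pulled out of a saved journal follows. It assumes the kernel logged the failure with its usual “Direct firmware load for <file> failed” message; the post does not quote the exact lines, and the helper name missing_firmware is made up.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print each firmware
# file the kernel reported it could not load, once per file.
missing_firmware() {
    sed -n 's/.*Direct firmware load for \([^ ]*\) failed.*/\1/p' | sort -u
}

# Possible use in dom0 after booting back into a working kernel:
#   journalctl --list-boots                  # find the boot that looked hung
#   journalctl -k -b -1 | missing_firmware   # -1 is the previous boot
```

If the affected boot is not the immediately previous one, the index from the --list-boots output replaces -1.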

Hope some of that was useful and made sense, at some point when I find some time maybe I’ll write up a more detailed step by step and verify I can duplicate again. If that would be useful let me know.

Thanks for the details, the rd.cmdline=ask for example is harder to find in the dracut docs :slight_smile:

I had tested with both 5.10 and 5.12 with the same result.

It turns out I had another problem in addition to the missing Renoir firmware: it is the discrete 5500M which causes it. From the rd.break=cmdline shell, I can clearly see that loading amdgpu shows a couple of errors starting with “failed to allocate kernel bo”, and causes a reboot after a few seconds. At first it also complained of missing navi14_ta.bin, but adding just this one did not help, maybe updating the whole lot would help, I’ll check that later.

I was able to get 5.12.9 to boot by preventing amdgpu from claiming the discrete GPU (pci-stub.ids=1002:7340 for this device), and I confirm the Renoir already works better than with 5.4; at least the driver works well enough to use the 144Hz modes. However, it still complains that amdgpu/renoir_ta.bin is missing, so a firmware package more recent than the one currently in 4.1 will apparently be needed for a fully operational GPU.
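
The vendor:device pair used with pci-stub.ids can be read from lspci -nn. The filter below is a sketch that prints the IDs of all display-class devices; the helper name gpu_ids is made up, and which of the printed IDs belongs to the discrete GPU still has to be decided by reading the full lspci line.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -nn` output on stdin and print the
# vendor:device ID of every display-class device, one per line.
gpu_ids() {
    grep -E 'VGA compatible controller|Display controller|3D controller' |
        grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' |
        tr -d '[]'
}

# Possible use:
#   lspci -nn | gpu_ids
# The chosen ID then goes on the kernel command line as pci-stub.ids=<id>.
```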

I tried using just the amdgpu/renoir_ta.bin file from the latest firmware git, and also importing the whole amdgpu directory from the same version, and the driver still complains that it cannot load that file. I find this a bit strange, but as this seems to be the HDCP firmware it should not be of great consequence.

OTOH if we want to get GPU acceleration we’ll also need to update xorg-x11-drv-ati.

Now looking into building Qubes to provide PRs for all of this.

I’m having the same problem as @frinkahedron : HDMI port is not functional and keys to adjust screen brightness don’t work. I’ve been looking all over this forum and I’m glad to finally know what the problem is, namely my Ryzen 7 4700U cpu!

I’m going to try to follow the steps in the workaround suggested by OP and hope I don’t brick my system. This is beyond my current skill level, but I really want my external monitor to function.

I’m going to do this step by step and would really appreciate it if somebody could correct me if I appear to be taking a wrong step. I don’t mind if it takes a long time to complete this process.

So, as far as I understand the first step I should take is run sudo dnf install kernel-latest in a dom0 terminal?

My current kernel is 5.4.129-1.fc25.qubes.x86_64 in case that matters.

@ryuu if you’re on 4.0, you should first install a more recent firmware, or you’ll get a black screen and will have to recover by booting from external media to select a 5.4 kernel. But even after upgrading to 4.1, my external HDMI does not report a screen being plugged in. Maybe the firmware in 4.1 is still too old?

You mean I should first follow the firmware steps from the workaround before upgrading the kernel?

If so, I’ve verified the archive, extracted the renoir files and can move them to /lib/firmware/amdgpu in dom0.

I don’t know how to complete the next step however:

  • Regenerated the initramfs image for the kernel 5.11.x version I installed (kernel-latest) using dracut.

And I don’t know at which step I should install the kernel-latest package…

  1. yes you should install the firmware before kernel-latest, or you will probably get your system to boot with a black screen.

  2. you can extract manually, or get the .rpm from the QubesOS 4.1 repo, you may find it easier to install, and it proved sufficient for me

  3. dracut --regenerate-all seems to do the work (although I had not found out about that one at the time and had to specify the version and initramfs path by hand; but then, I’m on 4.1 now so YMMV)
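
For comparison, specifying the version and path by hand looks roughly like the sketch below. It only prints the commands so they can be reviewed before being run; the helper names are made up and the /boot/initramfs-<version>.img naming is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the dracut command that rebuilds the
# initramfs for one kernel version.
dracut_cmd_for() {
    printf 'sudo dracut --force /boot/initramfs-%s.img %s\n' "$1" "$1"
}

# Hypothetical helper: print one such command per installed kernel, taking
# the versions from the directory names under a modules directory.
dracut_cmds_for_all() {
    for dir in "${1:-/lib/modules}"/*/; do
        [ -d "$dir" ] || continue
        dracut_cmd_for "$(basename "$dir")"
    done
}

# Possible use in dom0:
#   dracut_cmds_for_all          # review the output, then run the line you need
```

dracut --regenerate-all covers every installed kernel in one go, so the per-version form is mainly useful when only one image should change.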

I really appreciate your help, but I just want to make sure I’m not bricking my system here.

So, first step is to move those ‘renoir’ files I got from the amdgpu directory in the linux-firmware-20210208 archive to the /lib/firmware/amdgpu directory in dom0?

Second step is to run sudo dnf install kernel-latest in dom0?

Third step is to reboot the machine? Or first run dracut --regenerate-all in dom0?

To give you some context I don’t know what initramfs is…

It is always a good idea anyway to make sure you have a bootable rescue device in case something goes wrong. Here any linux USB key would be sufficient: if you need to go back you just have to mount the first partition and edit the xen.cfg file to set as “default” the kernel to be booted. You can first check how things are in /boot/efi/EFI/qubes/xen.cfg (note the mountpoint is /boot/efi/), before and after installing kernel-latest, so you can feel more confident of being able to recover if needed.
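
To make that check concrete, the sketch below prints which kernel xen.cfg boots by default and which entries exist. It assumes the file has a [global] section with a default= line and one [name] section per kernel; the helper names are made up.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the default= line in a xen.cfg file.
xen_default_kernel() {
    sed -n 's/^default=//p' "$1"
}

# Hypothetical helper: print every section name except [global], i.e. the
# kernel entries that default= can point to.
xen_kernel_entries() {
    sed -n 's/^\[\(.*\)\]$/\1/p' "$1" | grep -v '^global$'
}

# Possible use in dom0, before and after installing kernel-latest:
#   sudo cp /boot/efi/EFI/qubes/xen.cfg /boot/efi/EFI/qubes/xen.cfg.bak
#   xen_default_kernel /boot/efi/EFI/qubes/xen.cfg
#   xen_kernel_entries /boot/efi/EFI/qubes/xen.cfg
```

Going back to the old kernel from a rescue system then means setting default= to one of the listed entry names.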

Correct order is regenerating before rebooting.

I followed the steps as best I could, rebooted the machine, and… it just works! PERFECTLY! I’m never going back now. Qubes is amazing (when you get it working).

Thank you so much @yann :grinning:

@frinkahedron, just out of curiosity, can your AMD Renoir machine successfully resume from S3 sleep?

I’ve got some machines containing Ryzen 7 4800U CPUs with Renoir, and have a feeling that it’s related to your issue….

Hi @alzer89, I had actually never tried, so I just went and tried the log out → suspend option. It does not appear to fully suspend, and it gets stuck on a blank screen (but my keyboard backlight still responds).

So it may not be working. I did notice that on this page there is mention of a kernel option for when this behavior is seen (screen turns off and won’t turn back on).

In short, suspend does not seem to work, at least on my machine where I had applied the steps described here. (I am still running 4.0.x.)

Anything interesting in your system journal (sudo journalctl)?