Ryzen 7000 series

Sadly, I already tried all of that.

I was a bit frustrated, but I have not yet tried a Windows VM on the new computer; that could help to really rule out driver issues. Since AMD knows there are bugs in the Linux kernel, maybe they fixed some things in the Windows kernel.

Update 1: The Windows VM doesn’t work either.
Update 2: My error seems linked to the BIOS parameter “Above 4G” (at least it crashes in a different way).
I swear, this BIOS is Satan’s work. Everything breaks in unexpected ways, and setting some options requires hard-resetting the BIOS. If you set a parameter, it is either not respected, breaks something completely unrelated, or, from time to time, does what it is supposed to do. It takes 5 minutes between each reboot just for the BIOS page to show up. I am going crazy.

By modifying “SR-IOV”, “Above 4G”, and “Resizable BAR” in the BIOS, I can reach different errors: some buffer overflow in the Linux kernel and other things. It is going to take a lot of time to understand what is going on. The host crashes when using less than X amount of RAM, etc.

Update: I removed this patch (qemu: fix TOLUD for PCI passthrough by mati7337 · Pull Request #44 · QubesOS/qubes-vmm-xen-stubdom-linux · GitHub) and manually modified the rootfs.
Now the host crashes reliably no matter the amount of RAM provided to the VM. I don’t know if I am closer to a solution or not. This host crash happens when the VM tries to load the GPU drivers (same behavior for a Windows VM and for a Linux VM).

New try.
Based on what I tested, I am left with only a motherboard / BIOS issue. I cannot see how it could be anything else.
However, there are too many possibilities to have any credible chance of success by blindly trying BIOS parameters.
With my latest configuration, everything seems to work until the guest VM tries to load the driver (Windows VM or Linux VM, it doesn’t matter).

Maybe the GPU is not in a state where it can receive instructions?
So let’s compare the state of the GPU just before doing the GPU passthrough, on my new computer and on my old computer.
To do that, I will list the values of all files in /sys/bus/pci/devices/ID_OF_THE_GPU.0/ and /sys/bus/pci/devices/ID_OF_THE_GPU.1/
(If anyone has a better way / tool to display the information of a GPU PCI device without loading its driver, don’t hesitate to tell me.)
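For anyone who wants to reproduce this kind of listing, here is a minimal Python sketch of one way to do it. It is not the exact method used above: the function names are made up for illustration, files are read in text form when possible, and binary files such as config are replaced by their md5 sum. It assumes it runs as root in dom0 (or any Linux host), and it skips the resourceN and rom files on the assumption that reading them is not useful without the driver.

```python
import hashlib
import os
from pathlib import Path


def _read_attribute(path):
    """Read one sysfs file: text if printable, md5 sum if binary, error name otherwise."""
    try:
        data = path.read_bytes()
    except OSError as exc:
        # Some attributes are write-only or return errors such as "Input/output error".
        return f"<{exc.strerror}>"
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError:
        return "md5:" + hashlib.md5(data).hexdigest()
    if all(ch.isprintable() or ch.isspace() for ch in text):
        return text.strip()
    return "md5:" + hashlib.md5(data).hexdigest()


def _is_bar_or_rom(name):
    """Assumption: BAR mappings (resource0, resource1, ...) and the rom file are skipped."""
    return name == "rom" or (name.startswith("resource") and name != "resource")


def dump_pci_attributes(device_dir):
    """Return {relative_path: value} for every attribute below a PCI device directory."""
    device_dir = Path(device_dir)
    attributes = {}
    for root, dirs, files in os.walk(device_dir):  # symlinked directories are not followed
        for name in sorted(dirs + files):
            path = Path(root) / name
            key = str(path.relative_to(device_dir))
            if path.is_symlink():
                # e.g. "driver" points at the bound driver (pciback in the listing below)
                attributes[key] = "-> " + os.path.basename(os.readlink(path))
            elif path.is_file() and not _is_bar_or_rom(name):
                attributes[key] = _read_attribute(path)
    return attributes
```

Calling dump_pci_attributes("/sys/bus/pci/devices/ID_OF_THE_GPU.0") and printing the sorted items should give a listing similar in shape to the one in this post.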

New computer, GPU itself (.0):

  • aer_dev_correctable: every value is equal to 0, TOTAL_ERR_COR 0
  • aer_dev_fatal: same
  • aer_dev_nonfatal: same
  • ari_enabled: 1
  • boot_vga: 0
  • broken_parity_status: 0
  • class: 0x030000
  • config: (it is binary data, so instead, writing md5sum: 31fcb27e49505711aae05bfdedc1b4ea )
  • consistent_dma_mask_bits: 32
  • consumer:pci:XXXXXX : (irrelevant)
  • current_link_speed: 8.0 GT/s PCIe
  • current_link_width: 16
  • d3cold_allowed: 1
  • device: 0x67df
  • dma_mask_bits: 32
  • driver: pciback
  • driver_override: (null)
  • enable: 0
  • firmware_node: (irrelevant ?)
  • irq: 24
  • link - clkpm : 0
  • link - l1_1_pcipm : 0
  • link - l1_aspm : 1
  • local_cpulist: 0-15
  • local_cpus: ffff
  • max_link_speed: 8.0 GT/s
  • max_link_width: 16
  • modalias: pci:v00001002d000067DFsv00001043sd00008877bc03sc00i00
  • msi_bus: 1
  • numa_node: -1
  • power - autosuspend_delay_ms: Input/output error
  • power - runtime_active_time: 2070000
  • power - control: on
  • power - runtime_status: active
  • power - runtime_suspended_time: 0
  • power - wakeup: disabled
  • power - wakeup_* : (everything is empty)
  • power_state: D0
  • reset_method: bus
  • revision: 0xe7
  • vendor: 0x1002
  • subsystem_device: 0x8877
  • subsystem_vendor: 0x1043

Old computer (only noting the values that differ):

  • config: (it is binary data, md5sum: 9afd7…)
  • current_link_width: 4 (nothing surprising here, plugged it in the first slot available)
  • irq: 50
  • local_cpulist: 0-7
  • local_cpus: ff
  • modalias: pci:v00001002d000067DFsv00001043sd00000525bc03sc00i00
  • power - runtime_active_time: 692657
  • subsystem_device: 0x0525

So nothing looks interesting; I need to find another idea.
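The "only noting the values that differ" step can also be done mechanically. Below is a small Python sketch that compares two attribute dictionaries; the function name is hypothetical and the sample values are just a few of the ones listed above, not a full dump.

```python
def diff_attributes(new, old):
    """Return {key: (new_value, old_value)} for every key whose value differs."""
    differences = {}
    for key in sorted(set(new) | set(old)):
        new_value = new.get(key, "<missing>")
        old_value = old.get(key, "<missing>")
        if new_value != old_value:
            differences[key] = (new_value, old_value)
    return differences


# A few values copied from the two listings above.
new_computer = {"vendor": "0x1002", "current_link_width": "16", "irq": "24",
                "subsystem_device": "0x8877"}
old_computer = {"vendor": "0x1002", "current_link_width": "4", "irq": "50",
                "subsystem_device": "0x0525"}

for key, (new_value, old_value) in diff_attributes(new_computer, old_computer).items():
    print(f"{key}: new={new_value} old={old_value}")
```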

So you believe it could be an issue with your specific PC? I could try testing the .iso on my Zephyrus, it may help narrow down the issue if you had a few more testers.

edit: oh right, the whole point is getting passthrough working.

If I were you, I’d register an account on the level1techs forum; there are lots of people there who are interested in (and experienced with) passthrough.

I was able to pass through an NVIDIA 1070 with the new computer.
(The NVIDIA driver seems to not like pci=nomsi; if I remove it, everything works correctly with NVIDIA. Setting NVreg_EnableMSI=0 does not seem to be necessary.)

So it is very strange that I am unable to pass through the RX580.

I am going to reboot a few times to confirm that I can always reliably pass through the NVIDIA 1070.

@Cpotts
I will link you an R4.2 ISO later this week.

@fjdh
Good idea. For the moment I just posted that on Reddit: Zen4 - RX580 - Xen : VFIO

Update:

  • Passthrough of the 1070 works reliably

Going to try all of that: drm/amdgpu AMDgpu driver — The Linux Kernel documentation

There might also be signed weekly iso builds for 4.2 just around the corner:

/me refresh Index of /qubes/iso/ a third time … :wink:

Successful GPU passthrough of the RX 580 on my new computer!!!

The last step (I don’t know if the other ones were required) was to … downgrade the Linux kernel back to 5.4 LTS.
So there is another issue in recent versions of the Linux kernel, which I will need to bisect.

So now I will try to remove as many of my modifications as I can, to check whether only the Linux kernel downgrade is required (from my previous tests I expect that it requires more than just the kernel downgrade).

Currently the passthrough conditions are:

  • Using an old kernel (currently 5.9)
  • Not passing through the audio part of the GPU
  • In the bios: “Resizable BAR support” must be disabled
  • In the bios: “CSM Support” must be disabled

Something interesting: on my old computer, I needed to use the boot parameter pci=nomsi to pass through the RX 580 on kernel >= 5.7. This does not seem to be required with the new computer; it works fine until 5.10.

I now need to find the required boot parameter to use a recent kernel, and bisect the change in the amdgpu driver introduced for kernel 5.10.

Also, does anyone have a theory on why I can’t pass through the audio part of the GPU, but only on my new computer?

Wow, you have enabled an entirely new tier of hardware to work with Qubes OS. Bravo! It just goes to show how dedicated this community is to Qubes and security.

Congrats. :slight_smile:
This is on xen 4.17/18? So 1070 works, rx 580 doesn’t, on k 5.10.x or any newer kernel? Again, I’d ask around on l1t forum, specifically the user gnif did a lot of work on getting passthrough to work in 2020/2021. He might have an idea whether there were changes relating to this in that interval.

thanks :slight_smile:
but it will still require many weeks of work. I dodged many issues but now they need to be fixed.

@fjdh
It is on Xen 4.17, using an ISO I compiled from the Qubes sources with 2 custom patches (one is the CPU frequency fix, the other one removes the TOLUD patch).
The 1070 works. The RX 580 works only with kernels below 5.10. It is one of the issues for which I need to find the root cause and fix it.
And thanks, I will send a more detailed post on the l1t forum once the known Qubes-specific bugs are fixed.

I reformatted my new computer with an ISO where I integrated this patch (qemu: fix TOLUD for PCI passthrough by mati7337 · Pull Request #44 · QubesOS/qubes-vmm-xen-stubdom-linux · GitHub) and redid the same steps. Passthrough doesn’t work.
I think this patch is buggy, but it is quite hard to understand what is wrong with it, since it seems to work on at least some hardware.
To be sure, I will reinstall Qubes OS on my new computer using the ISO without the TOLUD patch and retry all the steps I noted, to check whether the passthrough succeeds again.

Many of the issues I am encountering seem to be Qubes-specific (MSI support / MSI-X support).

Some notes on this TOLUD issue.
First, I tried with the patch (as it is already integrated in Qubes):

  • Wifi worked
  • GPU passthrough didn’t work

Then I tried without this patch:

  • Wifi didn’t work (reset bug in dom0 kernel and guest kernel; FLR timeout)
  • Reached my first success with GPU passthrough

Today 1: reinstall with the patch

  • Wifi works
  • GPU passthrough doesn’t work

Today 2: reinstall without the patch

  • Wifi doesn’t work
  • GPU passthrough works

I will do another try with the patch before opening a complicated GitHub issue.

I recompiled an ISO with up-to-date sources. Wifi works and GPU passthrough works (with the same issues as before); I don’t know what the issue was. Either the hardware was trolling me, or I was tired, probably a bit of both.

Passthrough also works with an RTX 4080.

If you want to try, the ISO I used is available here: ISO
sha1sum: 28d1098feed6ae25b42a079c1c34b05d3be116c2
I didn’t compile the template into it, so you will need to download a Fedora 37 template from the QubesOS repo and install it manually to start testing.
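Before installing, it is worth checking the download against the sha1sum given above. Running sha1sum on the file does the job; the Python sketch below does the same thing if you prefer a script. The function names are just illustrative, and the ISO path is whatever you saved the file as.

```python
import hashlib

# Checksum quoted in this post.
EXPECTED_SHA1 = "28d1098feed6ae25b42a079c1c34b05d3be116c2"


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-1 of a file without loading it fully into memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iso_matches(path, expected=EXPECTED_SHA1):
    """Return True when the file's SHA-1 equals the expected checksum."""
    return sha1_of_file(path) == expected.strip().lower()
```

For example, iso_matches("/path/to/downloaded.iso") should return True for an intact download of that ISO.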

Great, thanks! Would your ISO work with all the other hardware that the official release works with (Intel iGPU only, for example)?

Yes, it is basically the Qubes OS development tree.

Awesome, thanks. I’ll definitely try it on my USB stick first!

Updated my gpu passthrough doc: Qubes OS article

Things that don’t work yet on my new hardware:

  • The wifi is very weird. It was working. Then I booted Arch Linux from a USB key for some testing, then rebooted into Qubes: FLR reset issue, and I was never able to make it work. I rebooted a few times, same issue. Finally I completely cut the power for a few seconds, started the computer, and it now works. I am NOT using the “fast boot” option in the BIOS.
  • Ethernet card. It suffers from disconnections. PCI passthrough is not stable; sometimes the guest kernel says that the PCI link has been lost.
  • IGPU passthrough. I tried to pass through the IGPU to make a sys-gui-gpu, but the guest complains that it cannot find the BIOS. In the host, under /sys/devices/pci*/**/THE_IGPU_ID/ , there is no rom file.

However, when I start the computer, the BIOS informs me that the ROMs of the IGPU and the ethernet card can be found at some memory addresses. Below is the exact message:

"
0x0000:0x09:0x00.0x0: ROM: 0xf000 bytes at 0x6b99d018
0x0000:0x12:0x00.0x0: ROM: 0xae00 bytes at 0x6a159018
"

0x09 is the ethernet card
0x12 is the IGPU

I do not know what to do with this data yet. I guess it would be nice if I could somehow extract those memory regions to retrieve the ROM of both devices. I don’t know how to do that yet.
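As a first step, the two message lines can at least be turned into something usable. The Python sketch below only parses them into a PCI address, a size and a start address; it does not read any memory, and treating the number after "at" as a physical address is an assumption, not something confirmed here. The function name is made up for illustration.

```python
import re

# Matches lines such as: 0x0000:0x09:0x00.0x0: ROM: 0xf000 bytes at 0x6b99d018
ROM_LINE = re.compile(
    r"0x(?P<domain>[0-9a-fA-F]+):0x(?P<bus>[0-9a-fA-F]+):"
    r"0x(?P<device>[0-9a-fA-F]+)\.0x(?P<function>[0-9a-fA-F]+):\s*"
    r"ROM:\s*0x(?P<size>[0-9a-fA-F]+) bytes at 0x(?P<address>[0-9a-fA-F]+)"
)


def parse_rom_line(line):
    """Return the PCI address, size and start/end address of one ROM line, or None."""
    match = ROM_LINE.search(line)
    if match is None:
        return None
    fields = {name: int(value, 16) for name, value in match.groupdict().items()}
    bdf = "{domain:04x}:{bus:02x}:{device:02x}.{function:x}".format(**fields)
    return {
        "bdf": bdf,                      # same form as used by lspci / sysfs
        "size": fields["size"],          # ROM size in bytes
        "address": fields["address"],    # assumed to be a physical address
        "end": fields["address"] + fields["size"] - 1,
    }


for text in (
    "0x0000:0x09:0x00.0x0: ROM: 0xf000 bytes at 0x6b99d018",
    "0x0000:0x12:0x00.0x0: ROM: 0xae00 bytes at 0x6a159018",
):
    info = parse_rom_line(text)
    print(info["bdf"], hex(info["address"]), info["size"], "bytes")
```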

Also, when I dump the vbios of the NVIDIA card from within the guest (so not the real vbios, but some stubs compiled for QEMU), it shows that it has been created for SeaBIOS (not UEFI), and the IGPU specifically can only work with UEFI. I don’t know how important that is in the context of being inside a QEMU guest. But as a last resort I could try not using SeaBIOS and force the use of a UEFI firmware like TianoCore/OVMF/EDK2.

What kernel version/patch are you using within the netVM? Also make sure that only a single HVM has the PCI device selected.

Tested version 5.15 and 6.1.6. And checked for the HVM / PCI devices.

For the IGPU, after some chat on IRC I have been directed to a potential issue with hvmloader and BAR allocation.

Hey all, I thought I would reply because I have this new hardware and would love to get qubes running on it. I completely understand that very new hardware takes a long time to be supported but I am happy to help to test! The following are my specs and the errors I am getting:

Here are my general specs:

CPU Model: Ryzen 9 7950X
Motherboard Model: ASUS TUF GAMING X670E-PLUS WIFI (Newest 0820 Bios version)
Discrete GPU: AMD Radeon RX 7900 XTX
Internal GPU: Video Card:  AMD Raphael - Internal GPU [ASUS] (would like to put dom0 on this if possible)
NvME to install on: WD Blue SN570

Errors and oddities (before finding this thread):

Putting the BIOS in CSM, Other OS mode causes only my discrete GPU to be detected as a display, so if I want to boot the USB in non-UEFI mode, I can only do it connected to the 7900XTX. I don’t know why I cannot boot in compatibility mode and use the internal graphics (for the bios screen, at least).

When I boot in the CSM / other os I can boot the non-UEFI version of the bootloader and get to the screen with the 3 white dots, but the installer never proceeds past that screen. It is hanging on something in the setup process.

If I boot the system without using the CSM, I can boot to the built-in graphics on the CPU. I can only select to boot in UEFI mode though on the USB stick. When selecting to install qubes in verbose mode I get errors something like this…(sorry didn’t take a snapshot so from memory):

XHCI USB ports issues

but it gets past that…

nvme nvme0: I/O 786 QID 1 polling, timeout
nvme nvme1: I/O 538 QID 1 polling, timeout
nvme nvme0: Abort status 0x0

it will hang here for hours with no further progress

I am surprised I got this far with such new hardware using the current version of Qubes, but it seems like there are a few problems with this new motherboard and CPU, and once past that it will probably work great. I am wondering if having multiple nvme drives on the motherboard might be an issue, or maybe the x670-e motherboard has a new way of handling USB and PCIe / nvme that Qubes does not like yet.

Last note, I tried the “WORKING.ISO” on this thread just to see if it would change anything but I have the same issues with that one. Thanks to everyone who is working on getting this working for a future release of Qubes.

Hello :slight_smile:

  • The integrated GPU cannot work in non-UEFI mode. You NEED to disable CSM. It is a hardware limitation.
  • The “XHCI USB ports issues” happen because the implementation of x2apic is incorrect. You have no issue with your NVMe. When you reach the bootloader/GRUB of the installer, choose the line you want, press “E”, then add “x2apic=false” to both the Xen line and the Linux line. Then press F10.
  • The current official Qubes version cannot work with this hardware; only the development tree can (and the WORKING ISO).

You should have everything needed to finish the installation.
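To make the "add it to both the Xen line and the Linux line" step concrete, here is a small Python sketch that does the same edit on the text of a boot entry. It is only an illustration: the entry layout (a multiboot2 line for Xen and a module2 line for the kernel) and the paths in the sample are assumptions about what the installer entry looks like, so check what you actually see after pressing “E”. The function name is hypothetical.

```python
def add_boot_parameter(entry_text, parameter="x2apic=false"):
    """Append a parameter to the Xen line and the Linux kernel line of a boot entry."""
    patched = []
    for line in entry_text.splitlines():
        tokens = line.split()
        # Assumption: Xen is loaded with multiboot2 and the kernel with module2.
        is_boot_line = (
            len(tokens) >= 2
            and tokens[0] in ("multiboot2", "module2")
            and ("xen" in tokens[1] or "vmlinuz" in tokens[1])
        )
        if is_boot_line and parameter not in tokens:
            line = line.rstrip() + " " + parameter
        patched.append(line)
    return "\n".join(patched)


# Hypothetical entry, for illustration only.
sample_entry = """multiboot2 /images/pxeboot/xen.gz console=none
module2 /images/pxeboot/vmlinuz inst.stage2=hd:LABEL=EXAMPLE quiet
module2 /images/pxeboot/initrd.img"""

print(add_boot_parameter(sample_entry))
```

The initrd line is left alone on purpose, since only the Xen line and the Linux line take the parameter.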

(I didn’t make this ISO “user friendly”: manual installation of the template and kernel is required after the installation. Also note that the IGPU only works with kernel >= 6.1.)

Wow thank you so much for the useful information! I am going to try WORKING.ISO again with the bootloader fixes you suggested.