Another second-GPU passthrough post

Hello all,

My big dream of working entirely in Qubes is within reach.
For my work I need programs that require hardware acceleration.

AMD Ryzen 7 5800X
ROG Strix X570-F Gaming
AMD Radeon RX 580 (1st GPU)
AMD Radeon RX Vega 64 (2nd GPU)
2 × 32 GB DDR4 RAM
1 TB M.2 SSD
If necessary I can provide more system specs.

Since my attempt to install Qubes 4.0.3 failed with a black screen at boot, I am using Qubes 4.1 (signed Q4.1 alpha ISO).

I would like to detach the second GPU from dom0 and assign it to an HVM.
I have read various guides and failed with all of them.
In particular, I wonder what went wrong when following https://groups.google.com/g/qubes-users/c/zHmaZ3dbus8/m/rWll-ywQCAAJ:

I first got the device id for my second GPU:

[user@dom0 ~]$ qvm-pci
...
dom0:0c_00.0   VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64]                 
dom0:0c_00.1   Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64]
...

Then I edited the GRUB config. (I tried both parameters, xen-pciback.hide=(0c:00.0)(0c:00.1) and rd.qubes.hide_pci=0c:00.0,0c:00.1, individually and together.)

[user@dom0 ~]$ sudo vim /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="gfxterm"
GRUB_CMDLINE_LINUX="rd.luks.uuid=luks-1b2813e0-e5fd-4dfb-84ad-400aa8f4504f rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap plymouth.ignore-serial-consoles rd.driver.pre=btrfs rhgb quiet xen-pciback.hide=(0c:00.0)(0c:00.1) rd.qubes.hide_pci=0c:00.0,0c:00.1 modprobe=xen-pciback.passthrough=1 xen-pciback.permissive"
GRUB_DISABLE_RECOVERY="true"
GRUB_THEME="/boot/grub2/themes/qubes/theme.txt"
GRUB_CMDLINE_XEN_DEFAULT="console=none dom0_mem=min:1024M dom0_mem=max:4096M ucode=scan smt=off gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096"
GRUB_DISABLE_OS_PROBER="true"
[user@dom0 ~]$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
[user@dom0 ~]$ reboot
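
For reference, the two hide parameters tried above encode the same device list in different syntaxes: xen-pciback.hide wraps each address in parentheses, while rd.qubes.hide_pci takes a comma-separated list. A minimal sketch of building both from one list of addresses, so the two cannot drift apart; the helper names pciback_hide and qubes_hide_pci are hypothetical and not part of Qubes:

```shell
#!/bin/sh
# Hypothetical helpers (not part of Qubes): build both kernel parameters
# from the same list of PCI addresses (bus:device.function).

# Each address wrapped in parentheses, concatenated without separators.
pciback_hide() (
    out=""
    for bdf in "$@"; do
        out="${out}(${bdf})"
    done
    printf 'xen-pciback.hide=%s\n' "$out"
)

# Comma-separated list; "$*" joins the arguments with the first char of IFS.
qubes_hide_pci() (
    IFS=,
    printf 'rd.qubes.hide_pci=%s\n' "$*"
)

pciback_hide 0c:00.0 0c:00.1     # prints xen-pciback.hide=(0c:00.0)(0c:00.1)
qubes_hide_pci 0c:00.0 0c:00.1   # prints rd.qubes.hide_pci=0c:00.0,0c:00.1
```

The output can be pasted into GRUB_CMDLINE_LINUX. Each function body is a subshell, so setting IFS there does not leak into the calling shell.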

Since the guide uses an Intel processor, I can’t compare this 1:1, but here is the output:
[user@dom0 ~]$ xl dmesg
xl-dmesg-output.log (9.0 KB)

So far so good, but then somehow only the VGA device is assignable and not the audio device.
[user@dom0 ~]$ xl pci-assignable-list
0000:0c:00.0

I just went ahead and assigned the GPU to the HVM via the Qubes Manager GUI (both 0c:00.0 and 0c:00.1; assigning only 0c:00.0 results in a driver reset conflict).
Nothing happens at startup, except that the computer slowly freezes. qrexec seems to be broken. Here are the logs:

guest-gpu-pt-test.log (931.0 KB) guest-gpu-pt-test-dm.log (79.6 KB) qrexec-gpu-pt-test.log (885 Bytes)

I would really appreciate help, but I also know that you have a lot to do. That’s why I would be willing to donate $100 to Qubes if a solution works :)

Edit: According to some guides I need to edit the hvm kernel, but I have no idea where to start…

Update:

I’m quite embarrassed to admit it, but after another night of debugging I realized that GRUB_CMDLINE_LINUX was not updated properly…

You can check the active parameters with cat /proc/cmdline.
The command I needed was not
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
but
sudo grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg
According to (R4.1: Restore Grub2 in UEFI mode · Issue #4902 · QubesOS/qubes-issues · GitHub), updating both is fine.
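
A quick way to catch this kind of mistake after a reboot is to test /proc/cmdline for each parameter you expect. The sketch below assumes a POSIX shell; cmdline_has is a hypothetical helper name, and the parameter list is only an example taken from this thread:

```shell
#!/bin/sh
# cmdline_has PARAM [FILE]
# Succeeds if PARAM appears as a whole whitespace-separated word in FILE
# (default: /proc/cmdline). Hypothetical helper, not a Qubes tool.
cmdline_has() (
    param=$1
    file=${2:-/proc/cmdline}
    set -f                      # no globbing while word-splitting the file
    for word in $(cat "$file" 2>/dev/null); do
        [ "$word" = "$param" ] && exit 0
    done
    exit 1
)

# Example: report which of the expected parameters actually reached the kernel.
for p in "xen-pciback.hide=(0c:00.0)(0c:00.1)" \
         "rd.qubes.hide_pci=0c:00.0,0c:00.1" \
         "xen-pciback.permissive"; do
    if cmdline_has "$p"; then
        echo "active:  $p"
    else
        echo "MISSING: $p"
    fi
done
```

If everything is reported as MISSING after regenerating the config, the file written by grub2-mkconfig is probably not the one the firmware boots from.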

Now this is also displayed correctly:
[user@dom0 ~]$ xl pci-assignable-list
0000:0c:00.0
0000:0c:00.1

With less than 3 GB of memory it works fine. However, assigning more memory results in
(2nd GPU passtrough, VM dies on boot. · Issue #5603 · QubesOS/qubes-issues · GitHub).

Rechecking the parameters reveals a passthrough error:
[root@dom0 qubes]# cat /sys/module/xen_pciback/parameters/hide
(0c:00.0)(0c:00.1)
[root@dom0 qubes]# cat /sys/module/xen_pciback/parameters/passthrough
N
[root@dom0 qubes]# cat /sys/module/xen_pciback/parameters/permissive
Y

Changing modprobe=xen-pciback.passthrough=1 to xen-pciback.passthrough=1
leads to
[root@dom0 qubes]# cat /sys/module/xen_pciback/parameters/passthrough
Y
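
The three checks above can be done in one go. A minimal sketch, assuming the xen_pciback parameters live under /sys/module/xen_pciback/parameters as shown in this post; show_pciback_params is a hypothetical helper name, and the optional directory argument exists only so the function can be tried outside dom0:

```shell
#!/bin/sh
# show_pciback_params [DIR]
# Print the xen_pciback parameters discussed above as name=value lines.
# Hypothetical helper; DIR defaults to the sysfs path used in this thread.
show_pciback_params() (
    dir=${1:-/sys/module/xen_pciback/parameters}
    for name in hide passthrough permissive; do
        if [ -r "$dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
        else
            printf '%s=<unreadable>\n' "$name"
        fi
    done
)

show_pciback_params
```

On a machine without the module loaded, every line reads `<unreadable>`; reading the files may also require root, as in the prompts above.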

The stubdom still needs to be patched, as described in (AppVM with GPU pass-through crashes when more than 3.5 GB (3584MB) of RAM is assigned to it · Issue #4321 · QubesOS/qubes-issues · GitHub).

But the location has moved in 4.1:
[user@dom0 stubroot]$ cp /usr/lib/xen/boot/stubdom-linux-rootfs stubroot/stubdom-linux-rootfs.gz
cp: cannot stat '/usr/lib/xen/boot/stubdom-linux-rootfs': No such file or directory

find / -name "*stubdom*"
… there are multiple results …
/usr/libexec/xen/boot/qemu-stubdom-linux-rootfs
/usr/lib64/xen/boot/stubdom-linux-rootfs

I took the first one and noticed that the second one was updated as well.

[user@dom0 ~]$ cp /usr/libexec/xen/boot/qemu-stubdom-linux-rootfs stubroot/stubdom-linux-rootfs.gz
You also have to change
SP=$'\x1b'
to
SP=$'\n'
in Qubes 4.1.
Finally, change the last command of the guide to
sudo mv ../stubdom-linux-rootfs /usr/libexec/xen/boot/

But now I still got errors. On bootup I got
“Cannot connect to qrexec agent for 60 seconds, see /var/log/xen/console/guest-gpu-pt-test.log for details”

Here are all new logs.

guest-gpu-pt-test.log (1013.3 KB) guest-gpu-pt-test-dm.log (801.2 KB) guid.gpu-pt-test.log (98 Bytes) qrexec.gpu-pt-test.log (143 Bytes)

I would really like to get this working and appreciate any help:)

Update 2:
Installing the AMD drivers according to (Qubes OS article) and using an HVM with no kernel results in a different error.
Instantly (and not after a minute or so) I get the message
“qrexec-daemon startup failed: 2021-02-15 18:59:50.259 qrexec-daemon[26084]: qrexec-daemon.c:134:sigchld_parent_handler: Connection to the VM failed”.

I cannot get GPU passthrough working either, neither in dom0 nor in a Fedora VM, for example. When I try to attach the GPU with the Qubes Manager, my laptop screen goes black and the machine restarts. Here is a screenshot of my PCI devices. Can anybody explain which devices I need to attach to the VM? And does it first have to work in dom0, or is it otherwise not possible in a VM? Sorry, I don’t really understand how to do this, but I’m trying to learn ;)
Maybe fepitre will soon have time to tell us how sys-gui-gpu is working for him with the Nvidia GTX 960 card. That should work on our systems as well, and then we’ll finally get GPU passthrough working. The topic “GPU” is sooo complex :)

Hi,
Do you still have this issue?
Out of curiosity, why “Also you have to change SP=$'\x1b' to SP=$'\n'”?
What are your qube settings (RAM amount / PCI passthrough strict reset / PCI passthrough devices / …)?
I am not yet using R4.1 on my desktop, so I haven’t tested how it works on R4.1 yet.

Yesterday I managed to pass my Nvidia GTX 1650 Max-Q through to my Manjaro HVM.

I can see the GPU in Manjaro Hardware Settings and can install the Nvidia drivers.

When I restart the VM, it boots to the desktop and then the picture goes black, but the VM no longer crashes as it did every single time before.

I looked in dmesg and I noticed one error at the end:

nvidia: module verification failed: signature and/or required key missing - tainted kernel

Now my question is:

How can I prevent a VM from detecting that it is a VM?

I want my Manjaro VM to see:

  1. A real CPU, not a Xen CPU
  2. A real GPU, not a virtualized piece of hardware

In dmesg I also saw that my Manjaro VM sees my GPU in slot 00:07.0.
My GPU is actually in slot 01:00.0.

I think my problem now is that the VM detects that it is virtualized, and the nvidia driver is not loading because of it.

edit:

dmesg | grep nvidia

nvidia: loading out-of-tree module taints kernel.
nvidia: module licence nvidia taints kernel.
nvidia: module verification failed: signature and/or required key missing - tainted kernel
nvidia-nvlink: nvlink core is being initilized, major device number 238
nvidia 0000:00:07.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
nvidia-modset: Loading nvidia kernel mods setting deriver for unix platforms 460.56 tue fed 23 23:230…
[drm] [nvidia-drm] [GPU ID 0x00000007] Loading driver
[drm] initillized nvidia-drm 0.0.0 20160202 for 0000:00:07.0 on minor 0

Are you using the latest Nvidia driver? I was able to pass through my 1070 without issue, except when more RAM is assigned. As of right now there is no way to hide that the GPU is being passed through. However, that should not be an issue with the latest Nvidia drivers.

I’ve had no luck with patching stubdom in R4.1. The files which resemble the file in the guides are:

/usr/lib64/xen/boot/qemu-stubdom-linux-rootfs
/usr/libexec/xen/boot/qemu-stubdom-linux-rootfs

Following the instructions from the GitHub issue, also listed on neowutran.ovh, makes HVMs fail to start after 60 seconds.
The logs don’t give any hints here.

The AMD Vega is still passed through successfully, although I am running the standard Qubes kernel, not a distro kernel installed in the VM, and windows open with the window manager on the screen connected to my other GPU, as with other PVH VMs.

I guess then I need to use the VM kernel and make it boot with Grub? Then connect it via the GPU to another monitor.

I have a successful GPU passthrough on R4.1.
I followed my doc but needed to modify the stubdom-linux-rootfs patch.

Cool!
Did you overwrite this file?
/usr/libexec/xen/boot/qemu-stubdom-linux-rootfs

Yes.
I tried to document the steps here: Qubes OS article, and here: Update Windows Gaming HVM documentation by neowutran · Pull Request #139 · Qubes-Community/Contents · GitHub

Hi, I read your articles and managed to finally get everything working. Except I cannot get into a 3D world.
I can boot up games and get to the menus, but when trying to join a match or similar, it just gives an infinite loading screen.
Could it have something to do with IOMMU groups? I have almost the same setup as @Rnd3sB3g13rng, with Vega and RXX80.
I couldn’t find any IOMMU groups with my Fedora live USB with the kernel parameters. Maybe with another Linux distro?
I’m not sure if they are in the same IOMMU group because of this.
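
On a plain Linux system (such as a live USB) with the IOMMU enabled in both firmware and kernel, the groups are exposed under /sys/kernel/iommu_groups. A minimal sketch for listing them follows; list_iommu_groups is a hypothetical helper name. Under Qubes dom0 this directory is, as far as I know, typically empty because Xen manages the IOMMU itself, so an empty result there does not say anything about the grouping:

```shell
#!/bin/sh
# list_iommu_groups [BASE]
# Print one "group N: <pci address>" line per device found under BASE
# (default: /sys/kernel/iommu_groups). Hypothetical helper name.
list_iommu_groups() (
    base=${1:-/sys/kernel/iommu_groups}
    for dev in "$base"/*/devices/*; do
        [ -e "$dev" ] || continue      # glob matched nothing: no groups exposed
        group=${dev#"$base"/}          # strip the base directory ...
        group=${group%%/*}             # ... and keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
)

list_iommu_groups
```

If the VGA and audio functions of the card (for example 0000:0c:00.0 and 0000:0c:00.1) show up in a group together with unrelated devices, that is worth mentioning when asking for help. No output at all usually means the IOMMU is disabled or not visible to the running kernel.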

Could this be the problem here? Everything else works fine, and I can play 2D games using OpenGL with it.

CS:GO throws an error relating to DirectX. With Windows 10 Ameliorated DirectX12 is installed by default.

Solved… by installing Visual C++ redistributables.
Somehow they don’t provide those when you install most games these days.

But I have another issue. After “safely ejecting” the Vega GPU, I get an error immediately on restarting the Windows HVM.
I then have to reboot Qubes. For now I just restart the HVM normally, but after a few restarts the GPU suddenly goes to 100% fan speed, and I have to reboot Qubes. Not a giant issue, though.

There are some very noticeable lags when gaming with my Windows 10 HVM. I guess this is due to me just using a single LVM thin pool, which limits all my drives to the slowest one? These are all SATA/PCIe SSDs.
I haven’t heard anything about PCIe SSDs being mandatory for QEMU/KVM in other distros, but perhaps they are for Qubes? In your guide you mention M.2 disks, by which you of course mean an M.2 PCIe disk.

I can only say that on my side I have just one pool, containing two extremely fast M.2 PCIe disks. Performance is near native, and I don’t have any visible lag that I didn’t have before using Qubes. Maybe check the GPU driver you are using, but yes, from my tests, a Windows HVM requires a very fast disk to avoid lag. Linux HVMs are less of an issue.

How do you use USB passthrough? After a lot of troubleshooting I’ve pinpointed that the RAM limit patch does not work with the stubdom-qrexec feature from the package xen-hvm-stubdom-linux-full. It’s detailed in this post.
Qubes Windows Tools alone won’t let me do USB passthrough.

Edit: Thanks to @jevank I learned I need to patch qemu-stubdom-linux-full-rootfs after installing xen-hvm-stubdom-linux-full. Working now.

Yes, I had a working setup, but I ran into a new problem when I moved my “secondary” (gaming) GPU to my x16 slot and my dom0 GPU to my x16 (x8) slot: I’m now having difficulty hiding the gaming GPU from dom0.
It worked when the gaming GPU was in the second x16 (x8) slot.

At the disk decryption screen the gaming GPU’s monitor stays off, so it looks like it’s working, but after decryption the gaming GPU’s output turns on.

GRUB reads rd.qubes.hide_pci=03:00.0,03:00.1

And a side note, if you have 2 M.2 disks both running in an Ultra slot that’s a damn good build.
And SMT didn’t really provide a benefit in my testing so far.

I followed your advice and got hold of a PCIe disk with 2.6 GB/s read and 1.8 GB/s write.
Performance is still choppy while gaming, and there is intermittent sound lag.

If I stand still in a game, I comfortably reach 130-140 FPS.
If I start shaking the mouse around, it tanks to as low as 20-30 FPS.

Using 3 vcpus seems to yield the best, although subpar, performance.

OC’d the GPU a lot, didn’t help much. I think the issue is more related to disk throughput.
And perhaps also worth mentioning is that my GPU is running in x4 mode.
Had problems hiding the x16(x8) slot from dom0.

Can you run some benchmarks? I am curious to understand the performance difference between my setup and yours; comparing benchmark scores might show where the difference comes from.
And how do you use your mouse? PCI passthrough of a USB controller?

Depending on your timezone, you could reach me on Matrix: user “neowutran”, domain “neowutran.ems.host”.

I’ve followed the guides (thanks @neowutran for the detailed steps), but sadly the results were bad. First, here’s my setup:

AMD Ryzen 9 3900X
ROG Strix X570-E Gaming
64 GB RAM
2 TB M.2 NVMe SSD

dom0 GPU: NVIDIA 1060
passthrough GPU: NVIDIA 1080 Ti

Passthrough works in the sense that I could add the GPU to a Windows VM and get it running without external issues, but the VM is barely usable, with a lot of VIDEO_TDR_FAILURE BSoD errors.

Worth mentioning: I was able to use USB passthrough without issues, and I got audio working from the GPU audio device on the monitor speakers.