Prepare-suspend hangs when nvidia attached to appvm

On a ThinkPad P series laptop with an NVIDIA card, I created an HVM AppVM and attached the card to it for CUDA and accelerated development, which works without any issues.

The problem is that when the card is attached, suspend takes longer. The system does eventually suspend, but the NVIDIA fans keep running afterwards.

Later I realized that the AppVM might not have suspended at all, as there was a prepare-suspend process in the VM using 100% CPU
(/usr/lib/qubes/prepare-suspend suspend had been running for 10+ hours at 99% CPU).
I can’t kill the process, even with kill -9.

Any ideas on how to fix the suspend issue?

I think I found the answer: patching /usr/lib/qubes/prepare-suspend with the following resolves it for me.

--- prepare-suspend.orig	2025-09-03 13:58:38.341044331 +0300
+++ prepare-suspend	2025-09-03 13:53:30.515398514 +0300
@@ -63,6 +63,10 @@
         if [ "$vendor" = "0x8086" ] && [ "$class" = "0x030000" ]; then
             continue
         fi
+	# skip NVIDIA graphics device
+        if [ "$vendor" = "0x10de" ] && [ "$class" = "0x030000" ]; then
+            continue
+        fi
         if ! [ -e "$dev_path/driver" ]; then
             continue
         fi
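
To see which devices the added condition would match, the vendor and class values can be read from sysfs inside the AppVM. This is only a rough sketch: the helper name matches_nvidia_vga is made up for illustration and is not part of the Qubes script, and it only mirrors the two values used in the patch above (0x10de and 0x030000). If a card reports a different class value, the patch condition would not match it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Qubes): succeeds when the vendor/class
# pair equals the values tested by the patch above.
matches_nvidia_vga() {
    [ "$1" = "0x10de" ] && [ "$2" = "0x030000" ]
}

# Walk the PCI devices visible in this VM and report the ones that match.
for dev_path in /sys/bus/pci/devices/*; do
    # Skip entries without readable sysfs attributes (or an unmatched glob).
    [ -r "$dev_path/vendor" ] || continue
    vendor=$(cat "$dev_path/vendor")
    class=$(cat "$dev_path/class")
    if matches_nvidia_vga "$vendor" "$class"; then
        echo "matches patch condition: ${dev_path##*/} ($vendor, $class)"
    fi
done
```

If nothing is printed, no device in the VM has that exact vendor and class combination.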


I had the same issue when attaching my NVIDIA RTX 4090 to an AppVM: suspend took a long time, with qubes.SuspendPre timeout messages in dom0, and when the system came back, the keyboard and mouse didn’t work even though sys-usb found the devices. This patch fixed that. However, after suspend the AppVM has to be restarted, because the GPU no longer works:

Unable to determine the device handle for GPU0000:00:06.0: Unknown Error

Any idea how to fix that?

Unfortunately I don’t know how to fix it.

But just wondering, are you able to run the NVIDIA driver in an AppVM on R4.3? My template stopped functioning after I installed the NVIDIA driver.

I’m running R4.2.4. I created a Debian template, attached the GPU to it, installed the NVIDIA driver, and it worked right out of the box (except for suspend, of course). The only thing I had to do was add a parameter to the GRUB command line so that the GPU is ignored.

I’m currently working on the suspend issue. I think the problem is that the kernel module does not get unloaded because Xorg and nvidia-persistenced keep the GPU in use. nvidia-persistenced can be stopped, but Xorg grabs the GPU by default, even though it outputs to the virtual device.
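
One way to check whether something still holds the modules is to look at their use counts in /proc/modules (the third column, the same number lsmod shows under "Used by"). The sketch below assumes the usual NVIDIA module names (nvidia, nvidia_modeset, nvidia_drm, nvidia_uvm). The function name module_refcount is made up for illustration.

```shell
#!/bin/sh
# module_refcount NAME [FILE]: print the use count of a loaded module,
# read from /proc/modules (or from FILE, which is handy for testing).
# Prints nothing if the module is not listed.
module_refcount() {
    awk -v m="$1" '$1 == m { print $3 }' "${2:-/proc/modules}" 2>/dev/null
}

# Report the use count of each module assumed to belong to the driver.
for mod in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
    count=$(module_refcount "$mod")
    echo "$mod: ${count:-not loaded}"
done
```

A module with a non-zero count cannot be removed with modprobe -r until whatever uses it (another module or a process such as Xorg) lets go of it.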


I used a different approach.
First, I added the following to a file in /etc/modprobe.d in the template:

  • blacklist nouveau
  • blacklist nvidia_drm
  • install nvidia_drm /bin/true
  • blacklist nvidia_modeset
  • install nvidia_modeset /bin/true

I also permanently disabled the nvidia-persistenced service in the template.
This way the module is not used by any service, only by what we run ourselves.

Then I added nvidia to /rw/config/suspend-module-blacklist in the AppVM.

Now I can suspend and resume, provided the NVIDIA driver is idle at the time of suspend (i.e., no job is running on it).
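
The steps above could be scripted roughly as follows. This is a sketch under assumptions: the file name nvidia-blacklist.conf is made up (any .conf file in /etc/modprobe.d should do), both function names are hypothetical, and it assumes the suspend blacklist file takes one module name per line. The actual invocations are left commented out so that nothing is changed by accident.

```shell
#!/bin/sh
# write_modprobe_blacklist FILE: write the five modprobe directives from
# the list above to FILE (meant to be run in the template, as root).
write_modprobe_blacklist() {
    cat > "$1" <<'EOF'
blacklist nouveau
blacklist nvidia_drm
install nvidia_drm /bin/true
blacklist nvidia_modeset
install nvidia_modeset /bin/true
EOF
}

# add_suspend_blacklist MODULE FILE: append MODULE to FILE unless an
# identical line is already present (meant to be run in the AppVM, as root).
add_suspend_blacklist() {
    grep -qxF "$1" "$2" 2>/dev/null || echo "$1" >> "$2"
}

# In the template (file name is an assumption):
#   write_modprobe_blacklist /etc/modprobe.d/nvidia-blacklist.conf
#   systemctl disable --now nvidia-persistenced
# In the AppVM:
#   add_suspend_blacklist nvidia /rw/config/suspend-module-blacklist
```

Because /rw persists in the AppVM, the suspend blacklist entry survives restarts, while the modprobe file and the disabled service are inherited from the template.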