On a ThinkPad P series with an NVIDIA card, I have created an HVM AppVM and attached the card for CUDA and accelerated development. This works well without any issues.
The issue is that when the card is attached, the suspend operation takes longer; the system does eventually suspend, but the NVIDIA fans keep running afterwards.
Later on, I realized that the AppVM might not have suspended at all, as there was a prepare-suspend process running in the VM at 100% CPU.
(/usr/lib/qubes/prepare-suspend suspend had been running for 10+ hours at 99% CPU.)
I can't kill the process, even with kill -9.
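For anyone hitting the same thing, this is roughly how the stuck process can be inspected (a minimal sketch, assuming a root shell inside the qube; plain pgrep/ps/procfs, nothing Qubes-specific):

```bash
# Grab the PID of the stuck helper (-o = oldest matching process)
PID=$(pgrep -o -f prepare-suspend)

# Show its scheduler state and what it is waiting on
ps -o pid,stat,wchan:30,args -p "$PID"

# A task in D state, or one spinning inside a kernel/driver call,
# cannot be reaped by SIGKILL until that call returns -- which would
# explain why kill -9 has no visible effect.

# Kernel stack of the task, to see where it is stuck (needs root)
cat /proc/"$PID"/stack
```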
I had the same issue: when attaching my NVIDIA RTX 4090 to an AppVM, suspend also took a long time, with qubes.SuspendPre timeout messages in dom0, and when the system came back the keyboard and mouse didn't work even though sys-usb found the devices. This fixed it. However, after a suspend the AppVM has to be restarted, because the GPU no longer works:
Unable to determine the device handle for GPU0000:00:06.0: Unknown Error
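I haven't found a clean fix for that yet. The only thing that seems worth trying before restarting the AppVM is to unload the driver and re-scan the device. This is an untested sketch, run as root inside the qube, with the PCI address 00:06.0 taken from the error above:

```bash
# Stop anything that might still hold the GPU open
systemctl stop nvidia-persistenced 2>/dev/null

# Unload the NVIDIA kernel modules (this fails if the device is busy)
modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nvidia

# Drop the PCI device and let the kernel rediscover it
echo 1 > /sys/bus/pci/devices/0000:00:06.0/remove
echo 1 > /sys/bus/pci/rescan

# Reload the driver and check whether the GPU answers again
modprobe nvidia
nvidia-smi
```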
I'm running R4.2.4. I created a Debian template, attached the GPU to it, and installed the NVIDIA driver; it worked right out of the box without issues (except for suspend, of course). The only thing I had to do was add a GRUB command-line option so the GPU is ignored.
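For anyone following along: a GRUB change of that kind, as described in the Qubes GPU passthrough docs, hides the card from dom0 so its drivers never bind to it. Roughly (01:00.0 is only a placeholder for the card's actual PCI address, and the grub.cfg path depends on BIOS vs. EFI boot):

```bash
# In dom0, add the card's address to the kernel command line in
# /etc/default/grub, e.g.:
#   GRUB_CMDLINE_LINUX="... rd.qubes.hide_pci=01:00.0"

# Then regenerate the GRUB config (pick the path matching your boot mode)
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
# or:
sudo grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg
```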
I'm currently working on the suspend issue. I think the problem is that the kernel module does not get unloaded, because Xorg and nvidia-persistenced keep the GPU in use. nvidia-persistenced can be stopped, but Xorg grabs the GPU by default, even though it is only outputting to the virtual display device.
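What I'm experimenting with inside the AppVM is roughly the following: stop the persistence daemon and tell Xorg not to auto-add the passed-through card as a secondary GPU. This is only a sketch (the config file name is arbitrary), and I haven't verified yet that it is enough to let the module unload:

```bash
# Stop the persistence daemon so it no longer holds /dev/nvidia* open
sudo systemctl stop nvidia-persistenced

# Tell the X server not to auto-add hotplugged/secondary GPUs, so it
# should leave the NVIDIA card alone and keep using the virtual display
sudo tee /etc/X11/xorg.conf.d/10-ignore-nvidia.conf <<'EOF'
Section "ServerFlags"
    Option "AutoAddGPU" "false"
EndSection
EOF

# After restarting X, check what (if anything) still uses the module
lsmod | grep nvidia
sudo fuser -v /dev/nvidia*
```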