Two dom0 updates broke my nvidia driver twice

Hello, within a week, two dom0 updates (I think the kernel and Xen were upgraded) have each broken my NVIDIA driver, both ‘nouveau’ and the official NVIDIA open-source driver (570.169). After the first update, I could still switch to the official driver, but after the second one, that no longer works either. The error message is attached. Do you have any ideas on how to fix this so that either ‘nouveau’ or the official driver works again? My first idea would be to downgrade the kernel, but I don’t want to risk breaking even more by copying commands from my research, so thanks if you can help! Additional information: I use kernel-latest and a GeForce 40-series NVIDIA card.

Do you have any specific use-case for using your dGPU in dom0 instead of the iGPU? I simply hide the dGPU from dom0 using the first steps of this guide and live happily not having to worry about it ever breaking.

Yes, just using my iGPU is unfortunately not an option.

This is perfectly normal: you need to recompile / reinstall the driver every time the kernel changes. It will break every single time the kernel has a new version.

As for your error, maybe a -devel package is missing for the specific new kernel?

This is perfectly normal: you need to recompile / reinstall the driver every time the kernel changes. It will break every single time the kernel has a new version.

That has never happened with nouveau before. Also, reinstalling xorg-x11-drv-nouveau didn’t help.

As for your error, maybe a -devel package is missing for the specific new kernel?

Both kernel-devel and kernel-latest-devel are installed. Also I can’t make sense of why it [installing the nvidia driver] worked a few days ago and not anymore…

I can confirm that the last update broke my UI too, on an NVIDIA RTX GPU.

Hmm, maybe those drivers are not compatible with the latest kernel version; this would not be a first :( I’m not surprised about the proprietary driver. It would be more surprising if nouveau did not work either, but you may have blacklisted it to prevent it from loading because of the proprietary driver?

Hmm, maybe those drivers are not compatible with the latest kernel version

I will try it again in a few days/weeks then.

but you may have blacklisted it to prevent it from loading because of the proprietary driver

Sounds reasonable! How can I undo this? I haven’t changed any config files on my own, and reinstalling nouveau doesn’t seem to be enough. When I ran /usr/bin/nvidia-uninstall (official driver), I think I also selected the option to restore the previous state.

There are two ways it can be blacklisted; to undo it, look for and remove either of these:

  • something like rd.blacklist=nouveau (from memory, it might be a bit different) in the boot command line; you can search for “nouveau” in the grub configuration
  • something in /etc/modprobe.d/ like “blacklist nouveau”
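
Both places can be checked in one go. The following is only a sketch: find_nouveau_blacklist is a made-up helper name, and it simply searches for the word “nouveau”, so it does not depend on the exact spelling of the boot parameter (the spellings I know of are rd.driver.blacklist= and modprobe.blacklist=). The command-line file and the directories are passed in as arguments, so it can be tried on copies first.

```shell
# find_nouveau_blacklist CMDLINE_FILE DIR...
# (hypothetical helper) Print every place where nouveau seems to be blacklisted.
find_nouveau_blacklist() {
    cmdline_file=$1
    shift
    # Kernel command line: show each parameter that mentions nouveau.
    tr ' ' '\n' < "$cmdline_file" | grep nouveau | sed 's/^/cmdline: /' || true
    # modprobe configuration: show file name and the "blacklist nouveau" line.
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        grep -H -E '^[[:space:]]*blacklist[[:space:]]+nouveau' "$dir"/*.conf 2>/dev/null || true
    done
    return 0
}
```

For example: find_nouveau_blacklist /proc/cmdline /etc/modprobe.d /usr/lib/modprobe.d (the second directory is my assumption about where a driver installer might also drop such a file). No output means nothing was found in those places.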

Not sure if this is the same problem that I just experienced, but maybe you want to give this a try …

To me it looks like there might be a bug in version 6.15.4 of the qubes kernel-latest-devel package:

The 6.15.4 version installs kernel.h in
./usr/src/kernels/6.15.4-1.qubes.fc37.x86_64/include/include/linux/kernel.h
(note the double “include” in the path)

In 6.14.4 it is in
./usr/src/kernels/6.14.4-1.qubes.fc37.x86_64/include/linux/kernel.h
(with only a single include)

The latter location is where the build process from your screenshot expects the file; with 6.15 it can no longer find it there and fails.
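
To see which of the two layouts an installed headers tree has, a small check like this should do. It is a sketch, and kernel_h_layout is a name made up for illustration; it only tests the two kernel.h locations quoted above.

```shell
# kernel_h_layout KDIR
# (hypothetical helper) Report where kernel.h sits inside a headers tree.
# Prints "single", "double" or "missing".
kernel_h_layout() {
    kdir=$1
    if [ -f "$kdir/include/linux/kernel.h" ]; then
        echo single      # the location the module build expects
    elif [ -f "$kdir/include/include/linux/kernel.h" ]; then
        echo double      # the "double-include" layout
    else
        echo missing     # no kernel.h in either place
    fi
}
```

For example: kernel_h_layout /usr/src/kernels/6.15.4-1.qubes.fc37.x86_64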

Actually, most of the include files in 6.15 are underneath ./usr/src/kernels/6.15.4-1.qubes.fc37.x86_64/include/include (“double-include”)

This does not really make sense to me, and looks like it might be a result of some path-confusion in the package builder. I have not yet looked at the kernel repo for the root cause, though …

For comparison: RPM for 6.15 vs RPM for 6.14 - you can check the contents with rpm2cpio …rpm | cpio -t
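
If you want a number instead of reading the whole listing, the output of cpio -t can be piped through a filter. This is a sketch; count_double_include is a made-up name, and it assumes that no correctly placed file has “/include/include/” in its path.

```shell
# count_double_include
# (hypothetical helper) Read a file listing on stdin, one path per line as
# printed by "cpio -t", and print how many paths contain /include/include/.
count_double_include() {
    # grep -c prints 0 and returns non-zero when nothing matches,
    # so the "|| true" keeps the function's exit status at 0.
    grep -c '/include/include/' || true
}
```

Used as rpm2cpio "$rpm_file" | cpio -t | count_double_include, where rpm_file stands for whichever package file you downloaded. Going by the paths above, I would expect 0 for the 6.14 package and a large number for 6.15.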

… anyway …, what worked for me: I manually moved all the contents of the double-include dir one level up (subdirectory drm needs careful merging!)
After that the kernel module build worked as before.
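
Before moving anything, it may help to see which files would collide, since those are the ones that need merging by hand. A sketch, with list_conflicts as a made-up helper name; it changes nothing on disk.

```shell
# list_conflicts SRC DST
# (hypothetical helper) Print the relative path of every file that exists
# in both trees with different content. Nothing is copied or modified.
list_conflicts() {
    lc_src=$1
    lc_dst=$2
    # List the files below SRC relative to SRC, then compare each one with
    # the file of the same relative path below DST.
    (cd "$lc_src" && find . -type f) | while IFS= read -r lc_rel; do
        lc_rel=${lc_rel#./}
        if [ -f "$lc_dst/$lc_rel" ] && ! cmp -s "$lc_src/$lc_rel" "$lc_dst/$lc_rel"; then
            printf '%s\n' "$lc_rel"
        fi
    done
}
```

For example, with K=/usr/src/kernels/6.15.4-1.qubes.fc37.x86_64: list_conflicts "$K/include/include" "$K/include". Files that exist only in the double-include directory are not listed, because they can be moved up without overwriting anything.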

Great tip!

Still I managed to fail:

sudo cp -r /usr/src/kernels/6.15.4-1.qubes.fc37.x86_64/include/include/linux/ /usr/src/kernels/6.15.4-1.qubes.fc37.x86_64/include/

Found another forum post related to this: Nvidia proprietary driver installation.

I’d prefer making nouveau work instead of disabling it, at least until I am able to install the official driver.

That command just copies the linux directory into the right path.
Now the build log complains about further missing include files in other directories.

At least on my system there are still lots of other directories in include/include, so you should copy them all.

I recommend using the -i option to let cp ask you before overwriting something …
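
Put together, copying everything in one go could look like this. It is a sketch; copy_double_include is a made-up name, and the trailing “/.” tells cp to copy the contents of the directory rather than the directory itself.

```shell
# copy_double_include KDIR
# (hypothetical helper) Copy everything below KDIR/include/include one level
# up into KDIR/include. With -i, cp asks before overwriting an existing file.
# This copies rather than moves, so the originals stay where they are.
copy_double_include() {
    cp -r -i "$1/include/include/." "$1/include/"
}
```

For example: copy_double_include /usr/src/kernels/6.15.4-1.qubes.fc37.x86_64. The tree belongs to root, so this has to run in a root shell, or you put sudo in front of the cp line and run that directly.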

That would have been easy to avoid. I am sorry for my stupidity. Thank you! Looks like it works! I’ll let you know after restarting.