Nouveau GPU Lockup & Driver Installation

I’m running an Nvidia GTX 1650 Max-Q. I submitted an HCL report for this machine here: Thinkpad X1 Extreme Gen 2.

About 1-2 times a day, my desktop crashes or freezes up. Sometimes I get lucky and it drops me back to the login screen, but most of the time I need to do a hard reboot.

The error message I get is this:

GPU lockup - switching to software fbcon

I’ve looked around everywhere and can’t find a solution for it. So now I’m looking to install Nvidia proprietary drivers. I know the security risks involved, but I can’t have my laptop crashing on me 1-2 times a day.

The community guide is about 2 years old and recommended Fedora 18. I fear that the guide itself is outdated or no longer supported.

Is there an updated guide, or does that guide still work?

Update:
I just got additional logs from /var/log/Xorg.0.log.old and dmesg

dmesg:

[16749.809301] nouveau 0000:01:00.0: fifo: fault 01 [VIRT_WRITE] at 0000000016602000 engine 40 [gr] client 13 [GPC1/PROP_0] reason 00 [PDE] on channel 2 [00ff58c000 Xorg[6577]]
[16749.809309] nouveau 0000:01:00.0: fifo: channel 2: killed
[16749.809311] nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
[16749.809315] nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
[16749.809322] nouveau 0000:01:00.0: Xorg[6577]: channel 2 killed!
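
In case it helps anyone collecting the same evidence, here is a minimal sketch for pulling only the fault and channel-kill lines out of the kernel log. The function name nouveau_faults is made up for illustration, and the pattern is based only on the messages shown above, so other nouveau errors may be worded differently.

```shell
# Read kernel log text on stdin and print only the nouveau fault and
# channel-kill lines (pattern derived from the dmesg excerpt in this post).
nouveau_faults() {
    grep -E 'nouveau .*(fifo: fault|channel [0-9]+:? killed)'
}

# Usage on a live system, left commented out because reading the kernel
# log usually needs root:
#   sudo dmesg | nouveau_faults
```

The function exits non-zero when nothing matches, which makes it easy to use in a quick "did it happen again?" check.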

Xorg:

[ 16749.835] (EE) 
[ 16749.835] (EE) Backtrace:
[ 16749.836] (EE) 0: /usr/bin/X (OsLookupColor+0x139) [0x60b6f764d3e9]
[ 16749.837] (EE) 1: /lib64/libpthread.so.0 (funlockfile+0x60) [0x7df47c78ba90]
[ 16749.839] (EE) 2: /lib64/libc.so.6 (gsignal+0x145) [0x7df47c5e77d5]
[ 16749.840] (EE) 3: /lib64/libc.so.6 (abort+0x127) [0x7df47c5d0895]
[ 16749.840] (EE) 4: /lib64/libc.so.6 (__assert_fail_base.cold+0xf) [0x7df47c5d0769]
[ 16749.841] (EE) 5: /lib64/libc.so.6 (__assert_fail+0x46) [0x7df47c5dfe86]
[ 16749.841] (EE) 6: /lib64/libdrm_nouveau.so.2 (nouveau_pushbuf_data+0x107) [0x7df475796847]
[ 16749.842] (EE) 7: /lib64/libdrm_nouveau.so.2 (nouveau_pushbuf_data+0x67) [0x7df4757967a7]
[ 16749.842] (EE) 8: /lib64/libdrm_nouveau.so.2 (nouveau_pushbuf_data+0x18f) [0x7df4757968cf]
[ 16749.842] (EE) 9: /lib64/libdrm_nouveau.so.2 (nouveau_pushbuf_data+0x5a7) [0x7df475796ce7]
[ 16749.842] (EE) 10: /lib64/libdrm_nouveau.so.2 (nouveau_pushbuf_space+0x359) [0x7df4757977c9]
[ 16749.844] (EE) 11: /usr/lib64/dri/nouveau_dri.so (nouveau_drm_screen_create+0x207c7) [0x7df47ae53b27]
[ 16749.845] (EE) 12: /usr/lib64/dri/nouveau_dri.so (nouveau_drm_screen_create+0xcc4d3) [0x7df47aeff833]
[ 16749.845] (EE) 13: /usr/lib64/dri/nouveau_dri.so (nouveau_drm_screen_create+0xcf0c6) [0x7df47af02426]
[ 16749.845] (EE) 14: /usr/lib64/dri/nouveau_dri.so (nouveau_drm_screen_create+0xcf1fb) [0x7df47af0255b]
[ 16749.845] (EE) 15: /usr/lib64/dri/nouveau_dri.so (nouveau_drm_screen_create+0xd08a6) [0x7df47af03c06]
[ 16749.846] (EE) 16: /usr/lib64/dri/nouveau_dri.so (__driDriverGetExtensions_virtio_gpu+0x23798) [0x7df47a693468]
[ 16749.846] (EE) 17: /usr/lib64/dri/nouveau_dri.so (__driDriverGetExtensions_virtio_gpu+0x268c59) [0x7df47a8d8929]
[ 16749.847] (EE) 18: /usr/lib64/dri/nouveau_dri.so (__driDriverGetExtensions_virtio_gpu+0x268fac) [0x7df47a8d8c7c]
[ 16749.847] (EE) 19: /usr/lib64/xorg/modules/libglamoregl.so (glamor_finish+0x920) [0x7df47bd28c00]
[ 16749.847] (EE) 20: /usr/lib64/xorg/modules/libglamoregl.so (glamor_finish+0xe17) [0x7df47bd290f7]
[ 16749.848] (EE) 21: /usr/lib64/xorg/modules/libglamoregl.so (glamor_create_gc+0x9692) [0x7df47bd33ac2]
[ 16749.848] (EE) 22: /usr/lib64/xorg/modules/libglamoregl.so (glamor_create_gc+0x9bae) [0x7df47bd33fde]
[ 16749.848] (EE) 23: /usr/bin/X (DamageRegionAppend+0x6bf) [0x60b6f75cc1cf]
[ 16749.848] (EE) 24: /usr/bin/X (AddTraps+0x46d3) [0x60b6f75c0e43]
[ 16749.849] (EE) 25: /usr/bin/X (SendErrorToClient+0x35b) [0x60b6f74e7a2b]
[ 16749.849] (EE) 26: /usr/bin/X (InitFonts+0x3b4) [0x60b6f74ebb04]
[ 16749.850] (EE) 27: /lib64/libc.so.6 (__libc_start_main+0xf2) [0x7df47c5d2082]
[ 16749.850] (EE) 28: /usr/bin/X (_start+0x2e) [0x60b6f74d4e6e]
[ 16749.850] (EE) 
[ 16749.850] (EE) 
Fatal server error:
[ 16749.850] (EE) Caught signal 6 (Aborted). Server aborting
[ 16749.850] (EE) 
[ 16749.850] (EE) 
Please consult the Fedora Project support 
	 at http://wiki.x.org
 for help. 
[ 16749.850] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[ 16749.850] (EE) 
[ 16749.850] (II) AIGLX: Suspending AIGLX clients for VT switch
[ 16749.891] (EE) Server terminated with error (1). Closing log file.

I couldn’t get the rpmfusion installation to work, but the guide for compiling the driver manually seems to work.

This guide might be easier to follow because it’s more step by step. Remember to update xorg.conf; that step seems to be missing.

Both guides suggest that once you have compiled nvidia.ko you can just update xorg.conf and use the driver. I don’t know if that part is outdated, but it didn’t work for me: the driver was missing all sorts of dependencies.

After rebooting, the X server fails to start, and the log says it’s because files from the driver bundle are missing. You can run the Nvidia installer package as you normally would, which installs all the dependencies the driver needs. It has an ncurses installer for running in text mode.

After running the Nvidia installer and restarting the system, the X server seems to load, but for me the Nvidia driver failed with an out-of-memory error when loading the surface cache.

I don’t know if the driver fails because you shouldn’t run the installer or because it doesn’t work with my card (an Nvidia 1060), but the error I got seems to be related to using that card with Xen.

It didn’t do any “permanent damage”: removing the blacklist lets you switch back to the nouveau driver even if you have run the Nvidia installer.
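
If you are not sure where the blacklist was set, something like the following can help locate it. This is only a sketch: the function name is hypothetical, and the two places it is meant to be pointed at (/etc/modprobe.d and the GRUB defaults file) are assumptions about where the guides put the entry, not something confirmed in this thread.

```shell
# Print every line that blacklists nouveau in the given files or directories.
# Matches both modprobe.d style ("blacklist nouveau") and kernel command line
# style ("rd.driver.blacklist=nouveau" or "modprobe.blacklist=nouveau").
# Note: commented-out entries are printed too.
find_nouveau_blacklist() {
    grep -rnE 'blacklist[[:space:]]+nouveau|(rd\.driver|modprobe)\.blacklist=[^[:space:]"]*nouveau' "$@"
}

# Usage in dom0 (paths are assumptions, adjust to your setup):
#   find_nouveau_blacklist /etc/modprobe.d /etc/default/grub
```

Whatever it finds still has to be removed by hand, and a change to the GRUB defaults only takes effect after the GRUB config is regenerated and the machine is rebooted.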

Hello @syspacket, after reading your thread I suspect we may be having similar issues. Instead of installing the Nvidia drivers, I’m going in the opposite direction and trying to disable the GPU entirely:

Please keep us updated if you find a solution that works.

Thank you for the link. Have you tried it on the latest kernel? I’m currently on 5.16.13-2

I wish disabling the discrete GPU were an option, but my HDMI output is connected directly to it. I’ll let you know if I find a solution!

I did try a newer version of the kernel, though I can’t remember which one. I also tried 5 older versions of the Nvidia driver. They all failed; the error messages differed depending on the driver version, but they were all out-of-memory errors.

I think the issue could be that Nvidia tried to remove all support for using consumer-grade cards with hypervisors, but I don’t know if this is true. I found some posts where people claimed that downgrading to an old version of the driver made the card work with hypervisors, but that driver was so old it couldn’t be compiled with the current kernel headers.

From what I read, people with professional cards like the Quadro cards don’t have the same issues.

I’m actually stuck on step 2, where it says dnf install kernel-latest-devel-5.9.14-1.qubes.x86_64.rpm. I’m trying to install it for the kernel I have, but can’t seem to find the package.

I think you need to use qubes-dom0-update kernel-latest-devel-5.9.14-1

kernel-latest-devel-5.9.14-1.qubes.x86_64.rpm

That doesn’t work. Instead, you can download the rpm, copy it to dom0, and install it manually.

Oh, would that work for the kernel I have?

If I do uname -r in dom0, I get 5.16.13-2.fc32.qubes.x86_64

It should work; there is an rpm for that version. I just thought you were trying to install the version you posted.

https://ftp.qubes-os.org/repo/yum/r4.1/current/dom0/fc32/rpm/kernel-latest-devel-5.16.13-2.fc32.qubes.x86_64.rpm
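
For anyone on a different kernel, here is a rough sketch of how the link above lines up with the uname -r output. The function name devel_rpm_url is made up, and the repo path (r4.1, fc32) is copied from the link in this post, so it is an assumption that other releases follow the same pattern and are actually present on the server.

```shell
# Build the download URL for the kernel-latest-devel package matching a
# kernel release string (the output of `uname -r` in dom0).
# The base path is taken from the link in this thread (Qubes r4.1, fc32).
devel_rpm_url() {
    release=$1
    base=https://ftp.qubes-os.org/repo/yum/r4.1/current/dom0/fc32/rpm
    printf '%s/kernel-latest-devel-%s.rpm\n' "$base" "$release"
}

# Example with the release reported in this thread:
devel_rpm_url 5.16.13-2.fc32.qubes.x86_64

# Possible follow-up, left commented out because it needs a Qubes system.
# <qube> and <file> are placeholders:
#   (in a networked qube)  curl -O "$(devel_rpm_url <release>)"
#   (in dom0)  qvm-run --pass-io <qube> 'cat /home/user/<file>.rpm' > <file>.rpm
#   (in dom0)  sudo dnf install ./<file>.rpm
```

Copying anything into dom0 bypasses the usual isolation, so it is worth verifying the package before installing it.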

Update: I went into the BIOS and disabled hybrid graphics, switching over to discrete graphics only. It’s been over 24 hours and I’ve had no crashes so far. This seems to have made my system stable.

In addition, I added nouveau.noaccel=1 to GRUB_CMDLINE_LINUX in /etc/sysconfig/grub. The Nouveau page on the ArchWiki suggests this to avoid lockups.
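
A minimal sketch of that edit as a script, in case someone wants to repeat it. The helper name add_kernel_param is hypothetical, and the grub2-mkconfig output paths in the comments are assumptions that depend on whether the machine boots via legacy BIOS or EFI, so check them against your own install first.

```shell
# Append a kernel parameter to the GRUB_CMDLINE_LINUX="..." line of the
# given file, unless that line already contains it.
# Assumes the value is wrapped in double quotes and the parameter has no
# "|" or "&" characters in it.
add_kernel_param() {
    file=$1
    param=$2
    # Nothing to do if the parameter is already on the line.
    if grep '^GRUB_CMDLINE_LINUX=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    _tmp=$(mktemp)
    # Insert the parameter just before the closing double quote.
    sed "s|^\(GRUB_CMDLINE_LINUX=\".*\)\"\$|\1 ${param}\"|" "$file" > "$_tmp" &&
        cat "$_tmp" > "$file"   # write through, so a symlinked file stays a symlink
    rm -f "$_tmp"
}

# Usage in dom0, left commented out because it changes boot configuration:
#   sudo cp /etc/sysconfig/grub /etc/sysconfig/grub.bak
#   sudo sh -c '. ./this_script.sh; add_kernel_param /etc/sysconfig/grub nouveau.noaccel=1'
#   sudo grub2-mkconfig -o /boot/grub2/grub.cfg            # legacy boot (assumed path)
#   sudo grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg    # EFI boot (assumed path)
```

The parameter only takes effect after the GRUB config is regenerated and the machine is rebooted. Running the function twice is harmless because it skips the edit when the parameter is already there.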

Hopefully this helps make your system stable as well :)
