Available Updates Seem to Cause OS and Qubes to Crash

For the past few months, I’ve been dealing with two major issues with Qubes OS v4.1 on my Librem 14v1:

  1. The OS crashes/freezes, forcing me to do a hard shutdown. This seems to be highly correlated with new updates being available, particularly for Whonix (whonix-gw-16 or whonix-ws-16).
  2. Qubes will randomly and abruptly shut down. I’ll just be sitting here working and I’ll see notifications like “Qube sys-net has shut down.” sys-net seems to be the biggest offender, but I’ve had sys-firewall, sys-whonix, and my debian-11 AppVMs do this as well. This also seems to be correlated with new updates being available, and with having installed updates without rebooting the system. In the latter case, I used to simply restart the affected qubes after installing updates rather than rebooting, but this almost always caused trouble (e.g. a system crash after a few hours), so now I always reboot after installing any Whonix, Fedora, or dom0 updates.

I experience one or both of these about every 1-2 days. After encountering this many times, I am relatively confident that these are triggered by new updates being available. In about half of the cases, the system crashes before I’m notified of new updates. However, when I force-shutdown and reboot, then it shows me there are updates available. In the other half of cases, when I’m notified of updates being available, it certainly seems like it’s only a matter of time before the OS crashes if I don’t install them immediately, particularly if I leave the system idle for a while. The combination of updates being available and leaving the system idle has close to a 100% chance of the OS crashing/freezing eventually, noting that this shouldn’t be related to the screen lock/idle timeout because I currently have this disabled to rule this out as a culprit.

Note that I get my system updates through Whonix, which may be related to this issue. Has anyone else experienced this?

My machine is a ThinkPad X220, but I can relate to that.

I get this too, sometimes, when my Qubes OS is left idling all through the night. In the morning, i3lock is simply unresponsive, and I cannot wake the screen up. I have to do a forced shutdown (by holding down the physical power button).

After the latest hard shutdown (I just had to do that again, as my machine was unresponsive after I left it powered on and idling all night), I did a full system update and saw that debian-11 had an update available, which is now installed. So maybe, in my case, the pending debian-11 update made my machine unresponsive while it was idling during the night.

Interesting observation. This has been my experience (sometimes) as well.

I have i3lock screen lock enabled and I am suffering from this (not being able to wake up after a night of idling).

Another case difference between us: I get my updates through clearnet connections and not using Whonix.

This is strange; I have not experienced anything like that on my Librem 15. Qubes 4.1 looks very stable for me. Perhaps you have too little RAM for your use case?

I have 64 GB. Sometimes I wonder whether a lot of the issues I’ve been having are related to the CPU in the Librem 14v1, but I’m not sure.

I experience similar crashes too (I’m on Qubes OS 4.1 on a ThinkPad T14 Ryzen (1st generation) with 32 GB RAM), and my laptop has crashed daily since the updates I did ~1-2 weeks ago (updated dom0 and all templates through the terminal). The laptop keyboard and touchpad stop responding, and after a few seconds the screen turns black and only a hard restart gets me going again. Sometimes it just crashes straight to the black screen.
It’s getting annoying when your own laptop crashes a few times per day and you have to restart and reopen all your qubes and files, and you may even lose data.
Is this a known issue?

I don’t know exactly if it’s related to updates, but I did not have crashes this often before, and there were no other changes to the system besides the updates. Any suggestions on how to debug this?
Thanks
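One possible starting point for debugging, sketched here as a pair of shell helpers for a dom0 terminal: read the kernel messages from the boot that froze and filter them for lines that often accompany kernel trouble. This assumes journald in dom0 keeps logs across reboots; the function names and the keyword list are illustrative assumptions, not something from the Qubes documentation.

```shell
# Filter a log stream (stdin) for lines that often accompany kernel trouble.
# The keyword list is an illustrative assumption, not an exhaustive one.
crash_lines() {
    grep -iE 'panic|oops|BUG:|lockup|hung task|firmware'
}

# Kernel messages from the boot before the current one (the one that froze).
# Assumes journald keeps logs across reboots; otherwise this returns nothing useful.
prev_boot_kernel_log() {
    journalctl -k -b -1 --no-pager
}

# Usage, right after rebooting from a freeze:
#   prev_boot_kernel_log | crash_lines
```

If the filter comes back empty, paging through the unfiltered output of `prev_boot_kernel_log` may still show what the system was doing just before it locked up.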

At least it’s not just me. Normally when I report an issue, the general consensus seems to be that either I’ve done something wrong or I don’t have enough RAM (I have 64 GB). This is despite the fact that I’ve made 0 modifications to dom0, I install all updates as they become available, and I’m using a laptop that’s supposedly 100% compatible (Librem 14 v1).

Do you install updates through whonix? This is one of the optional settings during OS installation. I chose this because, why not? However, this may be a contributing factor to some of the stability issues I’ve been having. I have no idea how to identify what’s going on. From my observations, this seems to be related to the update system or networking. Most of the time when my OS crashes (mine just freezes, not black screen), when I reboot, I see that whonix-gw-16 updates are available. After seeing that many times, I don’t think that’s a coincidence.

Honestly, I encounter so many issues that I don’t even bother to report them anymore. Over the past few months, I don’t think a single day has gone by without incident. Here is what I’ve experienced just today, which is pretty typical:

  1. I installed updates for fedora-36, debian-11, and whonix-gw-16
  2. I’ve learned from experience that if I don’t reboot after installing fedora and whonix updates, I usually run into issues, so I tried a reboot
  3. The system got into this state I’ve never seen before. The screen went black, but it was still on. I tried to force shut-down, but even that wouldn’t work. When I pressed and held the power button, the screen would turn on (all black) for ~2 seconds and shut off. I thought I was going to have to disassemble the laptop and unplug the battery, but unplugging the power cord got it to shut down, even though it had a full charge.
  4. After booting back up, over the next few hours, some of my debian-11 qubes shut down while I was working for no apparent reason
  5. My sys-net and sys-firewall qubes didn’t shut down, but all of my networking stopped working (internal and external) for no apparent reason. Rebooting sys-whonix, sys-firewall, and sys-net fixed this

The state of this is quite sad. When I first tried qubes, I struggled to get it somewhat stable for about two months. After these two months, the system was pretty stable for about two weeks. Then, I started running into ~daily system crashes and qubes randomly shutting down. This has been going on for months now.

I’ve been using computers since the early ’90s and I’ve never encountered anything so unstable. The number of times I need to force-shutdown my system in one week with Qubes dwarfs all similar instances across all the other operating systems I’ve used over these past ~3 decades.

I have problems with my 32 GB ThinkPad X13 Gen 1 that sound like what you have and may be related to this.

Tuxend Pulse 15 Crashing

What I did as a temporary solution is revert to an older kernel.

sudo qubes-dom0-update --action=install kernel-510

Edit: the 510 kernel

Then regenerating the GRUB config so that I can select it at boot:

sudo grub2-mkconfig -o /boot/grub2/grub.cfg

Haven’t had any crashes in two days now except once when I forgot to select the older kernel.
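Since forgetting to pick the older kernel at boot brought the crash back, it may help to confirm which kernel dom0 is actually running. A minimal sketch, assuming the 510 package corresponds to the 5.10 series; `kernel_in_series` is a hypothetical helper name:

```shell
# Succeeds when a kernel version string belongs to the given major.minor series.
# kernel_in_series is a hypothetical helper, not a Qubes tool.
kernel_in_series() {
    case "$1" in
        "$2".*) return 0 ;;   # e.g. 5.10.112-1 is in series 5.10
        *)      return 1 ;;
    esac
}

# Check the currently running kernel (run in a dom0 terminal).
if kernel_in_series "$(uname -r)" "5.10"; then
    echo "Running the older 5.10 kernel"
else
    echo "Not on 5.10: running $(uname -r)"
fi
```

The match requires a dot after the series, so a version such as 5.100.x would not be mistaken for 5.10.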

For me it’s probably simpler, as I don’t use Whonix for updates; I just follow How to update | Qubes OS and run the commands in either a dom0 terminal or a qube terminal.
My laptop was relatively stable (besides the annoying part that about 1/3 of the time the laptop starts with the “Unable to reset PCI device …” error, so I have to restart, which usually solves the issue), with very rare whole-system crashes. Since my last updates, my laptop has been crashing at least once every day; today it has crashed 3 times already, and I’m almost at the point of switching to another Linux OS.
I really like Qubes OS and the idea of keeping different virtual machines (qubes) for different tasks, but the learning curve is quite steep, and all the crashes, on top of other small issues, make for quite a bad user experience. Because I like Qubes OS, I tolerate some issues, such as never knowing whether my laptop will boot successfully or need a reboot (sometimes it requires 2 reboots), but each problem and each crash is testing my trust in Qubes.
I don’t know if all my issues are because of hardware incompatibility or other problems with my particular ThinkPad T14, but I used it for some time with Ubuntu and everything worked fine (audio, video, touchpad, left and right click buttons, etc).

I tried setting the kernel version in Qubes Tools >> Qubes Global Settings so that qubes use the older kernel, 5.10.112-1, but that probably doesn’t change the kernel for dom0. This didn’t stop my crashes, so I reverted to 5.15.52-1.
I also ran the update commands again for dom0, templates, and standalones.

Does the 510 mean kernel 5.10?
I have 2 versions of kernels in Qubes Global Settings:
5.10.90-1 and 5.10.112-1.

For now I have downgraded linux-firmware per https://github.com/QubesOS/qubes-issues/issues/7648#issuecomment-1206260794

[update]
After downgrading linux-firmware, my laptop hasn’t crashed in the last 24 hrs.

After 4 days with no crashes, it looks like the linux-firmware downgrade helped in my case.