I installed Qubes 4.1 about a month ago and have been experiencing fairly regular kernel hangs since then. Most recently, one happened in the middle of a dom0 update and left my system in a partially corrupted state. I suspect the hangs are at least partly caused by some kind of kernel or filesystem issue that shows up when dom0 starts swapping.
After I first noticed several corrupted rpm packages in dom0 following a crash during an update, I decided to reinstall all rpms while watching dmesg so I could observe any issues in more detail. After around 550-600 packages, the kernel watchdog detected a soft lockup. About a minute later, the system became unresponsive. (The mouse cursor still moved, but the UI was otherwise frozen and the system would not accept keyboard input.)
Pictures of dmesg output during the lockup:
After looking at the error and doing some research, I figured this might be caused by some kind of swap issue. So I restarted, turned off swap in dom0, and re-ran the reinstall process. This time the system didn’t hang. Watching memory usage via top in a separate terminal, I noticed that memory filled up to the point where swapping would have started at about the same stage of the reinstall where I had hit the soft lockup previously.
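In case anyone wants to check the same correlation with swap left on, here is a rough sketch of a logger that could run in a second terminal instead of top. It only reads the standard SwapTotal, SwapFree and MemAvailable fields from /proc/meminfo and prints a timestamped line whenever swap usage changes. The function names are made up for illustration, and I'm assuming python3 is available in dom0.

```python
import time


def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to their numeric values."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])  # sizes are reported in kB
    return fields


def swap_used_kb(fields):
    """Swap currently in use, computed as SwapTotal - SwapFree."""
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


def watch(interval=1.0, count=None, path="/proc/meminfo"):
    """Print a timestamped line each time swap usage changes.

    count limits the number of samples taken; None means run until interrupted.
    """
    previous = None
    taken = 0
    while count is None or taken < count:
        with open(path) as f:
            fields = parse_meminfo(f.read())
        used = swap_used_kb(fields)
        if used != previous:
            stamp = time.strftime("%H:%M:%S")
            available = fields.get("MemAvailable", 0)
            print(f"{stamp} swap used: {used} kB, MemAvailable: {available} kB",
                  flush=True)
            previous = used
        taken += 1
        time.sleep(interval)
```

Calling `watch()` from a python3 session while the reinstall runs should make it easier to line up the first moment swap is actually used with the watchdog message in dmesg.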
Happy to help debug things further if this isn’t an obvious or known issue.
My current workaround (‘disable swap in dom0’) isn’t really ideal, so if anyone has better suggestions for things to try, I’m all ears.