Today several of my qubes locked up to the point that they wouldn’t shut down. This generated kernel messages, but I’ll start at the beginning:
I installed Qubes 4.1-rc2. After installing qubes-url-redirector, I tried to use it by initiating an “open in disposable qube”. The disposable qube would launch but would never open a web browser. The qube would also never close on its own, but that’s not what I mean by “crash”: it just leaves a running qube that does nothing but consume 4 GB every time I try it. (Note that I didn’t manually close the dead disposables, since I’d have to figure out which disposables were already in use from before I started with qubes-url-redirector.) I locked the screen for the night.
Anyway, I came back the next day, logged in, and about 80% of my windows from various qubes were gone, including all non-disposable web browsers.
I checked, and the uptime for dom0 was 11 days, so it was not a power failure.
I checked the list of running qubes. All 33 qubes were still running, but most no longer had any windows on the screen. Trying to launch a web browser or a terminal in a running qube did not seem to do anything.
I started sending the shutdown signal to all of the 33 running qubes (except the sys-* qubes).
Most shut down, but a few did not. The hung ones seemed to be just those in which I had tried to launch a new web browser or terminal.
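For anyone who wants to script that shutdown step, here is a minimal Python sketch, not what I actually ran. It assumes it is run in dom0 and that `qvm-ls --running --raw-list` and `qvm-shutdown --wait --timeout` behave as on a stock Qubes 4.1 install. The function names and the skip patterns are made up for illustration.

```python
import fnmatch
import subprocess

# Assumption: qubes matching these patterns are left alone.
SKIP_PATTERNS = ("sys-*", "dom0")


def qubes_to_shut_down(names, skip_patterns=SKIP_PATTERNS):
    """Return the qube names that match none of the skip patterns."""
    return [
        name
        for name in names
        if not any(fnmatch.fnmatchcase(name, pat) for pat in skip_patterns)
    ]


def running_qubes():
    """Ask qvm-ls (in dom0) for the names of all running qubes."""
    result = subprocess.run(
        ["qvm-ls", "--running", "--raw-list"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


def shut_down(names, timeout=60):
    """Request a clean shutdown of each qube and report the ones that fail."""
    for name in names:
        result = subprocess.run(
            ["qvm-shutdown", "--wait", "--timeout", str(timeout), name]
        )
        if result.returncode != 0:
            print(f"{name} did not shut down cleanly")
```

Calling `shut_down(qubes_to_shut_down(running_qubes()))` would then print the qubes that stay hung, which are the ones worth checking in the console log.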
For the qubes that were hung, I looked at their console logs from dom0 and found messages like:
blk_update_request: I/O error, dev xvda, sector 415744 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
EXT4-fs (xvda3): I/O error while writing superblock
Failed to write entry (22 items, 759 bytes) despite vacuuming, ignoring: Input/output error
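To compare these errors across several hung qubes, the block-layer line can be picked apart with a small script. This is only a sketch: the regular expression is written against the one `blk_update_request` line quoted above, so other kernel versions may format it differently, and `parse_io_error` is a hypothetical helper name. The byte offset assumes the kernel's usual 512-byte sector unit.

```python
import re

# Pattern modelled on the single log line quoted in this post.
IO_ERROR = re.compile(
    r"blk_update_request: I/O error, dev (?P<dev>\S+), "
    r"sector (?P<sector>\d+) op (?P<opcode>0x[0-9a-fA-F]+):\((?P<op>\w+)\)"
)


def parse_io_error(line):
    """Extract device, sector and operation from a block I/O error line.

    Returns None when the line does not look like a blk_update_request error.
    """
    match = IO_ERROR.search(line)
    if match is None:
        return None
    sector = int(match.group("sector"))
    return {
        "dev": match.group("dev"),
        "sector": sector,
        "op": match.group("op"),
        # Assumption: sectors are counted in 512-byte units.
        "byte_offset": sector * 512,
    }
```

Note that `xvda` is the virtual block device as seen from inside the qube, so the device name in these lines refers to the qube's own volume rather than directly to a physical disk in dom0.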
This implies a problem with the dom0 disk, but I checked dmesg on dom0 and it did not show any disk-related errors.
As further evidence that the dom0 disk is OK, the output of running:
sudo smartctl -a /dev/nvme0
in dom0 includes:
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
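If this check needs repeating (for example after the next lockup), the two relevant lines can be pulled out of the smartctl report automatically. The sketch below only does plain text matching on the wording shown above. `smart_summary` is a hypothetical helper, and the report text is assumed to have been captured from `sudo smartctl -a /dev/nvme0` beforehand.

```python
import re


def smart_summary(report):
    """Summarise a smartctl -a text report.

    Returns the overall health verdict (or None if the line is missing)
    and whether the error log section says "No Errors Logged".
    """
    health = re.search(r"self-assessment test result:\s*(\S+)", report)
    return {
        "health": health.group(1) if health else None,
        "no_errors_logged": "No Errors Logged" in report,
    }
```

A result of `{"health": "PASSED", "no_errors_logged": True}` matches what I saw here. Anything else would point back at the drive.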
Also note that the system did not run out of memory: only 110 GB of the 128 GB of RAM was in use.
If relevant, the “disk” is an M.2 module.