Strange Crash on November 4th

My Qubes OS 4.0.3 crashed quite strangely on the morning of November 4th @ GMT+8 in Australia. It was very weird because all seemed quite normal until I attempted to open a Youtube page in my ‘Web-News’ AppVM.

Yes, just another typical Wednesday at first:

Boot Qubes OS, with sys-net, sys-firewall and Base (my offline utility AppVM) all starting by default.

Open the usual text file to check which vitamin tablets to take that morning. Open the usual ledger text file to remember how much money I had left. Both of these are just local text files in the Base AppVM.

Then open the Web-Net AppVM to tally the average daily data remaining on the ISP account. Then open the Web-Org AppVM to check the weather report. Then open the Web-Mail AppVM to check the email accounts. Then open the Web-News AppVM to load Youtube to see how the US election had been going overnight.

Then that was it! The Youtube page never rendered. Then everything stopped, except the mouse movement. No clicking on anything in the different AppVM browser or text editor windows. No response from the top panel, and the bottom panel wouldn’t unhide when the mouse pointer was down at the bottom of the screen.

No way to shut down either, except to hold down the power button on the laptop until everything stopped. Then the whole system failed to boot, and it has been like that since that Wednesday morning. Six days, even! I’ve just been too dismayed to even think about it until today.

I have no idea how to run fsck on a LUKS partition from the basic CLI that I get after entering the password for the root partition. The partition is LUKS encrypted, but there doesn’t seem to be any way to access cryptsetup luksOpen in that system, so that’s that. My new Qubes OS 4.0.3 system barely ran for a month before fatally crashing, and now I’m going to have to spend the rest of this week reinstalling a new system and then building all the custom AppVMs and bookmarks again (which I admit I was too busy and careless to bother backing up when I had the chance, so I have no one to blame but myself…).

Except for the strange timing and the strange cause of this strange death of Qubes OS 4.0.3 last Wednesday morning. It all happened while I was attempting to access a certain news channel on Youtube on the morning after the US election, which was still late Tuesday night in Washington DC, currently thirteen (13) hours behind my local time. Shenanigans or just bad luck?

Now we’ve all been told again and again over the last decade that correlation is not causation, but I could not help but wonder over these past six (6) days whether anyone else around here who runs Qubes OS 4.0.3, perhaps with Debian AppVMs, might have happened to try to open any news channel on Youtube around that time last week and had any unhappy results such as I did.

If there was any other system that strangely crashed fatally at around the same time whilst attempting to do something similar related to news media channels on Youtube, then we would seem to have causation by deduction, wouldn’t we then?

I had experienced a boot error in the past, and I think it was connected to running out of space on some partition of Qubes. There are issues that you can look up (GitHub - QubesOS/qubes-issues: The Qubes OS Project issue tracker), and I think I read that some kind of warning is in the works to help avoid these kinds of problems in the future (I don’t know for sure). The thing is, you are not alone in experiencing something like this, although it took me quite a long time and it never happened on a relatively fresh system. We all do different stuff on our computers and have different hardware, so it is hard to diagnose.

If you get to the emergency shell (I guess you do), you are advised to take a look at the system logs by typing “journalctl”. There could also be a message telling you to “run fsck manually”.
If this is the case you can do so by running:

fsck /dev/mapper/qubes_dom0-root

Even if this does not help you with completely rescuing your system, it might help booting up as usual and at least give you the opportunity to backup important data or even find the culprit for your crash.
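
A minimal sketch of that manual check, assuming the root volume was already unlocked by the boot password prompt and shows up as /dev/mapper/qubes_dom0-root. The helper name check_root is made up for illustration, and the device name may differ on other installs:

```shell
# Sketch: check the dom0 root filesystem by hand from the emergency shell.
# check_root is a hypothetical helper; the default device is an assumption.
check_root() {
    dev="${1:-/dev/mapper/qubes_dom0-root}"
    if [ ! -e "$dev" ]; then
        echo "device $dev not found; list /dev/mapper to see what is unlocked" >&2
        return 1
    fi
    # No -a or -p option here, so fsck asks before each repair,
    # which is what the "run fsck manually" message requests.
    fsck "$dev"
}

# Usage (uncomment in the emergency shell):
# check_root /dev/mapper/qubes_dom0-root
```

The existence check is only there to fail early with a hint instead of an fsck error if the volume is named differently.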

There are other ways of getting data out of a corrupted system by simply mounting your Qubes system in Debian or any Linux distribution. You are then prompted to decrypt your system and can rescue important files/settings etc.
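
Roughly, the manual version of that rescue could look like the sketch below, run as root from the other Linux system. It assumes the default layout in which the LUKS container holds an LVM volume group called qubes_dom0; the partition, mapper name and mount point are placeholders to replace with your own, and open_qubes_root is a made-up helper name:

```shell
# Sketch: open the encrypted Qubes partition read-only from another Linux system.
# All three arguments are placeholders chosen by the user, not fixed values.
open_qubes_root() {
    part="$1"   # the LUKS partition, e.g. /dev/sdX2 (check with lsblk first)
    name="$2"   # any unused mapper name, e.g. qubes_rescue
    mnt="$3"    # an empty directory to mount on, e.g. /mnt/qubes-root

    cryptsetup luksOpen "$part" "$name" || return 1   # prompts for the passphrase
    vgchange -ay qubes_dom0 || return 1               # activate the volumes inside
    mkdir -p "$mnt" || return 1
    # Read-only, so nothing more gets written to a damaged filesystem.
    mount -o ro /dev/mapper/qubes_dom0-root "$mnt"
}

# Usage (uncomment and adapt):
# open_qubes_root /dev/sdX2 qubes_rescue /mnt/qubes-root
```

If the read-only mount is refused because the filesystem is inconsistent, running fsck on the unlocked volume first may be needed.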

In short, it might not be too late to try and save some files even if you’re already set on reinstalling the system.


Thank you very much, Raphael. I didn’t post this in the Support section because I didn’t think that there would have been any solutions. Once again I’m proven wrong, but that’s a good thing! Well, possibly.

I shall copy your fsck command out on paper and have a go at that shortly. I’m currently backing up my main data partition in Linux Mint on the laptop HDD before beginning a fresh install, so it’s not too late yet.

Also, it might be worth using this latest disaster to explore what I might be able to get from mounting the Qubes OS root partition in Linux and checking into /var/lib/qubes to see what I might be able to salvage. I believe Jarrah has also suggested something like this, so maybe it’s time I use this opportunity to improve my own skills.

I’ll report back after trying out your fsck command.

PS: running out of space could be a possibility, although there are still 31 GB free on that 128 GB root partition. The allocations to each of the online ‘Web’ VMs are standard though, so that might put it all down to no more than coincidence and user paranoia that the News VM happened to exceed its quota on such an historical day as it did.

That can save you some time especially with custom apps, bookmarks and files. Backing up your qubes from time to time is something you should plan for the future.

Before doing so, just check if this is really advised. When I started my system I didn’t notice at first but looking closely there was the suggestion to “run fsck manually”.

I am not an expert so like you I investigate and look for solutions in the usual places. Mostly, someone has experienced something similar and there are already ways described to help yourself.

And yes, there usually is a logical explanation for things like this.


Sorry to bother you again, Raphael… I am still waiting for another couple of hours for the files on the data partition to copy across to another drive, and so there’s more time to think than I usually allow for.

Two questions come to mind. Firstly, the easy one. I presume that the reason I might be able to start with:
"fsck /dev/mapper/qubes_dom0-root"
is that I’ve already decrypted the LUKS partition when I typed the password during the GUI boot process. It seems quite obvious, and that is why I do not need to start with the “cryptsetup luksOpen” part, as found in another search for answers about fsck and LUKS.

Secondly, I did read through the journalctl file a few times, but in my dunderheaded panic, never thought of any way to copy that file to a USB stick. Not that I could find any meaningful lines in that log that I thought might be of use to an expert in helping out, but maybe it is worth trying to get a copy and see if anything significant might be in there.

It has been quite a few years since I was all that cluey about Bash or CLI operations, but if I was to insert a USB stick into the second USB port, then it would probably be /dev/sdc, so I might be able to type:
“mount -t ext4 /dev/sdc1 /tmp”

or something based around that idea. Then if I was to type:
“cp journalctl /tmp”
that might succeed in copying that file to the USB stick so that I could access it back in Linux Mint and post the most relevant part of it here on the forum.

I shall try this later as soon as the file copies are complete. I just wondered if that might be on the right track to acquiring a copy of this journalctl file to provide here.

Sorry again if this is rather the sort of problem I should just go and try out myself when the file copies are done. It probably is, actually, but now is a time to ask whether there are better ways to get that journalctl log onto something I can refer to from a working system.
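
One detail that matters here: journalctl is a command that prints the log, not a file, so it cannot be copied with cp; its output has to be redirected into a file. A rough sketch of the idea, assuming the stick really is /dev/sdc1 and that the emergency shell can mount its filesystem (save_journal and the mount point are made-up names):

```shell
# Sketch: write the journal to a file on a USB stick from the emergency shell.
# save_journal is a hypothetical helper; device and mount point are assumptions.
save_journal() {
    dev="$1"   # the stick's partition, e.g. /dev/sdc1
    mnt="$2"   # a scratch directory, e.g. /tmp/usb

    mkdir -p "$mnt" || return 1
    # No -t option, so mount tries to detect the filesystem itself
    # (many sticks are FAT rather than ext4).
    mount "$dev" "$mnt" || return 1
    journalctl > "$mnt/journal.txt"   # redirect the command's output into a file
    umount "$mnt"                     # flush the data before pulling the stick
}

# Usage (uncomment and adapt):
# save_journal /dev/sdc1 /tmp/usb
```

Mounting on a subdirectory rather than on /tmp itself keeps whatever is already in /tmp visible.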

Maybe someone else might be able to help you with that but right off the bat I couldn’t say where a USB stick would be mounted and which commands would work there [emergency shell]. I tried and read and searched … like I always do in cases like these.

Fortunately, I only had this one experience with the emergency shell, and I do remember this ‘rdsosreport.txt’ file being generated and being told that I should copy it to a stick or to /boot. I don’t remember anymore what this file contained, but I decided not to save it on a stick and to try running the file system consistency check tool instead.

I had backups and I did experiment a lot with 4.1 a few months ago, so I do expect things to fail now and then, and I have more than one install of Qubes, in fact more than one laptop. So, like you, I do investigate and try things out, ask for help etc., but sometimes I do move on without knowing what went wrong in detail, because I’d have to know much more about Qubes-specific processes that I know practically nothing about.


Thanks again, Raphael. You’ve helped again. It is probably easier to find rdsosreport.txt somewhere in the broken system, and copying that to /boot removes the need for a USB stick because /boot is not encrypted.

I am enthused and motivated to check into this little disaster some more tonight before I wipe the partition and start afresh.

Thank You. I’ll get on with my digging now and report back tomorrow.

Half worked. Still no working Qubes yet, but I did manage to stop and think carefully for a change and retrieved a copy of the rdsosreport.txt file by mounting the /boot partition of the affected system and copying it there, then grabbed it in the Linux Mint session and looked it over backwards.
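
For anyone wanting to repeat that step, it went roughly like the sketch below. The report path /run/initramfs/rdsosreport.txt is where dracut normally writes it; the boot partition device and the mount point are assumptions that will differ per machine, and save_report is just a made-up helper name:

```shell
# Sketch: copy the dracut report onto the unencrypted /boot partition.
# save_report is a hypothetical helper; device and mount point are assumptions.
save_report() {
    bootdev="$1"                                    # e.g. /dev/sda1
    mnt="$2"                                        # e.g. /tmp/boot
    report="${3:-/run/initramfs/rdsosreport.txt}"   # usual dracut location

    mkdir -p "$mnt" || return 1
    mount "$bootdev" "$mnt" || return 1
    cp "$report" "$mnt/"
    status=$?        # remember whether the copy worked
    umount "$mnt"    # unmount either way so /boot is left clean
    return "$status"
}

# Usage (uncomment and adapt):
# save_report /dev/sda1 /tmp/boot
```

The file can then be read from any other Linux system by mounting the same /boot partition there.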

There is a line that reads:

[   58.943106] dom0 systemd-fsck[624]: Root: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
[   58.943322] dom0 systemd-fsck[624]: 	(i.e., without -a or -p options)

down towards the bottom of the file. I’ve made a copy of the parts that seem possibly relevant named rdsosreport-tail.txt which is 9,912 characters and 78 lines. I am reluctant to post it here because even the tail is so big, but I suppose it is the only clue I have to work with, so I’ll post the tail file and hope that it’s not too ill-mannered.

[    9.357343] dom0 systemd[1]: Found device My_Passport_2626 2.
[    9.364736] dom0 systemd[1]: Starting Cryptography Setup for luks-57807466-d0f8-4540-8b26-2f5b8b98f09b...
[    9.386825] dom0 systemd[1]: Found device My_Passport_2626 3.
[    9.388769] dom0 systemd[1]: Starting Cryptography Setup for luks-8d49674b-e4a7-4cbf-b6c7-5a6ac204cbdf...
[    9.391937] dom0 systemd[1]: Started Forward Password Requests to Plymouth.
[    9.392289] dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-ask-password-plymouth comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    9.404504] dom0 kernel: kauditd_printk_skb: 2 callbacks suppressed
[    9.404506] dom0 kernel: audit: type=1130 audit(1605054803.749:13): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-ask-password-plymouth comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    9.451200] dom0 systemd[1]: Received SIGRTMIN+20 from PID 332 (plymouthd).
[   21.993205] dom0 systemd-cryptsetup[430]: Set cipher aes, mode xts-plain64, key size 512 bits for device /dev/disk/by-uuid/57807466-d0f8-4540-8b26-2f5b8b98f09b.
[   21.997469] dom0 systemd-tty-ask-password-agent[432]: Invalid password file /run/systemd/ask-password/ask.BOPcM7
[   21.997757] dom0 systemd-tty-ask-password-agent[432]: Failed to show password: Bad message
[   22.004749] dom0 systemd-cryptsetup[431]: Set cipher aes, mode xts-plain64, key size 256 bits for device /dev/disk/by-uuid/8d49674b-e4a7-4cbf-b6c7-5a6ac204cbdf.
[   23.097734] dom0 systemd[1]: Started Cryptography Setup for luks-8d49674b-e4a7-4cbf-b6c7-5a6ac204cbdf.
[   23.097852] dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-cryptsetup@luks\x2d8d49674b\x2de4a7\x2d4cbf\x2db6c7\x2d5a6ac204cbdf comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   23.110036] dom0 kernel: audit: type=1130 audit(1605054817.455:14): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-cryptsetup@luks\x2d8d49674b\x2de4a7\x2d4cbf\x2db6c7\x2d5a6ac204cbdf comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   23.099658] dom0 systemd[1]: Found device /dev/mapper/luks-8d49674b-e4a7-4cbf-b6c7-5a6ac204cbdf.
[   23.099944] dom0 systemd[1]: Found device /dev/disk/by-uuid/512fbcc3-e8a5-4ef3-8819-6d734234823c.
[   23.100201] dom0 systemd[1]: Reached target Initrd Root Device.
[   24.008776] dom0 systemd[1]: Started Cryptography Setup for luks-57807466-d0f8-4540-8b26-2f5b8b98f09b.
[   24.009123] dom0 systemd[1]: Reached target Encrypted Volumes.
[   24.009405] dom0 systemd[1]: Reached target System Initialization.
[   24.009708] dom0 systemd[1]: Reached target Basic System.
[   24.009861] dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-cryptsetup@luks\x2d57807466\x2dd0f8\x2d4540\x2d8b26\x2d2f5b8b98f09b comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   24.022058] dom0 kernel: audit: type=1130 audit(1605054818.365:15): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-cryptsetup@luks\x2d57807466\x2dd0f8\x2d4540\x2d8b26\x2d2f5b8b98f09b comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   24.012123] dom0 systemd[1]: Found device /dev/mapper/luks-57807466-d0f8-4540-8b26-2f5b8b98f09b.
[   24.025764] dom0 systemd[1]: Started dracut initqueue hook.
[   24.026083] dom0 systemd[1]: Reached target Remote File Systems (Pre).
[   24.026346] dom0 systemd[1]: Reached target Remote File Systems.
[   24.026630] dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dracut-initqueue comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   24.038831] dom0 kernel: audit: type=1130 audit(1605054818.381:16): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dracut-initqueue comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   24.027785] dom0 systemd[1]: Starting File System Check on /dev/disk/by-uuid/512fbcc3-e8a5-4ef3-8819-6d734234823c...
[   24.086354] dom0 systemd-fsck[624]: Root contains a file system with errors, check forced.
[   33.508497] dom0 systemd-fsck[624]: Root: Inode 2884739 extent tree (at level 2) could be narrower.  IGNORED.
[   33.783219] dom0 systemd-fsck[624]: Root: Inode 2884754 extent tree (at level 2) could be narrower.  IGNORED.
[   33.939160] dom0 systemd-fsck[624]: Root: Inode 2884761 extent tree (at level 2) could be narrower.  IGNORED.
[   35.436484] dom0 systemd-fsck[624]: Root: Inode 2884785 extent tree (at level 2) could be narrower.  IGNORED.
[   36.675198] dom0 systemd-fsck[624]: Root: Inode 3145737 extent tree (at level 1) could be narrower.  IGNORED.
[   38.092966] dom0 systemd-fsck[624]: Root: Inode 3145767 extent tree (at level 2) could be narrower.  IGNORED.
[   40.111641] dom0 systemd-fsck[624]: Root: Inode 3277309 extent tree (at level 2) could be narrower.  IGNORED.
[   42.669639] dom0 systemd-fsck[624]: Root: Inode 3539041 extent tree (at level 2) could be narrower.  IGNORED.
[   43.999091] dom0 systemd-fsck[624]: Root: Inode 3539046 extent tree (at level 2) could be narrower.  IGNORED.
[   58.942773] dom0 systemd-fsck[624]: Root: Unattached inode 2754213
[   58.943106] dom0 systemd-fsck[624]: Root: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
[   58.943322] dom0 systemd-fsck[624]: 	(i.e., without -a or -p options)
[   59.566655] dom0 systemd-fsck[624]: fsck failed with error code 4.
[   59.566958] dom0 systemd-fsck[624]: Running request emergency.target/start/replace
[   59.570747] dom0 systemd[1]: systemd-fsck-root.service: Main process exited, code=exited, status=1/FAILURE
[   59.571595] dom0 systemd[1]: Failed to start File System Check on /dev/disk/by-uuid/512fbcc3-e8a5-4ef3-8819-6d734234823c.
[   59.572350] dom0 systemd[1]: Dependency failed for /sysroot.
[   59.573009] dom0 systemd[1]: Dependency failed for Initrd Root File System.
[   59.573657] dom0 systemd[1]: Dependency failed for Reload Configuration from the Real Root.
[   59.574266] dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-fsck-root comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[   59.574570] dom0 systemd[1]: initrd-parse-etc.service: Job initrd-parse-etc.service/start failed with result 'dependency'.
[   59.586468] dom0 kernel: audit: type=1130 audit(1605054853.931:17): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-fsck-root comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[   59.574884] dom0 systemd[1]: initrd-parse-etc.service: Triggering OnFailure= dependencies.
[   59.575151] dom0 systemd[1]: initrd-root-fs.target: Job initrd-root-fs.target/start failed with result 'dependency'.
[   59.575414] dom0 systemd[1]: initrd-root-fs.target: Triggering OnFailure= dependencies.
[   59.575716] dom0 systemd[1]: sysroot.mount: Job sysroot.mount/start failed with result 'dependency'.
[   59.575964] dom0 systemd[1]: systemd-fsck-root.service: Unit entered failed state.
[   59.576775] dom0 systemd[1]: systemd-fsck-root.service: Failed with result 'exit-code'.
[   59.577800] dom0 systemd[1]: Stopped dracut cmdline hook.
[   59.577887] dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dracut-cmdline comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   59.577952] dom0 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dracut-cmdline comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   59.578621] dom0 systemd[1]: Stopped dracut initqueue hook.
[   59.590059] dom0 kernel: audit: type=1130 audit(1605054853.935:18): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dracut-cmdline comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   59.590106] dom0 kernel: audit: type=1131 audit(1605054853.935:19): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dracut-cmdline comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   59.590919] dom0 kernel: audit: type=1131 audit(1605054853.936:20): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dracut-initqueue comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   59.578843] dom0 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dracut-initqueue comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   59.579102] dom0 systemd[1]: Stopped target Basic System.
[   59.579356] dom0 systemd[1]: Stopped target System Initialization.
[   59.586754] dom0 systemd[1]: Starting Emergency Shell...
[   59.587024] dom0 systemd[1]: Reached target Initrd File Systems.
[   59.610809] dom0 systemd[1]: Received SIGRTMIN+21 from PID 332 (plymouthd).
[   59.684313] dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=plymouth-start comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   59.684395] dom0 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=plymouth-start comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   59.696491] dom0 kernel: audit: type=1130 audit(1605054854.041:21): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=plymouth-start comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   59.696494] dom0 kernel: audit: type=1131 audit(1605054854.041:22): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=plymouth-start comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'

If there is possibly anything in there that anyone might see a solution in, then please point it out, and you will save me a few days work rebuilding the system from scratch, and I will wish you Good Karma for your lovingkindness. Otherwise, I guess I won’t be sitting around watching Youtube news channels this coming weekend. Too busy setting up Qubes. Again. :roll_eyes:
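
A note on the “fsck failed with error code 4” line in that log: the fsck exit status is a sum of flag bits, and 4 on its own means errors were found but left uncorrected, which fits the request to run fsck manually. A small sketch for decoding such a status, with the meanings taken from the fsck(8) manual page (explain_fsck_status is a made-up helper name):

```shell
# Sketch: decode an fsck exit status, which is a sum of the bits below
# (meanings as listed in the fsck(8) manual page).
explain_fsck_status() {
    status="$1"
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    for entry in \
        "1:filesystem errors corrected" \
        "2:system should be rebooted" \
        "4:filesystem errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:checking canceled by user request" \
        "128:shared-library error"
    do
        bit="${entry%%:*}"    # number before the colon
        text="${entry#*:}"    # description after the colon
        if [ $(( status & bit )) -ne 0 ]; then
            echo "$text"
        fi
    done
}

explain_fsck_status 4   # prints: filesystem errors left uncorrected
```

A status such as 5 would print two lines, since it combines bits 1 and 4.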