Sudden Boot Failure due to seemingly innocuous config blunder

Hello forum and may I first thank the Creator/s for establishing this non-Google support service which I have been wishing for since January 2018.

That was when I installed Qubes OS 3.2 on a 1TB external USB HDD and after a few early revisions it has run extraordinarily nicely ever since … until today, Sunday 20-09-2020.

Excuse me for explaining the almost 100% certain catalyst for this inaugural system failure because even though it seems so unlikely to have caused this ‘read-only file system’ problem on boot, this was the one and only change I have made in over two(2) years, and my system disaster occurred immediately after I foolishly did so.

I wanted to hook up a Zyxel external USB modem dongle to try to get better Wi-Fi reception than the Toshiba Satellite laptop’s built-in Wi-Fi modem provides. After plugging the USB modem into a USB slot on a 3-way multi-USB adaptor, there was seemingly no recognition by the network devices list, and so I thought it wise to try to tweak the sys-net VM somehow.

I selected the two available Intel USB ‘thingos’ listed and soon discovered that my external KBD and mouse (also connected via the 3-way multi-USB adaptor) lost communication. Oops! I probably should have cleared all the junk off the laptop KBD and touchpad and shut down properly, but I have often had times when the external USB HDD has been bumped in its socket and I have had to hold down the power button for the usual ten seconds (10s), and so that is what I did this time too.

Now all hell has broken loose on my beloved Qubes OS system and I can’t boot past the ‘Disk Password’ routine. There are two years of carefully arranged bookmarks on sixteen(16) different VMs at stake, but at least all the data is currently safe and accessible through this Linux Mint system which is the laptop’s default OS when the external USB HDD is disconnected at boot.

Here is the list of error messages which I have written out on pen and paper during the boot process, and then retyped here. Please excuse any errors in my reverse-engineering of the errors.

system[1]:crond.service.failed to load environment files: input/output error

Buffer I/O error on dev dom 0, logical block 1, lost async page write
Buffer I/O error on dev dom 0, logical block 1036, lost async page write
Buffer I/O error on dev dom 0, logical block 1038, lost async page write
Buffer I/O error on dev dom 0, logical block 15728641, lost async page write
Buffer I/O error on dev dom 0, logical block 15728747, lost async page write
Buffer I/O error on dev dom 0, logical block 20971536, lost async page write
Buffer I/O error on dev dom 0, logical block 20971554, lost async page write
Buffer I/O error on dev dom 0, logical block 20971558, lost async page write
Buffer I/O error on dev dom 0, logical block 20971561, lost async page write
Buffer I/O error on dev dom 0, logical block 20971564, lost async page write

Buffer I/O error on device dm-0, logical block 391505
                         ...
  "     "    "    "    "    "      "       "   391513

Aborting journal on device dm-0-0.

JBD2: Error-5 detected when  updating journal superblock for dm-0-0

EXT4-fs error (device dm-0): ext4_journal_check_start:61: detected aborted journal

EXT4-fs (dm-0): Remounting filesystem read-only

EXT4-fs (dm-0): previous I/O error to superblock detected

systemd_journald [681]: Failed to truncate file to its own size: Read-only file system  
  "        "       "      "     "   "       **x 10**  "   "  "   "     "    "    "   "

systemd[1]: systemd_journald.service: Watchdog timeout (limit 3min)!

It appears to me that some sort of flag has been left unlowered during my clumsy system crash shutdown, but I hope that there might be others here at this forum who can gather from that list above what I might have done wrong and perhaps provide me with some miraculous quick and easy fix to solve my own self-made disaster.

I would rather try to reinstall Qubes OS 3.2 in some way that allows me to retain all the existing VMs and browser bookmark data than have to run a fresh install and then have to reconfigure all the custom VMs and then start restoring the bookmarks list from memory.

Please help if you are able and I apologise for such a long explanation. This is the least information I consider relevant besides the point that I am awaiting my I7 laptop’s depreciation period before buying a replacement with the newer CPU so I can use version 4.

Thank you for reading and thanks again for this support forum.

> Hello forum and may I first thank the Creator/s for establishing this non-Google support service which I have been wishing for since January 2018.

Hi,

> That was when I installed Qubes OS 3.2 on a 1TB external USB HDD and after a few early revisions it has run extraordinarily nicely ever since … until today, Sunday 20-09-2020.

From this, and the rest of your post, it seems you are still using 3.2.
You will likely struggle getting support for this as it has not been
maintained for a few years now. I’d recommend moving to 4.0 once this
problem is sorted.

> Now all hell has broken loose on my beloved Qubes OS system and I can’t boot past the ‘Disk Password’ routine. There are two years of carefully arranged bookmarks on sixteen(16) different VMs at stake, but at least all the data is currently safe and accessible through this Linux Mint system which is the laptop’s default OS when the external USB HDD is disconnected at boot.

This is a good sign. Have a look at the documentation for
mounting/decrypting a Qubes disk without Qubes:

Considering the errors you were getting, I would recommend using -o ro
in all mount commands that you use, just in case your drive is dying.

This is for 4.0, but the difference is that you can ignore the LVM
content. Your VM files should be in /var/lib/qubes/.... As for
mounting them, the command losetup --partscan /dev/loop0 <file> should
provide /dev/loop0pX devices (X being the partition number), which you
can then mount and pull files off.
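
The read-only approach described above could be sketched as a pair of shell helpers. This is only a sketch under assumptions, not a tested recovery procedure: the device paths, the mapper name qubes_recovery, the mount points and the function names are all placeholders that do not come from this thread. With DRY_RUN=1 (the default here) the commands are printed rather than executed. Any LVM activation that may be needed between unlocking and mounting is not shown.

```shell
#!/usr/bin/env bash
# Sketch only: every device, path and name below is a hypothetical placeholder.

# With DRY_RUN=1 (default) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Unlock the encrypted partition read-only, then mount the dom0 root
# filesystem read-only.
#   $1 = encrypted partition (hypothetical, e.g. /dev/sdX2)
#   $2 = root filesystem device that exists after unlocking (hypothetical)
#   $3 = mount point
open_qubes_disk_ro() {
    local luks_part="$1" root_dev="$2" mnt="$3"
    run cryptsetup open --readonly "$luks_part" qubes_recovery
    run mkdir -p "$mnt"
    run mount -o ro "$root_dev" "$mnt"
}

# Attach one VM image file read-only with partition scanning, then mount
# one of the resulting /dev/loop0pX partitions read-only.
#   $1 = image file under /var/lib/qubes/...
#   $2 = partition number (the X in /dev/loop0pX)
#   $3 = mount point
mount_vm_image_ro() {
    local image="$1" part="$2" mnt="$3"
    run losetup --read-only --partscan /dev/loop0 "$image"
    run mkdir -p "$mnt"
    run mount -o ro "/dev/loop0p${part}" "$mnt"
}
```

Calling, for example, `open_qubes_disk_ro /dev/sdX2 /dev/mapper/some-root /mnt/qubes` prints the three commands it would run; setting DRY_RUN=0 and running as root would execute them. If the ext4 journal still needs recovery, a plain `-o ro` mount may refuse to proceed on a read-only device; the ext4 `noload` mount option (`-o ro,noload`) is one possible way around that, though it is untested here.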

> I would rather try to reinstall Qubes OS 3.2 in some way that allows me to retain all the existing VMs and browser bookmark data than have to run a fresh install and then have to reconfigure all the custom VMs and then start restoring the bookmarks list from memory.
Please either start over on 4.0 (with your files from above) or at least
attempt to import your old VMs into 4.0. 3.2 is not supported and has a
large number of missing security patches.

> Please help if you are able and I apologise for such a long explanation. This is the least information I consider relevant besides the point that I am awaiting my I7 laptop’s depreciation period before buying a replacement with the newer CPU so I can use version 4.
The long explanation was useful. Thanks for providing plenty of info to
debug with. I can’t recommend it as secure, but using 4.0 with PV VMs
for hardware devices is still going to be more stable and secure than 3.2.

Many thanks for your reply, Jarrah. I guess by your username that you and I might be somewhere within a couple of hours of the same timezone. I’m in Perth, Australia.

I have another external USB HDD with 4.0 installed way back when it first released, in fact it may even be a Beta. So long ago and I haven’t used it because I have always thought that 3.2 would be more secure with the CPU I have on this Toshiba Satellite. It is missing one of those two features to do with the machine code or MMC or something. Two years after I was last researching these things and my memory has frazzled so much.

There are two CPU features that 4.0 requires for full HVM(?) operation and my Intel I7 only has one of them. That is why I have stuck with 3.2 until time comes this coming Boxing Day to upgrade hardware after 4 years.

What you have written encourages me to get onto my 4.0 system and see if I can transfer the current 3.2 App VMs across and start trusting that I can still use 4.0.3 with my redundant CPU for the next few months until time comes to buy the new laptop.

I like to believe in miracles, but when all is said and done, I think that your solution might be more realistic than waiting around for some magician to provide me with the cure for my foolish effort earlier this afternoon.

I’ll sleep on it with good old Linux Mint tonight and check in tomorrow morning to see if there might be any quick fix, but then get on with the job of migrating to 4.0, or maybe even download the latest 4.0.3 and go from there.

It is definitely more secure to run 4.0.3 on a deprecated CPU without one of the security features it prefers than to run 3.2 on the same machine which has all the features 3.2 wants, yes? Okay then.

If there are no other options tomorrow morning more appropriate for lazy ex-geeks such as I, then I shall follow your advice.

I have had a squiz at the link to Mount and Decrypt Qubes Partition from Outside Qubes and it looks like it might be a useful additional tool for the job I’m going to have to take on tomorrow.

Thank you very much for such a comprehensive and quick reply. I hope now that I have been spurred to go back looking for Qubes support and found this place, that it might be somewhere to check in from now on once I have my system back in order again, so I look forward to future discussions. Thanks again.

Hello again everyone. I hope your own personal weeks are off to better starts than mine so far for this coming Equinoxical seasonal transformation, but it’s not that bad, really. I’m fine.

I am currently in the process of torrenting 4.0.3 while I have access to a bit of free albeit patchy Wi-Fi (which was the cause of yesterday’s disaster here, funnily enough, so I may as well get some sort of compo for all this mess my system has found itself in). There are a few hours to go and it’s quite boring, so I thought I might pass some of the waiting time doing something textual so as to keep most of the free Wi-Fi bandwidth for the torrent download … and there has been a question on my mind all afternoon since I copied all seventeen(17) AppVMs (including Fedora23dvm which I have never used) onto the laptop data partition. What better time to ask it than right now?

The AppVMs total 364.9GB. These are V3.2 AppVMs. I’ll get around to 4.0.3 when I am good and ready but right now the main focus needs to be on restoring the 3.2 system which has worked fine for close to three(3) years [CORRECTION: 4 years]. That is what I would like to do now and for the next few months until the new laptop is bought at the Boxing Day sales somewhere.

It just seems a bit foolish to change tactics which have worked for over 90% of the game right near the end all because of one stupid and unusual mistake.

Okay Jarrah, now what do you think will happen if I reinstall 3.2 in exactly the same way as I did back in 2018 and then clear the /var/lib/qubes/appvms directory and copy across my current seventeen(17) previous AppVMs back into that directory?

The Qubes version is the same, in fact the .ISO is exactly the same one, and so I don’t see any natural sort of old-fashioned way that the new installation would be able to tell whether I had manually configured all the AppVMs, their menus, and the browser bookmarks, or whether I’d done it the clever way.

THE QUESTION: Will restoring the existing AppVMs back into a new installation likely work, or is there a clear and present reason why Qubes is designed to specifically deny such a tricky little move?

I’ll stick with 3.2 until after Christmas regardless of which option might be less insecure, because it is near the end of the innings for this laptop and that’s the way it is. You just think of Bradman and you’ll understand why good sense tells me not to mess with a working system right near the end when the alternative is not much different from the current arrangement.

Just let me know before I try it whether anyone else has switched AppVMs in the /var/lib/qubes/appvms directory before.

Probably the simplest solution to my question might be to go into town tomorrow and buy a new 1TB external HDD and install 3.2 fresh on that drive. Then I can test how the tricky insertion of all the AppVMs into the /var/lib/qubes/appvms directory works without risking the state of the current messed up system that led me here to begin with.

I’ll pick up another drive tomorrow if I get the time and report back on how it worked later in the week, hopefully.

I am still baffled as to how there is not some kind of quick and easy fix for such a strange and undeserved boot error. Something tells me that there is but I’d have to use the Google system to find the answer.

CONCLUSION: I gave up trying to work out a way to use the installation USB stick to solve the read-only file system errors at boot, went out and bought another external 1TB USB HDD and took Jarrah’s advice in the end, installing 4.0.3 and manually rebuilding all the custom AppVMs and firewalls, and all the useful bookmarks I can remember, then copied all the data across from the old drive to the new. Only about twelve(12) hours’ work yesterday, and now everything seems almost back to normal.

4.0.3 has a few improvements over 3.2. Some of the quirky little harmless but annoying bugs have been ironed out now. It seems to me that it is now okay to detach a device from a VM which has not been unmounted within the VM yet, which used to cause all kinds of delays with 3.2.

As Jarrah warned me, 4.0.3 with the PVH configuration seems to me, from a layman’s perspective, to be just as insecure as 3.2 with whatever that used to use to trick the tricksters. A little over my head, that kind of technology.

The only foreseeable worry I have now is how things are going to work out after Christmas if all goes as planned and I find a cheap new ex-demo laptop with IOMMU (VT-d or AMD-Vi, preferably the Intel option), plug my Qubes OS drive into a USB port and then just change all the VM settings from PVH to HVM and cross fingers for a smooth transition to the recommended hardware.
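
For what it is worth, the per-VM setting involved is the virt_mode property, which Qubes 4.0 exposes through qvm-prefs in dom0. A minimal sketch of switching a list of VMs in one go might look like the following. The function name, the DRY_RUN switch and the VM names in the usage note are illustrative assumptions, and which VMs actually need switching on new hardware is not settled in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: intended for dom0 on Qubes 4.0; helper names are hypothetical.

# With DRY_RUN=1 (default) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Set the virt_mode property on each named VM.
#   $1     = target mode: pv, pvh or hvm
#   $2...  = VM names
set_virt_mode() {
    local mode="$1"
    shift
    case "$mode" in
        pv|pvh|hvm) ;;
        *) echo "unknown virt_mode: $mode" >&2; return 1 ;;
    esac
    local vm
    for vm in "$@"; do
        run qvm-prefs "$vm" virt_mode "$mode"
    done
}
```

For example, `set_virt_mode hvm sys-net sys-usb` prints the two qvm-prefs commands it would run; with DRY_RUN=0 in dom0 it would apply them. Checking the current value first with `qvm-prefs <vmname> virt_mode` seems a sensible precaution before changing anything.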

Maybe a quick new thread either here or even in the General section might be a better way to pose that question than appending all this now irrelevant lineage which will only waste everyone’s time. Yes, I think I’ll write up a new thread about it. See you there.