After crash cannot start any VM qubes

scallyob · April 14, 2022, 9:05pm

Seems to be the same as reported here: Qubesd service not running on boot, cannot be started

Boot is slow, but I can eventually log in and open a dom0 terminal.

Load is high, but no other qubes start.

qvm-ls returns

File "/usr/lib/python3.8/site-packages/qubesadmin/app.py", line 727, in qubesd_call
   client_socket.connect(qubesadmin.config.QUBESD_SOCKET)
FileNotFoundError: [Errno 2] No such file or directory

I also cannot start Qube Manager.

This happened at least once before, but a reboot fixed it. I’ve rebooted about 6 times so far and no luck this time. Crashes have been regular since I installed 4.1, and more frequent since I reinstalled 4.1 on BTRFS to address backup issues. System load is very high on BTRFS and is high now even though no VM qubes are loading.

systemctl status qubesd

gives

 Loaded: loaded
 Active: deactivating

then

systemctl stop qubesd

changes to

Active: failed

then

 systemctl start qubesd

seems to just hang
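
For anyone hitting the same hang: a rough Python sketch for reading the unit state without waiting on `systemctl start`. It assumes `systemctl show` with `--property=` behaves in dom0 as it does on other systemd systems. The function names `parse_show_output` and `unit_state` are made up for this example, and it has not been tested on Qubes 4.1.

```python
import subprocess


def parse_show_output(text):
    """Turn 'Key=Value' lines from `systemctl show` into a dict."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '='
            result[key.strip()] = value.strip()
    return result


def unit_state(unit="qubesd", timeout=10):
    """Ask systemd for the unit's state; give up after `timeout` seconds."""
    proc = subprocess.run(
        ["systemctl", "show", unit, "--property=ActiveState,SubState,Result"],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    return parse_show_output(proc.stdout)
```

Calling `unit_state()` in dom0 should return a dict whose `ActiveState` matches the "Active:" line above (for example `deactivating` or `failed`), which is easier to log across reboots than the full status text.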

scallyob · April 14, 2022, 10:05pm

If I just run
dom0$ qubesd

I get a bunch of output with a final error of

“Permission denied: ‘/var/lib/qubes/appvms/qube-name/private.img’”

for each qube.

Permissions on these .img files are: -rw------- 1 root qubes

inspired by this: Qubes daemon often doesn't start on boot · Issue #5295 · QubesOS/qubes-issues · GitHub

EDIT: Obviously the solution to the permissions error is to run

dom0$ sudo qubesd
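
Why sudo makes the difference, as a small worked example: with mode `-rw-------` and owner root, only root gets write access, and membership in the `qubes` group does not help because the group bits are empty. The sketch below applies the classic Unix owner/group/other check. It ignores ACLs and capabilities, and the numeric IDs used in the usage note (1000 for the user, 98 for the group) are placeholders, not values read from a real dom0.

```python
import stat


def may_write(mode, file_uid, file_gid, uid, gids):
    """Classic Unix write-permission check (no ACLs, no capabilities)."""
    if uid == 0:
        return True  # root bypasses the mode bits
    if uid == file_uid:
        return bool(mode & stat.S_IWUSR)
    if file_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


def describe(mode):
    """Render mode bits the way `ls -l` does."""
    return stat.filemode(mode)
```

With `mode = stat.S_IFREG | 0o600`, `describe(mode)` gives the same `-rw-------` string as the listing above, `may_write(mode, 0, 98, 1000, {1000, 98})` is false for an ordinary user in the group, and `may_write(mode, 0, 98, 0, {0})` is true for root.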

Then it goes through and reports “Reflinked” for the private.img of each qube, each to a longer-named copy with the date in it.

Then it “Renames” a bunch of .imgs.
Then it “Removes” a bunch of .imgs.
Then it starts “Reflink”ing again.
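
To see which qubes this output touches, a saved copy of it can be summarised with a short Python sketch like the one below. The line format is taken from the `Reflinked file:` / `Renamed file:` / `Removed file:` lines pasted later in this thread. The helper names `parse_line` and `actions_per_qube` are made up for illustration, and the regular expression is only a guess at the general format based on those few lines.

```python
import re
from collections import Counter

# Matches e.g.  Renamed file: '/a/b' -> '/a/c'   or   Removed file: '/a/b'
LINE_RE = re.compile(
    r"^(Reflinked|Renamed|Removed) file: '([^']+)'(?: -> '([^']+)')?$"
)


def parse_line(line):
    """Return (action, source, destination or None), or None if no match."""
    match = LINE_RE.match(line.strip())
    return match.groups() if match else None


def actions_per_qube(lines, root="/var/lib/qubes/appvms/"):
    """Count (qube, action) pairs for files under `root`."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # traceback lines and anything else are skipped
        action, source, _destination = parsed
        if source.startswith(root):
            qube = source[len(root):].split("/", 1)[0]
            counts[(qube, action)] += 1
    return counts
```

If only one qube name ever shows up in the counts, that is a hint about where to look first.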

 sudo systemctl status qubesd

returns

 qubesd.service: start operation timed out. Terminating.

scallyob · April 15, 2022, 1:42pm

After trying to manually back up the private.img and root.img for some important qubes, I tried rebooting one more time. This time some basic qubes booted. It seems that running qubesd as above fixed something. This allowed me to run Qubes Backup on the important qubes before reinstalling on XFS.

XFS on RAID1 looks successful (unlike EXT4, which doesn’t work in the 4.1 installer) and load so far seems back to normal (unlike BTRFS, which constantly froze). Hopefully I have found my new file system.

scallyob · June 6, 2022, 3:20pm

XFS has proven to be more stable than BTRFS, but I still get the occasional crash. Then I get this same problem where no qubes start after boot/login. I open a terminal and run:

sudo qubesd

And it does its thing for a while. Then I can start qubes, but no apps will launch. A GUI issue? So I have to reboot again. After half a dozen reboots I usually get things to work.

As this has happened on multiple new installs of 4.1, I’m assuming there is some issue there.

This might be related: Crashing qubes virtual machines and "input/output error" - #4 by waliston

But it’s not clear whether they’re using 4.1, and their computer isn’t shutting down, just the qubes (VMs) are.

scallyob · July 3, 2022, 3:50am

These days I pray for no crashes and no need to shut down. It can take hours to boot into a usable system.

Here is the output when I run qubesd as described above:

dom0 ~]$ sudo qubesd
Reflinked file: '/var/lib/qubes/appvms/crypto/private.img' -> '/var/lib/qubes/appvms/crypto/private.img.34@2022-07-03T02:10:55Z~1svqw442'
Renamed file: '/var/lib/qubes/appvms/crypto/private.img.34@2022-07-03T02:10:55Z~1svqw442' -> '/var/lib/qubes/appvms/crypto/private.img.34@2022-07-03T02:10:55Z'
Removed file: '/var/lib/qubes/appvms/crypto/private.img.33@2022-07-03T02:10:55Z'
Renamed file: '/var/lib/qubes/appvms/crypto/private-dirty.img' -> '/var/lib/qubes/appvms/crypto/private.img'
Reflinked file: '/var/lib/qubes/appvms/crypto/private.img' -> '/var/lib/qubes/appvms/crypto/private-precache.img~gkd197sn'
Renamed file: '/var/lib/qubes/appvms/crypto/private-precache.img~gkd197sn' -> '/var/lib/qubes/appvms/crypto/private-precache.img'
Traceback (most recent call last):
  File "/usr/bin/qubesd", line 5, in <module>
    sys.exit(main())
  File "/usr/lib/python3.8/site-packages/qubes/tools/qubesd.py", line 51, in main
    servers = loop.run_until_complete(qubes.api.create_servers(
  File "/usr/lib64/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/usr/lib/python3.8/site-packages/qubes/api/__init__.py", line 430, in create_servers
    cleanup_socket(sockpath, force)
  File "/usr/lib/python3.8/site-packages/qubes/api/__init__.py", line 398, in cleanup_socket
    raise FileExistsError(errno.EEXIST,
FileExistsError: [Errno 17] socket already exists: '/var/run/qubesd.sock'
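
The two socket errors in this thread look like opposite cases: `qvm-ls` failed with FileNotFoundError because the socket file was absent, while this run failed with FileExistsError because it was already there. A minimal Python sketch for telling the cases apart before starting qubesd by hand is below. The path is the one from the traceback; the function name `socket_state` and the three labels are made up for this example. It assumes a stream-type Unix socket and will raise PermissionError if the caller is not allowed to connect.

```python
import socket


def socket_state(path="/var/run/qubesd.sock"):
    """Return 'missing', 'stale' or 'listening' for a Unix socket path."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.settimeout(2)
        try:
            client.connect(path)
        except FileNotFoundError:
            return "missing"  # no file at that path
        except ConnectionRefusedError:
            return "stale"  # file exists but nothing is accepting on it
    return "listening"  # some process accepted the connection
```

A result of "listening" would suggest another qubesd (for example the one systemd is still trying to start) already owns the socket, so a second `sudo qubesd` cannot bind it. "stale" would suggest a leftover file from a crash. Neither reading is confirmed for this system.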

I’ve noticed that it is always the qube ‘crypto’ that shows up here. Could the problem be in this qube? Reinstalling Qubes didn’t eliminate this problem, and I don’t see many others having it.

scallyob · July 26, 2022, 11:45pm

I deleted the qube/VM named “crypto” and rebuilt it, and this problem seems resolved.

brendanhoar · July 27, 2022, 3:04am

Hmm, this has me concerned there might be some unexpected pattern matching on “crypt” in the Qubes storage drivers and/or LVM config.

B

scallyob · July 27, 2022, 3:32am

The new qube/VM is also named “crypto” and so far no problems.