VMs going transient overnight

I am having an annoying issue. I have installed Qubes OS 4.2 on my office computer, which I always leave on (since I sometimes need to access it remotely via SSH). I have a couple of VMs running (set to autostart) so that I can access them remotely, which usually works (unless I mess up sys-net or sys-firewall, but that’s a different story).

The strange thing is that when I come in in the morning I find some of the VMs in a “transient” state (yellow in the Qube Manager GUI). If I try to open any apps in those VMs nothing happens, and the “Console log” that I can access through the applet in the notification area does not show any errors or messages that would let me find out what happened. Network access to the VMs seems to still work, which makes it weirder (but I would need to reconfirm that).

Restarting the VMs makes them work normally, but it feels like I can’t reliably leave qubes running unattended, and that’s a problem. I would appreciate any suggestions on how to debug this and find out what’s going on.

Thanks!

You can check the dom0 logs using journalctl and search for messages related to the affected qubes.
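
As a rough sketch of that suggestion (not a definitive recipe), something like the following could be run in dom0 to pull recent journal lines that mention a given qube. The function names, the `--since` window, and the qube name in the usage note are placeholders made up for illustration; `--since` and `--no-pager` are standard `journalctl` options, and the filtering is plain substring matching.

```python
import subprocess


def journal_command(since="yesterday"):
    """Build a journalctl invocation for recent messages (meant for dom0)."""
    return ["journalctl", "--no-pager", "--since", since]


def lines_mentioning(lines, qube_name):
    """Keep only the lines that mention the qube name (case-insensitive)."""
    needle = qube_name.lower()
    return [line for line in lines if needle in line.lower()]


def qube_journal(qube_name, since="yesterday"):
    """Run journalctl and return the lines that mention qube_name."""
    result = subprocess.run(
        journal_command(since),
        capture_output=True,
        text=True,
        check=False,  # still return whatever output there is on a non-zero exit
    )
    return lines_mentioning(result.stdout.splitlines(), qube_name)
```

For example, `qube_journal("work")` would list yesterday's and today's journal lines that mention a qube called `work` (a hypothetical name). Running it as root may be needed to see all system messages.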

Just an idea: maybe the machine went into suspend mode, and in the morning the resume failed to wake up some of the VMs.


Suspending the main machine could have been an issue, but I checked and suspend is disabled. So that’s not the case… Thanks for the suggestion!

I had to leave for about a week, and I left everything running. Some of the VMs did not have problems, but many still did.

I found some logs with information via the Qube Manager in the end (not really in dom0’s journalctl).

In all the transient qubes the X server had died, and I see messages like this:

2024-02-03 00:55:36.432 qrexec-daemon[632097]: qrexec-daemon.c:1051:handle_execute_service: fork: Resource temporarily unavailable
/etc/qubes-rpc/qubes.WaitForSession: /var/run/qubes/qrexec.23 not found, domain might be dead
/etc/qubes-rpc/qubes.WaitForSession: /var/run/qubes/qrexec.23 not found, domain might be dead
2024-02-03 00:55:36.434 qrexec-client[684510]: qrexec-client.c:250:wait_for_session_maybe: wait-for-session exited with status 256
2024-02-03 00:55:36.434 qrexec-client[684510]: exec.c:461:execute_qrexec_service: qubes_connect: Connection refused
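
Two details in that excerpt can be decoded mechanically. “Resource temporarily unavailable” is the standard message for `EAGAIN`, which `fork` returns when a process or thread limit has been reached. And if the `status 256` is a raw `waitpid` status (an assumption, I have not checked the qrexec source), it corresponds to exit code 1. Below is a small sketch for scanning a saved log for failed forks; the regex and the function names are made up here, and `os.waitstatus_to_exitcode` needs Python 3.9 or later.

```python
import errno
import os
import re

# Matches lines such as:
# "... qrexec-daemon[632097]: qrexec-daemon.c:1051:handle_execute_service: fork: Resource temporarily unavailable"
FORK_FAILURE = re.compile(
    r"(?P<prog>[\w-]+)\[(?P<pid>\d+)\]: .*\bfork: (?P<reason>.+)$"
)


def fork_failures(lines):
    """Return (program, pid, reason) for each line that reports a failed fork."""
    found = []
    for line in lines:
        match = FORK_FAILURE.search(line)
        if match:
            found.append(
                (match["prog"], int(match["pid"]), match["reason"].strip())
            )
    return found


def is_eagain(reason):
    """True if the reason text equals this system's message for EAGAIN."""
    return reason == os.strerror(errno.EAGAIN)


def exit_code(raw_status):
    """Decode a raw wait status, assuming that is what the log printed."""
    return os.waitstatus_to_exitcode(raw_status)
```

If the failures do turn out to be `EAGAIN`, that would point towards a process or thread limit in dom0 rather than a problem inside the individual qubes, though this is only a guess from the one excerpt.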

I wonder if the qubes may somehow be running out of memory and then dying…
I have one qube that has a lot more memory allocated to it (about half the system’s), and I wonder if it may be depriving others of it.
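
Since the failing call in the log was `fork` in dom0’s qrexec-daemon, it may also be worth comparing the number of processes in dom0 against the kernel limits, not just memory. A rough sketch that reads the standard Linux files `/proc/sys/kernel/pid_max` and `/proc/sys/kernel/threads-max`; the function names are made up, and the `proc_root` parameter exists only so the code can be pointed at a test directory. Per-user limits and cgroup limits can also cause the same error and are not covered here.

```python
import os


def read_int(path):
    """Read the first whitespace-separated field of a file as an integer."""
    with open(path) as handle:
        return int(handle.read().split()[0])


def count_processes(proc_root="/proc"):
    """Count numeric entries under /proc; each one is a process (not a thread)."""
    return sum(1 for name in os.listdir(proc_root) if name.isdigit())


def process_headroom(proc_root="/proc"):
    """Report the current process count next to the system-wide limits."""
    return {
        "processes": count_processes(proc_root),
        "pid_max": read_int(os.path.join(proc_root, "sys", "kernel", "pid_max")),
        "threads_max": read_int(
            os.path.join(proc_root, "sys", "kernel", "threads-max")
        ),
    }
```

Calling `process_headroom()` in dom0 once in the evening and again when qubes have gone transient would show whether the process count is creeping up towards a limit.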

Nope, I stopped that memory-hog qube and things seemed to be the same.
Yesterday there were some updates to dom0. It also looks like I left my session open, and when I came back no VMs were transient. I’ll look a bit more into that.

It seems that after the dom0 update from February 13, 2024 my VMs are much more likely to survive overnight. Something must have been fixed in that update.