Debian 11 template broken after wifi driver install. Anything to do other than reinstall?

coyoteltd · June 15, 2022, 1:37am

Because of quirks of this particular system, reinstalling takes two days so I’d rather find other avenues to pursue.

I chose Debian as my default template. . . I know this is less well supported, but I’d prefer to keep it this way because “reasons”.

That’s neither here nor there though. After setting up the Wifi drivers in the debian-11 template, that template and VMs based off of it will no longer start. The system complains:

Cannot connect to qrexec agent for 60 seconds, see /var/log/xen/console/guest-<vm name>.log for details

Unfortunately, there are NO details in the referenced log files at all. . . nothing new is written into them. The last lines are just what was written there the last time the VM ran correctly (2:34 PM this afternoon).

Googling and reading, I haven’t been able to suss out how to troubleshoot a VM in these circumstances. Is there a way to either access the VM’s syslog when it’s powered off or see / log the VM’s console output while it’s booting? It’s gotta be spitting out something before it just dies, but that stuff might as well get piped to /dev/null for all I can tell.

Any other suggestions? Thanks very much.

Note: As I have no network, I’m not sure how to just reinstall the template. . . (I didn’t make a backup because I’m a dumb idiot). Is there a way to do this using a USB drive?

HPOA909 · June 15, 2022, 1:43am

Do you have the Network Manager in your default Debian template?

coyoteltd · June 15, 2022, 1:44am

Yeah.

HPOA909 · June 15, 2022, 1:47am

Here is my suggestion, since the sys-net qube may have errors and you need to connect your PC to Ethernet or Wi-Fi. I had a similar issue in March of this year. The best thing to do in this situation is to base sys-net on the fedora-34 template first, then reinstall the default template and see what happens.

coyoteltd · June 15, 2022, 1:50am

Unfortunately, I don’t currently have the ability to connect the machine to a network. I ALSO have driver issues with my USB-C to Ethernet dongle. That’s why I asked about reinstalling just the template from USB or install media. But I hear you and thanks for the suggestion.

HPOA909 · June 15, 2022, 1:52am

Maybe you should run this command in the dom0 terminal for this particular issue:

sudo qubesctl state.sls qvm.sys-usb

Otherwise the mods or devs can help you via the issue tracker: GitHub - QubesOS/qubes-issues: The Qubes OS Project issue tracker

tzwcfq · June 15, 2022, 4:47am

You can increase the qrexec_timeout value for your Debian template in a dom0 terminal:
qvm-prefs debian-11 qrexec_timeout 3600
Then start the debian-11 template and open its console from a dom0 terminal:
qvm-console-dispvm debian-11
Search for errors there and try to fix them.

coyoteltd · June 15, 2022, 7:05am

@tzwcfq Thank you.

Here’s what happens. After an hour, the system says:

Cannot connect to qrexec agent for 3600 seconds, see /var/log/xen/console/guest-debian-11.log for details.

Then the window running qvm-console-dispvm spits out:

dispvm:default-mgmt-dvm: Cannot connect to qrexec agent for 3600 seconds, see /var/log/xen/console/guest-disp6628.log for details
Traceback (most recent call last):
  File "/usr/bin/qvm-run", line 5, in <module>
    sys.exit(main())
  File "/usr/lib/python3.8/site-packages/qubesadmin/tools/qvm_run.py", line 335, in main
    dispvm.cleanup()
  File "/usr/lib/python3.8/site-packages/qubesadmin/vm/__init__.py", line 410, in cleanup
    self.kill()
  File "/usr/lib/python3.8/site-packages/qubesadmin/vm/__init__.py", line 126, in kill
    self.qubesd_call(self._method_dest, 'admin.vm.Kill')
  File "/usr/lib/python3.8/site-packages/qubesadmin/app.py", line 74, in qubesd_call
    return self.app.qubesd_call(dest, method, arg, payload)
  File "/usr/lib/python3.8/site-packages/qubesadmin/app.py", line 748, in qubesd_call
    return self._parse_qubesd_response(return_data)
  File "/usr/lib/python3.8/site-packages/qubesadmin/base.py", line 109, in _parse_qubesd_response
    raise exec_class(format_string, *args)
qubesadmin.exc.QubesVMNotFoundError: No such domain: 'disp6628'

Ok. This looks like the qvm-console-dispvm command calls an intermediary disposable qube to do whatever it’s doing. . . and unless I’m mistaken, that qube is based on debian-11 and so is affected by the crash and so whatever is going on remains opaque.

So let’s check the log files and see if there’s anything different. And there is! There’s actually stuff in guest-disp6628.log. Here’s what seems relevant:

[2022-06-14 21:26:24] [FAILED] Failed to start LSB: Xen daemons.
[2022-06-14 21:26:24] See 'systemctl status xen.service' for details.
[2022-06-14 21:26:24] [FAILED] Failed to start Qubes base firewall settings.
[2022-06-14 21:26:24] See 'systemctl status qubes-iptables.service' for details.
[2022-06-14 21:26:24]          Starting Qubes misc post-boot actions...
[2022-06-14 21:26:25] [FAILED] Failed to start Xen driver domain device daemon.
[2022-06-14 21:26:25] See 'systemctl status xendriverdomain.service' for details.
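
A minimal sketch for pulling lines like these out of a long console log, assuming the raw file carries the ANSI color escape sequences that systemd wraps around its status words. `failed_units` is a hypothetical helper name, not a Qubes tool:

```shell
# Hypothetical helper: read a console log on stdin, strip ANSI color
# escape sequences, and print only the lines systemd marked as [FAILED].
failed_units() {
    esc=$(printf '\033')    # literal ESC character, built portably
    sed "s/${esc}\[[0-9;]*m//g" | grep -F '[FAILED]'
}
```

In dom0 this could be tried as `failed_units < /var/log/xen/console/guest-debian-11.log`. Reading the file may need root, depending on its permissions.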

What’s interesting is that the boot continues in the log. There’s even a line that makes me hopeful the new driver sees my wifi card. The log ends with a “localhost: login:” prompt.

So that tells me the Qube IS booting, but that the Xen services are crashing / failing to start, so of course dom0 can’t talk to it!

It also seems weird that installing a Wifi kernel module would cause the Xen services to fail. . . but the kernel is witchy black magick as far as I’m concerned.

Is that iptables failure a clue though? Networking is borked so Xen can’t listen on local tcp/ip ports maybe?

NOW. . . HOW in the name of the swirling chaos at the Center of All Creation, do I pull syslog from a Qube that’s not running? Or from a Qube that IS running but that Xen can’t talk to?

tzwcfq · June 15, 2022, 8:10am

If qvm-console-dispvm doesn’t open a new window with a disposable terminal, try opening the console directly with Xen. Start the debian-11 template and run this command in a dom0 terminal:
sudo xl console debian-11
Then log in as user and check for errors in the template logs.
You can exit the console in the dom0 terminal with Ctrl+].
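
Put together, the steps suggested in this thread could be sketched as one dom0 helper. The function name `debug_template_console` and the default 10-second pause are made up for illustration. The sketch assumes `qvm-start` is how the template gets started, and it adds `qvm-prefs --default` to put the timeout back afterwards:

```shell
# Hypothetical dom0 helper; the name and the pause length are illustrative.
# Usage: debug_template_console <vm-name> [seconds-to-wait]
debug_template_console() {
    vm="$1"
    pause="${2:-10}"

    # Give the qube plenty of time before dom0 gives up on qrexec.
    qvm-prefs "$vm" qrexec_timeout 3600

    # Start in the background: qvm-start keeps waiting for the qrexec agent.
    qvm-start "$vm" &

    # Crude wait so the domain exists before attaching to it.
    sleep "$pause"

    # Attach to the Xen console; leave it with Ctrl+].
    sudo xl console "$vm"

    # Back in dom0: restore the default qrexec timeout.
    qvm-prefs --default "$vm" qrexec_timeout
}
```

It would be called as `debug_template_console debian-11`. If the console attaches too early or too late, adjusting the pause (second argument) is the first thing to try.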

coyoteltd · June 16, 2022, 12:06am

I got in! Thank you again.

The issue seems to be caused by:

FAILED. Failed to mount /proc/xen.  Unknown filesystem type.
See 'systemctl status proc-xen.mount' for details.

Googling for this error brings up exactly one thread on GitHub that DOES relate to Qubes, but it seems to be a different edge case, even if the symptoms are similar (at least my kernel modules aren’t missing; maybe the filesystem is somehow getting relabeled, but I don’t know how to tell or what it should be labeled as).

I haven’t been able to suss out anything else useful from systemctl status or journalctl and the shell that xl console dumps me into is pretty gnarly to try to use. But hey, if it works. . .
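
One more check that might narrow this down: /proc/xen is normally a xenfs mount, so “Unknown filesystem type” suggests the running kernel has no xenfs support registered. A minimal sketch for testing that from inside the console, where `fs_supported` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the kernel currently lists the given
# filesystem type. The optional second argument (default /proc/filesystems)
# is only there so the check can be tried against a saved copy of the table.
fs_supported() {
    fstype="$1"
    table="${2:-/proc/filesystems}"
    # The filesystem name is the last field on each line ("nodev<TAB>proc").
    awk -v fs="$fstype" '$NF == fs { found = 1 } END { exit !found }' "$table"
}
```

For example, `fs_supported xenfs || echo "xenfs not registered for kernel $(uname -r)"`. If it reports xenfs as missing, comparing `uname -r` with the directories under /lib/modules might show whether the driver install left the module tree out of step with the running kernel, though that is only a guess at the cause.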

I feel like maybe I just need to bite the bullet and reinstall again. . . if I’d started with that, I could be done by now. XD

Rhys-Hussain · June 30, 2022, 12:49am

One of my standalone Debian 11 VMs can’t start after upgrading Debian 11 because of a mount failure. My other standalone Debian 11 VMs and the AppVMs that use the Debian 11 template still work. I solved the problem by creating a new standalone Debian 11 VM from the template.

Rhys-Hussain · July 23, 2022, 3:52am

The problem occurred again, so I decided to give up on the Debian 11 standalone and use Ubuntu 22.04 instead. I copied my files from Debian 11 to Ubuntu following Restoring Qubes Without Backup.