Windows Qube lost networking, qrexec access

I’ve been testing a Windows 10 Qube for about a week now, and things have been going pretty well, but tonight I lost network connectivity. I think my ISP had a short outage, because the issue seemed to affect my laptop as well, but I’m not 100% sure.

Anyway, when it happened, I switched back to dom0 and the network qube to check things, and the Windows qube blue screened and shut down unexpectedly. The bugcheck was 0x50, which is PAGE_FAULT_IN_NONPAGED_AREA. I tried restarting it, but it went through the “unclean shutdown” disk check and then shut back down. That’s when I noticed the “VM failed to start - qrexec failed to connect to qube” message (paraphrased).

After that, I decided to do a complete shutdown and power cycle and try to bring everything back up. Qubes came back up fine, and networking in it was fine. I started the Windows 10 Qube and it booted normally (which is when I found the bugcheck code in Event Viewer), EXCEPT that networking was not working. The Xen Net driver was running, but it didn’t seem to be able to get an IP configuration. I shut it down again, and the “qrexec failed to connect” message appeared again, which I found strange. I started it once more, still with no network, and checked both sys-net and sys-firewall to confirm that:

  1. The computer was connected to the network and to the Internet.
  2. The vif adapter for the Qube was created with the correct IP and MAC address matching the generated libvirt template, and sys-firewall showed the proper routes for it.

Also, I have persistent USB passthrough devices that showed up in Windows, and I can pass the keyboard and mouse to the Windows Qube normally (likely because that goes through the stub domain rather than the Windows guest). However, the detach script I wrote would not work, probably because communication via qrexec and the Qubes daemon back to dom0 was not working. In fact, Event Viewer said that the QdbDaemon startup timed out after 45 seconds.

I guess the question now is where I should be looking to solve the problem. The Qubes daemon/qrexec backend problem definitely needs to be solved, but does networking setup run through that link as well? Or is it handled entirely by the Xen Net driver doing DHCP against (I guess) sys-net or sys-firewall? Does the Qubes backend rely on networking? Chicken or egg?

I looked at the guest-dm logs, and I don’t see anything out of the ordinary. There are several runs of logs in the log file from startups/shutdowns over the past week, and the latest one looks pretty much like all the previous ones. No errors or warnings that I can see.

It’s late now, so I’ll attack it again tomorrow.

Thank you in advance for any advice!

Here’s the specific error I get:
[Dom0] Error starting Qube! qrexec-daemon startup failed: 2023-05-30 14:10:15.243 qrexec-daemon[23187]: qrexec-daemon.c:135:sigchld_parent_handler: Connection to the VM failed

My current guess is that losing networking is what broke the qrexec link, so I need to focus on why Windows can’t see the network. Clearly the stub domain is connected, but for some reason it isn’t passing the network device correctly to Windows, or Windows is no longer seeing it.

I checked Device Manager, and the only network devices listed are a bunch of Microsoft WAN Miniport drivers and the Xen net driver (version 8.2.2.1). I tried uninstalling the Xen net driver (without deleting the driver files) and then reinstalling it with “Scan for hardware changes”. It didn’t seem to make a difference.

Does the Xen net driver replace the RTL8139 driver? Should I try uninstalling and deleting the Xen net driver completely to see if Windows will fall back to the Realtek driver?

I have an itch to install the 9.0.0 version of the driver, but I don’t know if that requires installing at least the 9.0.0 Xen bus driver first. I’ve read mixed reviews on the 9.0.0 Xen PV drivers, and I definitely don’t need to be making the situation any worse. :-/

OK, so I wondered whether it might be a DHCP problem and decided to assign the IPv4 address settings manually. I followed the “After Windows Installation” section of windows-vm41.md in the Qubes-Community/Contents repository on GitHub, where it talks about sys-whonix (or, failing that, sys-firewall) networking.

I was able to get it working, but only after setting the network mask to 255.255.255.255. This is one thing that irks me about the Qubes network setup: it doesn’t follow proper netmasks/subnetting (and Windows complains about it too). Technically, the netmask should be 255.0.0.0, because the IP address is 10.137.0.x and the default gateway is 10.138.x.y, but that would put the DNS servers on the same subnet as well. I should try that and see if it works anyway, but I wonder why the netmask is set to /32.
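To make the subnet arithmetic concrete, here is a minimal Python sketch using the standard `ipaddress` module. The specific addresses are hypothetical examples in the 10.137/10.138 ranges described above, and the 10.139.1.1/10.139.1.2 DNS addresses are an assumption based on the usual Qubes virtual DNS servers; substitute the values Qubes shows for your own qube. It checks which peers each netmask treats as directly reachable: with /32 nothing else is on-link, so the gateway can only be reached through an explicit on-link route (presumably what Windows complains about), while with /8 the gateway and DNS servers all fall inside 10.0.0.0/8.

```python
import ipaddress

# Hypothetical example values; replace with the addresses Qubes assigns to your qube.
VM_IP = "10.137.0.23"
GATEWAY = "10.138.24.51"
# Assumption: the usual Qubes virtual DNS server addresses.
DNS_SERVERS = ["10.139.1.1", "10.139.1.2"]


def on_link(host_ip: str, prefix: int, other: str) -> bool:
    """Return True if `other` falls inside the subnet defined by host_ip/prefix."""
    iface = ipaddress.ip_interface(f"{host_ip}/{prefix}")
    return ipaddress.ip_address(other) in iface.network


def report(prefix: int) -> dict:
    """Summarise the netmask, the resulting network, and which peers are on-link."""
    iface = ipaddress.ip_interface(f"{VM_IP}/{prefix}")
    return {
        "netmask": str(iface.netmask),
        "network": str(iface.network),
        "gateway_on_link": on_link(VM_IP, prefix, GATEWAY),
        "dns_on_link": [on_link(VM_IP, prefix, d) for d in DNS_SERVERS],
    }


if __name__ == "__main__":
    # Compare the Qubes-style /32 with a "classful" /8 and a typical /24.
    for prefix in (32, 24, 8):
        print(prefix, report(prefix))
```

A /24 is included only for comparison: like /32, it leaves the 10.138.x.y gateway off-link, which is why a "normal-looking" mask doesn't help here.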

I also noticed that the IPv6 protocol was enabled, and I happened to untick it while making these changes, so it is /possible/ that it was stuck trying to get an address via IPv6, which is also a common problem in Windows networking, since Windows prefers IPv6 over IPv4. Of course, nothing is serving DHCP for IPv6 here, which would also explain why it was not getting an IP address.

I may try turning IPv6 back on and setting IPv4 back to automatic to be sure, but these are the changes I made and it started working again, so I’m going to mark this as “fixed” for now. I’ll add any further testing/comments/results that are pertinent for anyone who runs into this issue.

I spoke too soon, of course.

I did get networking “fixed”, but the QdbDaemon is still timing out, and qrexec still kills the Qube when it times out. I could of course increase the dom0 timeout, but I want the Qubes interface working again. I have no clue what it will take, as I’m not sure what it needs to work in the first place. Networking is working, so it can’t be that. Maybe I need to reinstall Qubes Windows Tools and see if that fixes it.
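If you only need the qube to stay up long enough to debug it, the dom0 timeout mentioned above is the per-qube `qrexec_timeout` property, set with `qvm-prefs`. Below is a minimal Python sketch for dom0, assuming a hypothetical qube name `win10` and an arbitrary 300-second value; by default it only builds and prints the command, and it runs it only when `dry_run=False` is passed and `qvm-prefs` is on the PATH. This just gives the qube more time; it does not fix whatever is making QdbDaemon time out.

```python
import shutil
import subprocess
from typing import List


def qrexec_timeout_cmd(vm_name: str, seconds: int) -> List[str]:
    """Build the qvm-prefs command that sets a qube's qrexec_timeout (in seconds)."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return ["qvm-prefs", vm_name, "qrexec_timeout", str(seconds)]


def set_qrexec_timeout(vm_name: str, seconds: int, dry_run: bool = True) -> List[str]:
    """Print the command (dry run) or execute it in dom0; returns the command either way."""
    cmd = qrexec_timeout_cmd(vm_name, seconds)
    if dry_run or shutil.which("qvm-prefs") is None:
        # Not in dom0, or a dry run was requested: just show what would be executed.
        print("would run:", " ".join(cmd))
        return cmd
    subprocess.run(cmd, check=True)  # raises CalledProcessError if qvm-prefs fails
    return cmd


if __name__ == "__main__":
    # Hypothetical qube name; pass dry_run=False in dom0 to actually apply it.
    set_qrexec_timeout("win10", 300)
```

Once QWT is healthy again, it is worth setting the value back so that a genuinely hung qube still gets cleaned up promptly.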

Quick update:

I got into the Windows VM and uninstalled/reinstalled QWT, and this resolved both the qrexec connection issue and the networking/DHCP issue. When it reinstalled the Xen network driver, it reset the IPv4 settings back to Automatic, and it works now.

As a result, I would recommend that the first thing anyone tries if something like this happens is to uninstall/reinstall QWT and reboot. You shouldn’t need to mess with manual networking unless you have to in order to get QWT to install (and you shouldn’t have to… qvm-start --install-windows-tools should work fine without networking). You don’t need to reboot Windows between the uninstall and the reinstall; in fact, I would recommend not doing so, as the uninstall also removes the Xen drivers completely, so there might be issues rebooting without them. Maybe. YMMV, caveat lector, and all that.
