I’ve been using Qubes for a long time without issues. Recently my sys-vpn has been randomly failing to start, forcing me to restart the whole pc.
The error:
Start failed: internal error: libxenlight failed to create new domain 'sys-vpn', see /var/log/libvirt/libxl/libxl-driver.log for details
/var/log/libvirt/libxl/libxl-driver.log:
libxl: libxl_device.c:1146:device_backend_callback: Domain 13:unable to add device with path /local/domain/3/backend/vif/13/0
libxl: libxl_create.c:1938:domcreate_attach_devices: Domain 13:unable to add vif devices
libxl: libxl_device.c:1146:device_backend_callback: Domain 13:unable to remove device with path /local/domain/3/backend/vif/13/0
libxl: libxl_domain.c:1588:devices_destroy_cb: Domain 13:libxl__devices_destroy failed
There’s no apparent reason why this started happening. I didn’t attach any PCI devices to sys-vpn, and this happens completely at random. Sometimes I boot without errors, sometimes I need to restart.
This is most likely an issue with the netvm your sys-vpn is connected to; I guess it’s sys-firewall. Is it running, or has it maybe crashed? Can you check the logs there? Is the xendriverdomain service running (you can check with the systemctl status xendriverdomain command in there)?
I’m using mirage firewall (latest). When the error happened, I tried starting the other vpn vms that are connected to different mirage qubes, and still they failed.
The qubes-firewall service is enabled in qvm-service.
If it happens again I will try with a different firewall, maybe debian or fedora, but it’s strange that it happens even to different mirage qubes.
Hard to tell until I can reproduce it consistently. They said that issue was fixed in 0.8.3 and I’m running the latest 0.8.4, so it’s probably a different issue.
Issue 155 should have been fixed in the latest release, you’re right. Next time it occurs it will help if you can save the logs of mirage fw, and don’t hesitate to report them here or on github.
To try to reproduce the problem, is your network configuration:
sys-net ← mirage-fw1 ← sys-vpn1 ← appVM1
sys-net ← mirage-fw2 ← sys-vpn2 ← appVM2
sys-net ← mirage-fw3 ← sys-vpn3 ← appVM3
and at some point, one of the sys-vpnX failed to start, presumably because of mirage-fwX?
That’s exactly right, and the error always happened when I set up autostart (I put a script in .config/autostart with qvm-start sys-vpn1 sys-vpn2); autostart is not enabled in the Qubes settings. Also, when I removed the script and started the VMs manually, the error didn’t happen.
The mirage logs are in /var/log/xen/console/guest-mirage-vm.log with mirage-vm the name of any of your mirage-fw. Every start appends data, so you probably can get a previous fail based on the log timestamp.
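A small helper for pulling up the right log in dom0, as a sketch only. The path pattern is the one given above; the function names and the XEN_CONSOLE_DIR override are made up for illustration, and reading the log may need root.

```shell
# Build the console log path for a qube, using the pattern quoted above.
# XEN_CONSOLE_DIR is a hypothetical override, mainly useful for testing.
mirage_log_path() {
    printf '%s/guest-%s.log\n' "${XEN_CONSOLE_DIR:-/var/log/xen/console}" "$1"
}

# Print the last N lines (default 50) of that log.
mirage_log_tail() {
    tail -n "${2:-50}" "$(mirage_log_path "$1")"
}
```

For example, mirage_log_tail mirage-fw-2 100 would show the last 100 lines for a firewall qube named mirage-fw-2.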
Another question, maybe related: have you set any DNS fw rules on your sys-vpnX, and does your sys-net take some time before getting an IP (e.g. with a wireless connection)?
Oh that’s odd, it seems that the mirage unikernel doesn’t even boot correctly (a bit after that you should get something like INF [client_net] add client vif {domid=...;device_id=...} with IP ... when a client connects to the fw).
Can you try to autostart one without any client VM and check if it boots correctly (it should print a memory pressure status after the initialisation steps: INF [memory_pressure] Writing meminfo: free 20MiB / 27MiB (72.67 %))?
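A rough way to check for that line from dom0, as a sketch. The function name is hypothetical and the marker string is copied from the message quoted above. Since every start appends to the same log, a match can come from an earlier successful boot, so the timestamps still need a look.

```shell
# Return 0 if the last N lines (default 200) of a console log contain the
# memory_pressure status line that is printed after initialisation.
# $1 is the log file path, $2 the optional number of lines to inspect.
mirage_boot_ok() {
    tail -n "${2:-200}" "$1" | grep -qF '[memory_pressure] Writing meminfo'
}
```

For example: mirage_boot_ok /var/log/xen/console/guest-mirage-fw-2.log && echo booted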
Today sys-net was in autostart, then I started a browser vm to start all the chain (mirage-fw > sys-vpn). All mirage firewalls failed again (same message as before).
I set netvm of sys-vpn to empty, sys-vpn started correctly.
I removed the DNS script from sys-vpn (/rw/config/qubes-firewall-user-script) and tried restarting it with mirage-fw as netvm: error again.
I tried with new mirage-fw: cloned mirage-fw-backup (never opened, completely original state): error.
I restarted the computer, sys-net autostart, then manual start for the net chain: all ok, no errors.
My setup is, from a fresh 4.1.1 install and qubes-mirage-fw 0.8.4 binary from github (I also updated dom0 with qubes-dom0-update and it happens to work too):
sys-net ← mirage-fw ← AppVM
In autostart the directive:
Exec=qvm-run appVM gnome-terminal
or
Exec=qvm-start appVM
(both work correctly here)
It’s quite similar to your 3rd try sys-vpn without any special script (I just removed the VPN part, but since it’s the VM that didn’t start, it should also fail with me).
I might be failing to see a difference between my setup and yours; would you mind trying to start a normal appVM with a mirage netvm from your autostart script?
I’ll try to add your iptables script to my AppVM on my side.
It doesn’t seem to make a difference… The error happens even without the script.
I also use the binary from GitHub and the latest updates in dom0.
So without sys-vpn? AppVM > mirage-fw > sys-net?
I forgot to say this before: I started mirage-fw-2 without attached vpn, and the logs were the same as the mirage-fw started with attached sys-vpn (error). So it looks like it doesn’t matter if there is a vm attached to mirage-fw.
[...@dom0 ~]$ qvm-prefs mirage-fw-2
audiovm D None
autostart D False
backup_timestamp U
debug D False
default_dispvm - None
default_user D ...
dns D ....1 ....2
gateway D ...
gateway6 D
guivm D dom0
icon D servicevm-green
include_in_backups D True
installed_by_rpm D False
ip D ...
ip6 D
kernel - mirage-firewall
kernelopts -
keyboard_layout D ...
klass D StandaloneVM
label - green
mac D ...
management_dispvm D default-mgmt-dvm
maxmem - 0
memory - 32
name - mirage-fw-2
netvm - sys-net
provides_network - True
qid - 14
qrexec_timeout D 60
shutdown_timeout D 60
start_time D ...
stubdom_mem U
stubdom_xid D -1
template_for_dispvms D False
updateable D True
uuid - ...
vcpus - 1
virt_mode - pvh
visible_gateway D ...
visible_gateway6 D
visible_ip D ...
visible_ip6 D
visible_netmask D ...
xid
So I set exactly the same properties and it still works here.
Meanwhile, I found that a very similar problem was reported for an earlier version: issue #107 on the mirage/qubes-mirage-firewall GitHub repository, "Firewall may refuse to create upstream VM's VIF if downstream "provides network VM" does not present VIF in time?".
The failure has the same exception (unable to access a xenstore value), and the setup seems to be exactly the same: sys-net is not autostarted before mirage starts. I’m currently not sure how to investigate this problem or how to solve it, but just to rule out a timing issue, could you try updating the .desktop with:
Exec=sh -c "qvm-start sys-net && sleep 1 && qvm-start sys-vpn" (not sure if this syntax would work)
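As far as the Desktop Entry spec goes, an Exec line is not passed through a shell, so && and sleep only take effect if a shell is invoked explicitly or the chain is moved into a script. A minimal sketch of the script variant, assuming it runs in dom0; start_chain and the QVM_START override are hypothetical names, and QVM_START only exists so the function can be dry-run without touching any qube.

```shell
# Start qubes one after another with a pause between them.
# $1 is the delay in seconds, the remaining arguments are qube names.
# Like the && chain above, it stops at the first qube that fails to start.
start_chain() {
    delay="$1"
    shift
    first=1
    for vm in "$@"; do
        # No pause before the first qube, only between consecutive starts.
        [ "$first" = 1 ] || sleep "$delay"
        first=0
        # QVM_START is a hypothetical override; the default is the real qvm-start.
        "${QVM_START:-qvm-start}" "$vm" || return 1
    done
}
```

Calling start_chain 1 sys-net sys-vpn at the end of a script, and pointing the autostart entry at that script, should behave like the one-liner above. A larger delay might be worth trying if one second does not change anything.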