Libxenlight failed, unable to add vif devices

I’ve been using Qubes for a long time without issues. Recently my sys-vpn has been randomly failing to start, forcing me to restart the whole PC.

The error:

Start failed: internal error: libxenlight failed to create new domain 'sys-vpn', see /var/log/libvirt/libxl/libxl-driver.log for details

/var/log/libvirt/libxl/libxl-driver.log:

libxl: libxl_device.c:1146:device_backend_callback: Domain 13:unable to add device with path /local/domain/3/backend/vif/13/0
libxl: libxl_create.c:1938:domcreate_attach_devices: Domain 13:unable to add vif devices
libxl: libxl_device.c:1146:device_backend_callback: Domain 13:unable to remove device with path /local/domain/3/backend/vif/13/0
libxl: libxl_domain.c:1588:devices_destroy_cb: Domain 13:libxl__devices_destroy failed

There’s no apparent reason why this started happening. I didn’t attach any PCI devices to sys-vpn, and it happens completely at random: sometimes I boot without errors, sometimes I need to restart.

This is most likely an issue with the netvm your sys-vpn is connected to (I guess it’s sys-firewall). Is it running, or did it crash? Can you check the logs there? Is the xendriverdomain service running (you can check with the systemctl status xendriverdomain command in there)?
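For reference, a minimal sketch of running that check from dom0 without opening a terminal in the netvm. It assumes the netvm is a Linux qube such as sys-firewall (a mirage unikernel has no systemd to query), that qvm-prefs VM netvm prints the netvm's name (or None/empty when there is none), and that qvm-run --pass-io is available. The function name check_backend_of and the QVM_PREFS/QVM_RUN overrides are hypothetical, added only so the logic can be exercised without Qubes.

```shell
# Hypothetical helper: find the netvm of a qube, then ask that netvm whether
# the xendriverdomain service is active. Meant to be run from dom0.
# QVM_PREFS / QVM_RUN can be overridden (e.g. for testing); by default they
# are the real Qubes tools.
check_backend_of() {
  local vm="$1" netvm
  # qvm-prefs VM netvm prints the name of the qube providing the network
  netvm="$("${QVM_PREFS:-qvm-prefs}" "$vm" netvm)" || return 1
  if [ -z "$netvm" ] || [ "$netvm" = "None" ]; then
    echo "$vm has no netvm"
    return 1
  fi
  echo "netvm of $vm: $netvm"
  # --pass-io returns the command's output to dom0;
  # systemctl is-active prints "active" when the service is running
  "${QVM_RUN:-qvm-run}" --pass-io "$netvm" 'systemctl is-active xendriverdomain'
}
```

Usage from dom0: check_backend_of sys-vpn. If the service is up, the last line printed should be "active".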

Hello, thank you for the reply!

I’m using mirage firewall (latest). When the error happened, I tried starting the other VPN VMs, which are connected to different mirage qubes, and they failed too.

The qubes-firewall service is enabled in qvm-service.

If it happens again I will try a different firewall, maybe Debian or Fedora, but it’s strange that it happens even with different mirage qubes.

Ah, mirage firewall. Then maybe it’s this one: mirage vm crash on downstream vm start · Issue #155 · mirage/qubes-mirage-firewall · GitHub?

Hard to tell until I can reproduce it consistently. They said that issue was fixed in 0.8.3 and I’m running the latest 0.8.4, so it’s probably a different issue.

Thank you for helping me so far!

Issue 155 should have been fixed in the latest release, you’re right. Next time it occurs, it would help if you could save the mirage-fw logs; don’t hesitate to post them here or on GitHub.

To try to reproduce the problem, is your network configuration:
sys-net ← mirage-fw1 ← sys-vpn1 ← appVM1
sys-net ← mirage-fw2 ← sys-vpn2 ← appVM2
sys-net ← mirage-fw3 ← sys-vpn3 ← appVM3
and at some point, one of the sys-vpnX failed to start, presumably because of mirage-fwX?
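To confirm a chain like this from dom0 rather than from memory, here is a small sketch that follows the netvm property upward from a qube until it reaches one without a netvm. It assumes qvm-prefs VM netvm prints the netvm's name, and prints an empty string or None at the end of the chain; netvm_chain and the QVM_PREFS override are hypothetical names.

```shell
# Hypothetical helper: print the network chain of a qube, upstream first,
# e.g. "sys-net <- mirage-fw1 <- sys-vpn1 <- appVM1".
netvm_chain() {
  local vm="$1" chain="$1" next hops=0
  # Keep asking "who provides the network for $vm?" until there is no netvm
  while next="$("${QVM_PREFS:-qvm-prefs}" "$vm" netvm)" \
      && [ -n "$next" ] && [ "$next" != "None" ]; do
    chain="$next <- $chain"
    vm="$next"
    hops=$((hops + 1))
    # Guard against an accidental loop in the configuration
    [ "$hops" -lt 10 ] || break
  done
  echo "$chain"
}
```

Usage: netvm_chain appVM1, and likewise for appVM2 and appVM3.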

That’s exactly right. The error always happened when I set up autostart (a script in .config/autostart running qvm-start sys-vpn1 sys-vpn2; autostart is not enabled in the Qubes settings). When I removed the script and started the VMs manually, the error didn’t happen.
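For anyone trying to reproduce this, a sketch of the autostart entry described above. It assumes dom0’s desktop session honours XDG autostart entries in ~/.config/autostart, which is where the script was placed; the helper name write_autostart_entry and the entry name start-vpns are hypothetical.

```shell
# Hypothetical helper: write a minimal XDG autostart (.desktop) entry.
# $1 = target directory, $2 = entry name, $3 = value of the Exec= key
write_autostart_entry() {
  local dir="$1" name="$2" exec_line="$3"
  mkdir -p "$dir" || return 1
  cat > "$dir/$name.desktop" <<EOF
[Desktop Entry]
Type=Application
Name=$name
Exec=$exec_line
EOF
}
```

Usage in dom0: write_autostart_entry "$HOME/.config/autostart" start-vpns "qvm-start sys-vpn1 sys-vpn2"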

Thank you for looking at it!

Can you tell me how to save mirage logs? Thanks!

The mirage logs are in /var/log/xen/console/guest-mirage-vm.log, where mirage-vm is the name of one of your mirage firewalls. Every start appends to the file, so you can probably find a previous failure by its timestamp.
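Since every start appends to the same file, one quick way to look at only the most recent boot is to print from the last Solo5 console banner onward. This assumes each unikernel start begins with a "Solo5: Xen console:" line, as in the log posted later in this thread; last_boot is a hypothetical helper name.

```shell
# Hypothetical helper: print only the most recent boot from a mirage
# console log, i.e. everything from the last "Solo5: Xen console:" line on.
last_boot() {
  awk '
    /^Solo5: Xen console:/ { buf = "" }   # a new boot starts: reset buffer
    { buf = buf $0 "\n" }                 # accumulate lines of this boot
    END { printf "%s", buf }
  ' "$1"
}
```

Usage in dom0, with mirage-fw-2 as an example name: last_boot /var/log/xen/console/guest-mirage-fw-2.log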

Another question, maybe related: have you set any DNS firewall rules on your sys-vpnX, and does your sys-net take some time to get an IP (e.g. with a wireless connection)?

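In sys-vpn I added this to /rw/config/qubes-firewall-user-script:

# $virtualif and $vpndns1 are assumed to be set earlier in the script (not shown)
# Flush all rules in the OUTPUT chain of the filter table
iptables -F OUTPUT
# Drop forwarded traffic leaving or entering through the upstream interface eth0,
# so client traffic cannot bypass the VPN tunnel
iptables -I FORWARD -o eth0 -j DROP
iptables -I FORWARD -i eth0 -j DROP
# Clamp the MSS of forwarded TCP SYN packets to the path MTU
iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
# Flush the nat-table PR-QBS chain (used by Qubes for DNS redirection) and
# send DNS queries addressed to $virtualif to the VPN's DNS server instead
iptables -F PR-QBS -t nat
iptables -A PR-QBS -t nat -d $virtualif -p udp --dport 53 -j DNAT --to $vpndns1
iptables -A PR-QBS -t nat -d $virtualif -p tcp --dport 53 -j DNAT --to $vpndns1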

In qvm-firewall for mirage-fw, I have allowed DNS requests.

I found the logs from when I reported the issue (identifiers replaced with …)

Logfile Opened
Solo5: Xen console: port 0x2, ring @0x...
            |      ___|
  __|  _ \  |  _ \ __ \
\__ \ (   | | (   |  ) |
____/\___/ _|\___/____/
Solo5: Bindings version v0.7.5
Solo5: Memory map: 32 MB addressable:
Solo5:   reserved @ (0x0 - 0xfffff)
Solo5:       text @ (0x... - 0x...)
Solo5:     rodata @ (0x... - 0x...)
Solo5:       data @ (0x... - 0x...)
Solo5:       heap >= 0x... < stack < 0x...
INF [qubes.rexec] waiting for client...
INF [qubes.db] connecting to server...
INF [qubes.db] connected
INF [qubes.rexec] client connected, using protocol version 3
INF [unikernel] QubesDB and qrexec agents connected in 0.026 s
INF [dao] Got network configuration from QubesDB:
            NetVM IP on uplink network: ...
            Our IP on uplink network:   ...
            Our IP on client networks:  ...
            DNS primary resolver:       ...
            DNS secondary resolver:     ...
INF [net-xen frontend] connect 0
INF [net-xen frontend] create: id=0 domid=1
INF [net-xen frontend]  sg:true gso_tcpv4:true rx_copy:true rx_flip:false smart_poll:false
INF [net-xen frontend] MAC: ...
WRN [command] << Unknown command "QUBESRPC qubes.SetMonitorLayout dom0"
WRN [command] << Unknown command "QUBESRPC qubes.SetMonitorLayout dom0"
Fatal error: exception Xs_protocol.Error("EACCES")
Raised at Xs_protocol.response in file "duniverse/ocaml-xenstore/core/xs_protocol.ml", line 685, characters 13-28
Called from Xs_client_lwt.Client.rpc.(fun) in file "duniverse/ocaml-xenstore/client_lwt/xs_client_lwt.ml", line 318, characters 13-50
Called from Lwt.Sequential_composition.bind.create_result_promise_and_callback_if_deferred.callback in file "duniverse/lwt/src/core/lwt.ml", line 1849, characters 23-26
Re-raised at Lwt.Miscellaneous.poll in file "duniverse/lwt/src/core/lwt.ml", line 3077, characters 20-29
Called from Xen_os__Main.run.aux in file "duniverse/mirage-xen/lib/main.ml", line 37, characters 10-20
Called from Dune__exe__Main.run in file "main.ml" (inlined), line 3, characters 12-29
Called from Dune__exe__Main in file "main.ml", line 115, characters 5-10
Solo5: solo5_exit(2) called

Oh, that’s odd: it seems the mirage unikernel doesn’t even boot correctly. A bit after that point you should get something like INF [client_net] add client vif {domid=...;device_id=...} with IP ... when a client connects to the fw.

Can you try to autostart one without any client VM and check whether it boots correctly? After the initialisation steps it should print a memory pressure status such as INF [memory_pressure] Writing meminfo: free 20MiB / 27MiB (72.67 %).
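To check a given boot quickly, the markers mentioned in this thread can be searched for mechanically. A sketch, assuming one boot’s console output is piped in (for example the part of guest-mirage-fw-2.log after the last Solo5 banner) and that the quoted message prefixes stay the same across releases; mirage_boot_status and its result labels are hypothetical.

```shell
# Hypothetical classifier for one boot of mirage console output (stdin):
#   crashed          - a "Fatal error" line appeared (e.g. the EACCES exception)
#   client-connected - a client vif was added
#   booted           - the memory pressure status was printed
#   incomplete       - none of the above
mirage_boot_status() {
  awk '
    /Fatal error/                        { fatal = 1 }
    /INF \[memory_pressure\]/            { ready = 1 }
    /INF \[client_net\] add client vif/  { client = 1 }
    END {
      if (fatal) print "crashed"
      else if (client) print "client-connected"
      else if (ready) print "booted"
      else print "incomplete"
    }'
}
```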

Yes, I will try in a bit of time and will post the results. Thank you for helping with the issue so far!


Thanks for reporting it. I don’t use autostart yet but I’ll try to reproduce too.

Today sys-net was in autostart; then I started a browser VM to bring up the whole chain (mirage-fw > sys-vpn). All mirage firewalls failed again (same message as before).

I set netvm of sys-vpn to empty, sys-vpn started correctly.

I removed the DNS script from sys-vpn (/rw/config/qubes-firewall-user-script) and tried restarting it with mirage-fw as its netvm: error again.

I tried with a new mirage-fw, cloned from mirage-fw-backup (never started, completely original state): error.

I restarted the computer, sys-net autostart, then manual start for the net chain: all ok, no errors.

I tried to reproduce with no luck here :confused: or :slightly_smiling_face:

My setup is a fresh 4.1.1 install with the qubes-mirage-firewall 0.8.4 binary from GitHub (I also updated dom0 with qubes-dom0-update, and it still works):
sys-net ← mirage-fw ← AppVM
In autostart, the directive:
Exec=qvm-run appVM gnome-terminal
or
Exec=qvm-start appVM (both start correctly)

It’s quite similar to your third try (sys-vpn without any special script): I just removed the VPN part, but since that’s the VM that didn’t start, it should fail for me too.

I might be missing a difference between my setup and yours. Would you mind trying to start a normal AppVM with a mirage netvm from your autostart script?
I’ll try adding your iptables script to my AppVM on my side.

Hello, thank you for trying to reproduce!

It doesn’t seem to make a difference… The error happens even without script :frowning:

I also use the binary from GitHub and the latest dom0 updates.

So without sys-vpn? AppVM > mirage-fw > sys-net?

I forgot to say this before: I started mirage-fw-2 without attached vpn, and the logs were the same as the mirage-fw started with attached sys-vpn (error). So it looks like it doesn’t matter if there is a vm attached to mirage-fw.

Would you mind sharing the output of qvm-prefs mirage-fw-2?

[...@dom0 ~]$ qvm-prefs mirage-fw-2
audiovm               D  None
autostart             D  False
backup_timestamp      U
debug                 D  False
default_dispvm        -  None
default_user          D  ...
dns                   D  ....1 ....2
gateway               D  ...
gateway6              D  
guivm                 D  dom0
icon                  D  servicevm-green
include_in_backups    D  True
installed_by_rpm      D  False
ip                    D  ...
ip6                   D  
kernel                -  mirage-firewall
kernelopts            -  
keyboard_layout       D  ...
klass                 D  StandaloneVM
label                 -  green
mac                   D  ...
management_dispvm     D  default-mgmt-dvm
maxmem                -  0
memory                -  32
name                  -  mirage-fw-2
netvm                 -  sys-net
provides_network      -  True
qid                   -  14
qrexec_timeout        D  60
shutdown_timeout      D  60
start_time            D  ...
stubdom_mem           U
stubdom_xid           D  -1
template_for_dispvms  D  False
updateable            D  True
uuid                  -  ...
vcpus                 -  1
virt_mode             -  pvh
visible_gateway       D  ...
visible_gateway6      D  
visible_ip            D  ...
visible_ip6           D  
visible_netmask       D  ...
xid    

So I set exactly the same properties and it still work here :confused:
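To compare two qvm-prefs dumps (for example the one above and a reproducer’s) without noise from per-instance values, here is a sketch that drops properties expected to differ between machines or boots and sorts the rest. The list of ignored properties is an assumption, and prefs_for_diff is a hypothetical name.

```shell
# Hypothetical filter: read qvm-prefs output on stdin, drop properties that
# naturally differ between machines or boots, and sort the remaining lines.
prefs_for_diff() {
  awk '$1 !~ /^(name|qid|uuid|mac|ip|ip6|gateway|gateway6|dns|visible_ip|visible_ip6|visible_gateway|visible_gateway6|visible_netmask|start_time|xid|stubdom_xid|backup_timestamp)$/' | sort
}
```

Usage in bash, where mirage-fw-test is a hypothetical second firewall: diff <(qvm-prefs mirage-fw-2 | prefs_for_diff) <(qvm-prefs mirage-fw-test | prefs_for_diff)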

Meanwhile, I found that a very similar problem was reported for an earlier version: Firewall may refuse to create upstream VM's VIF if downstream "provides network VM" does not present VIF in time? · Issue #107 · mirage/qubes-mirage-firewall · GitHub.
The failure has the same exception (unable to access a xenstore value), and the setup seems to be exactly the same: sys-net is not autostarted before mirage starts. I’m currently not sure how to investigate this problem or how to solve it, but just to rule out a timing issue, could you try updating the .desktop file with:
Exec=sh -c "qvm-start sys-net && sleep 1 && qvm-start sys-vpn"
(Exec= lines are not run through a shell, so the && chain needs the sh -c wrapper.)
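If a fixed sleep 1 turns out to be too short, a slightly more robust variant waits until the upstream qube reports as running before starting the downstream ones. It also skips starting the upstream qube if it is already running (e.g. when sys-net is in autostart), since as far as I know qvm-start reports an error for a qube that is already running. A sketch, assuming qvm-check --quiet --running exits with status 0 once a qube is running; note that "running" only means the domain is up, not that its network backend is ready, so the extra settle delay is still a guess. start_after and the QVM_START/QVM_CHECK/MAX_TRIES/SETTLE_SECONDS overrides are hypothetical.

```shell
# Hypothetical helper: make sure an upstream qube is running, wait for it,
# pause a little longer, then start the downstream qubes.
# Usage: start_after UPSTREAM DOWNSTREAM...
start_after() {
  local upstream="$1" tries=0
  shift
  # Only start the upstream qube if it is not already running
  if ! "${QVM_CHECK:-qvm-check}" --quiet --running "$upstream"; then
    "${QVM_START:-qvm-start}" "$upstream" || return 1
  fi
  # Poll once per second until the upstream qube reports as running
  until "${QVM_CHECK:-qvm-check}" --quiet --running "$upstream"; do
    tries=$((tries + 1))
    if [ "$tries" -ge "${MAX_TRIES:-30}" ]; then
      echo "$upstream did not reach the running state" >&2
      return 1
    fi
    sleep 1
  done
  # Extra delay for the network backend to settle (value is a guess)
  sleep "${SETTLE_SECONDS:-1}"
  "${QVM_START:-qvm-start}" "$@"
}
```

To use it from autostart, put the function plus a line such as start_after sys-net sys-vpn1 sys-vpn2 into an executable script (a hypothetical ~/bin/start-chain.sh starting with #!/bin/sh) and point the Exec= line of the .desktop file at that script.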