Internal error: libxenlight

Thanks. Can you explain what makes this change necessary?

I didn’t try mirage-firewall so I can’t comment on this.

It’s not needed, as what you call a template is probably a regular AppVM.

If you copy mirage-fw into dom0 and use it as the starting kernel for your AppVM, you are using an AppVM, and you don’t have to change to pvgrub2-pvh.

Regarding your issue, can you provide the last lines of your mirage-fw log? I’m a bit surprised that it talks about the GUI, as that was removed some months ago.

Yeah, I’m not sure why it would even need to access a GUI API at all; it’s not creating a window or anything. And yes, it was originally uploaded into dom0 and is acting as the kernel for the StandaloneVM.

Ok so that’s definitely an AppVM, don’t bother with pvgrub2-pvh for the time being :slight_smile:
Would you mind providing a copy of the log? Run the following in dom0, and only if you understand and agree with the commands (you’ll probably have to adapt them to your system):

qvm-kill mirage-fw && \
rm /var/log/xen/console/guest-mirage-fw.log && \
qvm-start mirage-fw && \
qvm-copy-to-vm anyvm /var/log/xen/console/guest-mirage-fw.log

Could I not just view the log file at /var/log/xen/console/guest-mirage-fw.log?

Of course yes, you can :slight_smile:

This was just to make it easier to capture a full mirage-fw start and run. Would you mind copying and pasting the log?

There are a few files here that seem relevant:

Fatal error: exception Xs_protocol.Error("EACCES")
Raised at Xs_protocol.response in file "duniverse/ocaml-xenstore/core/xs_protocol.ml", line 685, characters 13-28
Called from Xs_client_lwt.Client.rpc.(fun) in file "duniverse/ocaml-xenstore/client_lwt/xs_client_lwt.ml", line 318, characters 13-50
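
For anyone following along, here is a small sketch for pulling just the failure out of a long console log. The function name show_fatal is made up for illustration, and it assumes a grep that supports -A (GNU grep does). It only prints the lines already in the log, nothing more.

```shell
#!/bin/sh
# show_fatal LOGFILE [CONTEXT_LINES]
# Print every "Fatal error" line in LOGFILE, with its line number,
# followed by CONTEXT_LINES lines of trailing context (default: 5).
# show_fatal is a hypothetical helper name, not part of Qubes or Mirage.
show_fatal() {
    log="$1"
    context="${2:-5}"
    grep -n -A "$context" 'Fatal error' "$log"
}
```

In dom0 this could be called as show_fatal /var/log/xen/console/guest-mirage-fw.log, adapting the path to the name of your firewall qube.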

That’s interesting and may be related to an old issue I never managed to reproduce. Can you describe your VM setup? So far it seems to occur when sys-net is not started before mirage-fw (so Qubes will start it, but a race condition might be involved, with mirage-fw wanting to write a bit too early).

As I’m unable to reproduce the bug locally, I hope you’ll be willing to try a unikernel with more debug traces around the point where it seems to fail.

I can share a precompiled binary with you, but you would probably prefer to compile it yourself. If you have never done that before, here are the instructions (all of them can be done in a dispVM, but you need to add some disk space, at least 4GB, as described in GitHub - mirage/qubes-mirage-firewall: A Mirage firewall VM for QubesOS):

You should install opam, version 2.1 at least (opam - Install), then you can run:

opam init && opam switch create 4.14.1 && \
opam switch 4.14.1 && eval $(opam env) && \
opam install mirage -y && \
opam pin git+https://github.com/palainp/mirage-net-xen.git#wait_backend -y

The commands above should install the dependencies, including an updated version of the xenstore code that tries to wait for sys-net to be ready before connecting to it.
Then you can clone qubes-mirage-firewall and compile it:

git clone https://github.com/mirage/qubes-mirage-firewall.git
cd qubes-mirage-firewall && mirage configure -t xen && make depend && dune build

This should produce dist/qubes-firewall.xen that you can upload as usual to dom0.

What I don’t understand is that sys-net is already booted, and yet this error is still occurring. There is no indication that sys-net is having a race condition here, or that the error is in any way connected to sys-net. It could be a bug in the xenstore code, and that seems the most likely cause. Was there a recent update to mirage-firewall that would have created this issue? The only change made recently was a move to fedora-37, which could be related.

What could have changed the ocaml_xenstore code? Could it have been a dom0 update? I just want to understand what would have caused the change in the first place.

The race condition could be in the mirage-xen code: when the mirage-fw uplink comes up, mirage-fw tries to write its configuration, but the backend part (the sys-net side) may not be ready. The suggested patch to mirage-net-xen is to wait a bit for the backend. But as I don’t have a machine where the bug reproduces, I’m a bit stuck here.
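
To make the "wait a bit for the backend" idea concrete, here is a rough dom0-side sketch of the same pattern: poll a readiness check a bounded number of times before going further. This is not the mirage-net-xen patch itself (that one is OCaml, inside the unikernel). The helper name wait_until is made up, and treating "sys-net is running" as "the backend is ready" is only an approximation that I am assuming for the sake of the test.

```shell
#!/bin/sh
# wait_until TRIES DELAY CMD [ARGS...]
# Run CMD up to TRIES times, sleeping DELAY seconds between attempts.
# Returns 0 as soon as CMD succeeds, 1 if it never succeeds.
# wait_until is a hypothetical helper, not a Qubes or Mirage tool.
wait_until() {
    tries="$1"
    delay="$2"
    shift 2
    while [ "$tries" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        tries=$((tries - 1))
        # Only sleep if another attempt is coming.
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Example for dom0, assuming the qubes are named sys-net and mirage-fw:
#   qvm-start --skip-if-running sys-net
#   wait_until 30 1 qvm-check --running sys-net && qvm-start mirage-fw
```

If the EACCES error still shows up when mirage-fw is started this way, that would suggest the start order of the two qubes is not the whole story.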

I saw you opened an issue on GitHub (Mirage-Firewall Booting Error · Issue #177 · mirage/qubes-mirage-firewall · GitHub), so we should move the conversation there and report any solution here later.