Mirage firewall breaks connectivity daily

Mirage is a unikernel that can replace sys-firewall in QubesOS. It is not shipped natively, so support is not assumed here. This will likely need to be followed up at the GitHub repository mirage/qubes-mirage-firewall (A Mirage firewall VM for QubesOS).

But for the sake of discussion: is anyone using Mirage having, or has anyone ever had, problems with connectivity breaking? Often I’ll leave my machine, come back, and all networking (except sys-net) is dead. I must restart sys-mirage-firewall for networking to resume. As soon as it restarts, everything picks up where it left off.

Obviously not ideal. Real-world behaviour should never see any interruption in connectivity.
I found the problem a lot worse and more frequent with the default 32 MB of RAM. After I upped the CPUs and RAM a bit, it happens less frequently. I’ve just upped it again to 128 MB and 4 cores and will monitor before making any bug report.

AppVM logs don’t reveal much either unfortunately.

Hi, thanks for the report. I don’t observe such an issue on my laptop, but unfortunately I don’t have 4.2 right now :(

When you say AppVM logs don’t provide information, you mean the mirage-fw logs, right?
FWIW there’s a pending PR that fixes an issue with the uplink which might fit your symptoms. If you’re OK with that, would you mind compiling locally and trying out a fresh build?

Sure am. I’ll have another look if/when the problem reoccurs. I just recall seeing traffic type logging. No particular errors.

I compiled with Docker per the instructions on GitHub, and my system is a very recent install of 4.2, so there shouldn’t be anything funky going on.
Would the pending uplink PR only be relevant if sys-net (my only upstream from sys-mirage-firewall) loses connectivity? This issue seems unrelated to any sys-net issues. I can always open a terminal in sys-net and ping successfully when sys-mirage-firewall is in this broken state.
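To pin down when the firewall stops forwarding, a small watcher could run in an AppVM that sits behind sys-mirage-firewall and log every change in reachability. The following is only a minimal sketch: the script name, the target address passed on the command line and the 30 second interval are arbitrary assumptions, and it relies on the Linux `ping` options `-c` and `-W`.

```python
import subprocess
import sys
import time
from datetime import datetime


def ping_once(host, timeout_s=2):
    """Return True if one ICMP echo request to host is answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch(probe, interval_s=30.0, max_checks=None, sleep=time.sleep):
    """Call probe() repeatedly and record only the UP/DOWN transitions."""
    events = []
    previous = None
    checks = 0
    while max_checks is None or checks < max_checks:
        up = probe()
        if up != previous:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {'UP' if up else 'DOWN'}", flush=True)
            events.append((stamp, up))
            previous = up
        checks += 1
        sleep(interval_s)
    return events


# Only starts the endless loop when a target host is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]
    watch(lambda: ping_once(target))
```

Started as, for example, `python3 watch.py <some reachable address>`, the timestamps of the DOWN lines can then be compared with the mirage-fw logs and with a ping from sys-net at the same moment.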

The current mirage-fw also doesn’t support its netvm being changed or restarted. The connectivity loss can also occur if sys-net changes the vif dedicated to its mirage-fw client (unsure why this could happen).

But thinking twice, if raising the memory solves or delays the issue, it’s probably something else, related to an old memory fragmentation issue being back :(

Ah, this sounds plausible. Is there any superficial testing I could do to help diagnose this as a potential issue?

Should I expect to see mem use gradually increase under dom0 xentop for example?

No, it won’t. Memory is reserved once at startup, and the OCaml runtime asks for memory from time to time, but that won’t be visible from Xen.

If the pressure on memory (e.g. from fragmentation) is getting high, the unikernel starts to call the GC more often, and if that is still not enough, it starts to drop packets (but tracing of this was removed some time ago). You might observe high CPU usage for a while due to the GC calls. And finally, at some point, it should kill itself with an “out of memory” log message.
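The escalation described above can be pictured with a toy model. This is not the mirage-fw code: the thresholds, names and the `allocation_failed` flag are all hypothetical, chosen only to show the order of the stages (forward normally, collect more often, drop packets, then exit).

```python
from enum import Enum


class Reaction(Enum):
    FORWARD = "forward the packet"
    GC_THEN_FORWARD = "run the GC, then forward the packet"
    DROP = "run the GC, then drop the packet"
    OUT_OF_MEMORY = "log 'out of memory' and exit"


# Hypothetical thresholds on the fraction of free heap, for illustration only.
GC_BELOW = 0.5
DROP_BELOW = 0.1


def react(free_before_gc, free_after_gc, allocation_failed=False):
    """Pick a reaction for one incoming packet in this toy model."""
    if allocation_failed:
        # Last stage: memory could not be obtained at all.
        return Reaction.OUT_OF_MEMORY
    if free_before_gc >= GC_BELOW:
        # No pressure: nothing special happens.
        return Reaction.FORWARD
    if free_after_gc >= DROP_BELOW:
        # Pressure, but a collection freed enough memory (costs CPU time).
        return Reaction.GC_THEN_FORWARD
    # Collection did not help enough: shed load by dropping the packet.
    return Reaction.DROP
```

In this picture, fragmentation shows up as `free_after_gc` staying low even after a collection, which would match connectivity stalling while CPU usage is high.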

The last release also no longer reports memory information to Xen, so that it stops being involved in the memory ballooning process. Maybe I failed somewhere there and it’s mandatory with 4.2?

I don’t know if it is mandatory, but since you mention memory ballooning and 4.2, have you seen this thread, @palainp?

Another thing I’ve noticed is that sys-firewall will (as expected) saturate my 1 Gbps link, but Mirage tops out at only ~530 Mbps.

Yes, TCP Segmentation Offload isn’t available with Mirage so far. That’s something to code in the future.

You should be able to check that by running the following in sys-net or sys-firewall (you may need to adapt the interface name): sudo ethtool -K eth0 tso off
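Before and after toggling TSO with the ethtool command above, it can help to confirm the current state. Here is a minimal sketch that reads the `tcp-segmentation-offload` line from `ethtool -k` (lowercase `-k` lists features); the helper names are made up for this example, and the interface name is taken from the command line since it may not be eth0.

```python
import subprocess
import sys


def parse_tso_state(ethtool_output):
    """Return 'on' or 'off' from `ethtool -k` output, or None if not listed."""
    for line in ethtool_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and name == "tcp-segmentation-offload":
            fields = value.split()
            # The value may be followed by a marker such as "[fixed]".
            return fields[0] if fields else None
    return None


def tso_state(interface):
    """Run `ethtool -k <interface>` and report the TSO state."""
    output = subprocess.run(
        ["ethtool", "-k", interface],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_tso_state(output)


# Only queries ethtool when an interface name is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    print(f"{sys.argv[1]}: TSO is {tso_state(sys.argv[1])}")
```

If sys-firewall drops to a throughput similar to Mirage once TSO reports off, that would support the explanation that the missing offload accounts for the gap.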

Thanks! I’ll try to upgrade my laptop soon™ to investigate further.