When I start an AppVM with memory 400/4000, it starts with 400 MB RAM. However, it takes very long time to allocate more RAM. In the meantime, the machine is pretty much unusable, b/c the minimum amount does not let my apps run on it. After 10-15 minutes, it gets RAM.
It began to behave like this quite recently, because my computing habits have not changed.
I chose not to re-install mirage firewall after a fresh Qubes OS install in light of my problem. I recommend the same.
What worries me is that in light of the apparent rarity of the cases of the problem of memory management, the problem would be a hack via a vulnerability in mirage firewall which corrupts a few things in Dom0 and lead to the symptoms you and I have seen. I have no proof of that and I could be just paranoid. But then again, I didn’t switch to Qubes OS (only) for the fun of it.
I’m interested to dig into that issue, esp. if there is a vulnerability opened. I currently have no clues on what can be the root cause here, I also have a 32GB laptop and I don’t observe such memory limitations
Would you mind to (privately or not, mail, DM, etc.) share some details on your setup and how I can try to reproduce it?
As a side note, when I start an AppVM it also starts with 400MB (AppVM started with qvm-run vmname gnome-terminal) and raises at around 4000MB. Not sure exactly when it memory is added, but I can see something related to the ballooning procedure in the vm logs:
tail -f /var/log/xen/console/guest-vmname.log
....
xen:balloon: Initialising balloon driver
acpi LNXCPU:02: Still not present
acpi LNXCPU:03: Still not present
....
Maybe you can check the logs and see if something explain why it’s stuck there? The terminal window appears as soon as the message
Fedora Linux 36 (Thirty Six)
Kernel 5.15.103-1.qubes.fc32.x86_64 on an x86_64 (hvc0)
vnmane login:
is printed in the logs.
EDIT: The template is outdated but I have the same results with a Fedora-37 template.
I had formatted my laptop and re-installed Qubes OS. I don’t have the logs anymore.
My setup is mostly Debian minimal Qubes. Browse Qube (AppVM) that was stuck at 400 MB RAM is Debian Minimal. The Disposable Whonix disp9125 was also stuck at 400 MB RAM. This is a standard Whonix DVM.
I have the same problem on my machine with 16 GB and mirage-firewall installed. Shutting down a qube reallocates memory to the one with 400 MB, so the workaround I’ve been using is to start a disposable and shut it down whenever this happens.
I was thinking the same thing, that the dom0 portion of qubes memory management was getting stuck on a VM that wasn’t responding the way it expected, e.g. a VM tagged/configured as a qmm client but that cannot respond…
I should note that maxmem 32 was present at the time of qube creation as a parameter to qvm-create. But apparently it did not get through/was overwritten…
i think that was an early conclusion by me, setting maxmem did not help.
However, it looks like killing the mirage-firewall and restarting helps (qvm-kill/qvm-start)…
After restarting mirage fw, new VMs get proper memory, whether balancing is disabled or not (checked both cases)
Well, there are more info. I have rebooted the laptop. The issue did not reproduce… until I started a huge VM (3000/30000). VMs started after the huge VM got stuck at 400M.
Then I thought if this would reproduce when sys-firewall is used instead. Repeated the above with sys-firewall, but the new VMs do not get stuck at 400M.
Regarding the logs, there are logs like the following:
journalctl:
May 24 10:01:51 dom0 qubesd[1597]: vm.mirage-firewall: Activating the mirage-firewall VM
I can confirm the issue is reproducible on my laptop (huge VM 3G/30G, stabilized at around 20G, then another AppVM is stuck with low memory). It suggests that this is not an open vulnerability into mirage-fw that permits to escape the fw, but still an issue with mirage-fw in respect with Qubes memory management (and it definitely should be fixed at some point) when there is a high memory pressure on the system.
I’ll try to reproduce this issue with the unikernel onboarded inside a template VM (kernel is pvgrub2-pvh, the unikernel boots via multiboot), but this probably won’t solve the issue.
And last, I’ll have to understand what is expected from Qubes to not disturb the memory management process like that, maybe a Qubes dev team member have an idea about that?
As expected I reproduced the issue with a multiboot unikernel. As I currently understand the issue:
the system is under memory pressure (some AppVM won’t be able to get their highmem value)
mirage-fw is excluded from the memory balancing process (but still seems to be implied at some point, didn’t checked without it, but @rrn has done that)
starting a new VM is slow because that VM have to use swap due to being at its lowmem value
shutting down an AppVM release some memory and permits to increase the memory for other AppVMs
journalctl -f -u qubes-qmemman doesn’t gives me many informations, I got this morning
Xen free = 17325363471 too small for satisfy assignments! assigned_but_unused=17471681968, domdict={'0': {'memory_current': 4278190080, 'memo...
and later
dom '22' still hold more memory than have assigned (18972307456 > 18531172964)