Memory allocation problem (qube stays at its minimum allocation for minutes)

Hi,

When I start an AppVM with memory set to 400/4000, it starts with 400 MB of RAM. However, it takes a very long time to allocate more. In the meantime, the VM is pretty much unusable, because the minimum amount does not let my apps run. After 10-15 minutes, it finally gets the RAM.

It began to behave like this quite recently, even though my computing habits have not changed.

Any ideas on what to do to diagnose/fix this?

I had the same problem of being stuck at 400 MB RAM. I described it here.

No one responded so I did the only thing I know to do: re-install Qubes OS.

Very interesting that that thread mentions mirage-firewall, which I have installed very recently. How could that affect the memory management of Xen?

In light of my problem, I chose not to re-install mirage-firewall after the fresh Qubes OS install. I recommend doing the same.

What worries me is that, given how apparently rare this memory-management problem is, it could be a hack via a vulnerability in mirage-firewall that corrupts a few things in dom0 and leads to the symptoms you and I have seen. I have no proof of that, and I could just be paranoid. But then again, I didn’t switch to Qubes OS (only) for the fun of it.

I’m interested in digging into that issue, especially if there is an open vulnerability. I currently have no clue what the root cause could be here; I also have a 32 GB laptop and I don’t observe such memory limitations :frowning:

Would you mind sharing (privately or not: mail, DM, etc.) some details of your setup and how I can try to reproduce it?

As a side note, when I start an AppVM it also starts with 400 MB (the AppVM is started with qvm-run vmname gnome-terminal) and rises to around 4000 MB. I am not sure exactly when the memory is added, but I can see something related to the ballooning procedure in the VM logs:

tail -f /var/log/xen/console/guest-vmname.log
....
xen:balloon: Initialising balloon driver
acpi LNXCPU:02: Still not present
acpi LNXCPU:03: Still not present
....

Maybe you can check the logs and see if something explains why it’s stuck there? The terminal window appears as soon as the message

Fedora Linux 36 (Thirty Six)
Kernel 5.15.103-1.qubes.fc32.x86_64 on an x86_64 (hvc0)
vmname login:

is printed in the logs.

EDIT: The template is outdated but I have the same results with a Fedora-37 template.

I had formatted my laptop and re-installed Qubes OS. I don’t have the logs anymore.

My setup consists mostly of Debian minimal qubes. The Browse qube (AppVM) that was stuck at 400 MB RAM is based on Debian minimal. The disposable Whonix qube disp9125 was also stuck at 400 MB RAM; it is a standard Whonix DVM.

I have the same problem on my machine with 16 GB and mirage-firewall installed. Shutting down a qube reallocates memory to the one stuck at 400 MB, so the workaround I’ve been using is to start a disposable and shut it down whenever this happens (see the sketch below).
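
For reference, a minimal dom0 sketch of that workaround, assuming you have some spare qube to cycle (the name spare-qube is hypothetical; a disposable works the same way):

qvm-start spare-qube
qvm-shutdown --wait spare-qube
# the shutdown frees memory that qmemman can then hand to the qube stuck at 400 MB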


Just to rule out one idea: is your mirage-fw included in the ballooning process? You can check that in the qubes-manager settings. (It shouldn’t be.)
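
If it helps, my understanding is that on current Qubes the GUI checkbox maps to the maxmem property, so a quick CLI check would be:

qvm-prefs mirage-firewall maxmem
# 0 means the qube should be excluded from qmemman's balancing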

Great finding. I can confirm that a qube shutdown helps as a workaround!

I was thinking the same thing, that the dom0 portion of Qubes memory management (qmemman) was getting stuck on a VM that wasn’t responding the way it expected, e.g. a VM tagged/configured as a qmemman client but unable to respond…


I have noticed that Qube Settings reports 32/32 MB whereas qvm-prefs reports maxmem as 0.
After setting it to 32, the issue no longer happens.

qvm-prefs -s mirage-firewall maxmem 32

Qube Settings still displays memory balancing as enabled for mirage-fw.
I suspect some of the info on the settings page is stale/invalid.

I should note that maxmem 32 was passed to qvm-create as a parameter at the time of qube creation. But apparently it did not get through / was overwritten…
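
For context, the kind of creation command I mean follows the mirage-firewall install notes, roughly along these lines (property values here are illustrative, not necessarily exactly what I ran):

qvm-create \
  --class StandaloneVM \
  --label green \
  --property kernel=mirage-firewall \
  --property memory=32 \
  --property maxmem=32 \
  --property vcpus=1 \
  --property virt_mode=pvh \
  mirage-firewall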

That’s great!
You can probably also uncheck that setting for mirage-fw in the qubes-manager GUI, just in case.

I’ll try to figure out how this can be done with the CLI and update the AppVM creation procedure.
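
If I understand the current Qubes behaviour correctly, unchecking “Include in memory balancing” in the GUI corresponds to setting maxmem to 0 on the CLI, i.e. something like:

qvm-prefs mirage-firewall maxmem 0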

I think that was a premature conclusion on my part; setting maxmem did not help.

However, it looks like killing mirage-firewall and restarting it helps (qvm-kill/qvm-start)…
After restarting mirage-fw, new VMs get proper memory, whether balancing is disabled or not (I checked both cases).
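
For the record, the exact dom0 commands I mean are just:

qvm-kill mirage-firewall
qvm-start mirage-firewall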

Would you mind sharing a couple of the last lines from the mirage-fw logs?

Well, there is more info. I rebooted the laptop. The issue did not reproduce… until I started a huge VM (3000/30000). VMs started after the huge VM got stuck at 400 MB.

Then I wondered whether this would reproduce when sys-firewall is used instead. I repeated the above with sys-firewall, but the new VMs do not get stuck at 400 MB.
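
In case it helps reproduce, the trigger can be set up from dom0 roughly like this (qube and template names are hypothetical):

# a throwaway AppVM with a large memory range, attached to mirage-firewall
qvm-create --class AppVM --template debian-11-minimal --label red huge-test
qvm-prefs huge-test memory 3000
qvm-prefs huge-test maxmem 30000
qvm-prefs huge-test netvm mirage-firewall
qvm-start huge-test
# then start another AppVM and check whether it stays at its minimum allocation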

Regarding the logs, there are entries like the following:

journalctl:

May 24 10:01:51 dom0 qubesd[1597]: vm.mirage-firewall: Activating the mirage-firewall VM

xen/console:

[2023-05-24 10:01:51] 2023-05-24 07:01:51 -00:00: INF [net-xen frontend] connect 0
[2023-05-24 10:01:51] 2023-05-24 07:01:51 -00:00: INF [net-xen frontend] create: id=0 domid=2
[2023-05-24 10:01:51] 2023-05-24 07:01:51 -00:00: INF [net-xen frontend]  sg:true gso_tcpv4:true rx_copy:true rx_flip:false smart_poll:false


[2023-05-24 10:01:51] 2023-05-24 07:01:51 -00:00: WRN [command] << Unknown command "QUBESRPC qubes.SetMonitorLayout dom0"

I can confirm the issue is reproducible on my laptop (huge VM 3G/30G, stabilized at around 20G, then another AppVM is stuck with low memory). This suggests it is not an open vulnerability in mirage-fw that permits escaping the firewall, but rather an issue with how mirage-fw interacts with Qubes memory management when there is high memory pressure on the system (and it definitely should be fixed at some point).

I’ll try to reproduce this issue with the unikernel onboarded inside a template VM (kernel is pvgrub2-pvh, the unikernel boots via multiboot), but this probably won’t solve the issue.

And lastly, I’ll have to understand what Qubes expects from a VM so that it does not disturb the memory management process like this. Maybe a Qubes dev team member has an idea about that?


As expected, I reproduced the issue with a multiboot unikernel. As I currently understand the issue:

  • the system is under memory pressure (some AppVMs won’t be able to reach their maxmem value)
  • mirage-fw is excluded from the memory balancing process (but still seems to be involved at some point; I didn’t check without it, but @rrn has done that)
  • starting a new VM is slow because that VM has to use swap due to being stuck at its minimum memory value
  • shutting down an AppVM releases some memory and allows the other AppVMs’ memory to be increased (see the monitoring sketch below)
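
While reproducing, the actual per-domain allocations can be followed from dom0, for example with:

watch -n 2 xl list
# the Mem column shows how much memory each domain currently holds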

journalctl -f -u qubes-qmemman doesn’t give me much information. This morning I got

Xen free = 17325363471 too small for satisfy assignments! assigned_but_unused=17471681968, domdict={'0': {'memory_current': 4278190080, 'memo...

and later

dom '22' still hold more memory than have assigned (18972307456 > 18531172964)

but that isn’t always the case :frowning:

And maybe some good news now!

According to the comment at qubes-core-admin/__init__.py at 8e0de909c1defa5b979aac8f28449cd07cad1720 · QubesOS/qubes-core-admin · GitHub, it seems that if we never report free memory, we are left out of the ballooning process (maybe the check/uncheck property in qubes-manager has to be honoured by the AppVM kernel itself?).

So I tried removing the memory reporting to Xen so that the unikernel is not counted as a donor. I no longer see the issue.
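
If you want to check on your side which domains qmemman actually considers, my understanding is that it watches the memory/meminfo key in xenstore, so from dom0 something like this should tell you whether a given domain is reporting:

xenstore-read /local/domain/$(xl domid mirage-firewall)/memory/meminfo
# an error like "No such file or directory" means the domain is not reporting
# free memory and should therefore be ignored by the balancing code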

If anyone wants to try it out and confirm it works before I open a PR, you can run:

# install opam >= 2.1
bash -c "sh <(curl -fsSL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)"
# install mirage
opam init
opam install mirage
eval $(opam env)
# clone and compile mirage-fw
git clone https://github.com/palainp/qubes-mirage-firewall
cd qubes-mirage-firewall
git checkout test-no-mem-report
mirage configure -t xen && make depend && dune build

then copy dist/qubes-firewall.xen to dom0.
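
For completeness, one way to get the binary into dom0 (assuming the firewall was set up with a custom kernel named mirage-firewall, and “dev” is the hypothetical qube where you built it):

qvm-run -p dev 'cat qubes-mirage-firewall/dist/qubes-firewall.xen' \
    > /var/lib/qubes/vm-kernels/mirage-firewall/vmlinuz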
