Qubes gets very slow after template updates are installed (on btrfs)

Hi,

I have been using Qubes OS since 4.0 on different machines, and it mostly works well enough for me.
A few months ago, I installed it on a “new” computer that is quite powerful compared to what I used before:

  • 64GB RAM
  • 2TB M.2 PCIe SSD (Samsung 980 PRO)
  • Intel Xeon E5-1650 @ 3.6GHz

I installed Qubes 4.2 with btrfs as the filesystem.

It looks like Qubes works fine until I make changes to any template and restart some of the qubes based on it. Installing the Qubes updates can almost freeze my workstation for hours. If I leave it alone, it mostly becomes fast again once it is done with $something I cannot identify. Running top or iotop in dom0 doesn’t show any activity, but the whole system is unresponsive: sometimes only the windows of some qubes, sometimes also a terminal window in dom0.

I can also observe quite often that shutting down the whole computer gets stuck for ages (it has already taken over an hour) on the systemd task: cleaning up storage for stopped qubes

Sometimes it also happens while booting, so my computer is only ready after an hour.

I checked the NVMe with a live Ubuntu stick, and it seems to be fine, so I was thinking that the problem might be in how Qubes uses btrfs reflinks?
I reinstalled Qubes OS at work on a less powerful machine, also using btrfs, but there I cannot reproduce the behaviour.

As a result, I mostly install template updates only when I am away from home for a few hours. I would be glad to get any advice on where to look to improve this situation. Otherwise I’ll have to make a full backup at some point and reinstall with LVM to see if it works better, but finding the source of the problem would be better :slight_smile:

Thank you for reading, I am happy with any ideas that help to find the cause.

Cheers,

Patate

Can you see the CPU and RAM load (in dom0 and in the VM) at the moments when everything freezes? Via top or otherwise. This will help you understand whether the cause is RAM, CPU load, or something else, for example I/O. Also make sure you are not using btrfs compression. I don’t know if this feature is enabled by default, like CoW, but during a VM update it can heavily load the CPU while trying to compress temporary files and new files. Also, maybe you enabled some script or package for data deduplication?
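
One way to check the compression question is to look at the mount options of the btrfs filesystem. The following is only a minimal sketch: it assumes that compression, if enabled, was set through a `compress` or `compress-force` mount option, and it does not detect compression enabled per file or per directory through file properties. The function name `btrfs_compression` is made up for this example.

```python
from pathlib import Path


def btrfs_compression(mounts_text):
    """Return {mount_point: option} for btrfs mounts that carry a compress option.

    mounts_text is expected to have the layout of /proc/mounts:
    device, mount point, filesystem type, comma-separated options, ...
    """
    found = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "btrfs":
            continue
        for opt in fields[3].split(","):
            name = opt.split("=", 1)[0]
            if name in ("compress", "compress-force"):
                found[fields[1]] = opt
    return found


if __name__ == "__main__":
    mounts = Path("/proc/mounts")
    if mounts.exists():
        result = btrfs_compression(mounts.read_text())
        print(result or "no btrfs mount with a compress option")
```

Run in dom0, an empty result would suggest that mount-level compression is not in play, which would make the CPU-bound compression theory less likely.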

The CPU is mostly doing nothing while the system is hanging.
I increased the RAM of dom0 to 8GB to check for any changes, but right now I am experiencing the same behaviour.

This time, I didn’t even install updates; I only shut down some of the App-VMs.
I wanted to test with only a few xfce4-terminal windows in my “untrusted” App-VM.

  • The dom0 Task Manager shows 4% CPU usage and 15% RAM usage in dom0.
  • Most of the CPU usage is xfce4-taskmanager itself.
  • top in dom0 shows a load of around 14.
  • iotop in dom0 shows almost no I/O.
  • Spawning new tabs in the terminal emulators works, but starting new commands in them hangs. After some minutes, the hanging commands are executed, and I can see a short increase in CPU usage.
  • At some points the GUI becomes unresponsive for some windows, including some dom0 windows.
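
A load of around 14 with a nearly idle CPU is worth a closer look: on Linux the load average counts tasks in uninterruptible sleep (state `D`) as well as runnable ones. One possible reading, which is an assumption and not something confirmed here, is that tasks are blocked waiting on storage. A minimal sketch to list such tasks from `/proc` could look like this; the function names are made up for the example, and it only sees the processes of the system it runs on (dom0 or a single qube), not individual threads.

```python
import os


def parse_stat(stat_line):
    """Split a /proc/<pid>/stat line into (pid, command name, state letter).

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so the last ')' is used as the delimiter.
    """
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left])
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Return [(pid, comm)] for processes currently in state D."""
    tasks = []
    if not os.path.isdir(proc_root):
        return tasks
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process vanished or line was not parseable
        if state == "D":
            tasks.append((pid, comm))
    return tasks


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Running it a few times during a freeze and noting which command names keep showing up might narrow down what the system is waiting for.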

I didn’t customize anything for the btrfs filesystem. I installed with the default options of the Qubes 4.2 installer, choosing btrfs (on top of LUKS).

Anything interesting in the dom0 kernel logs when this is happening? E.g. nvme timeouts
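
To make that search less tedious, a saved copy of the kernel log (for example from `journalctl -k` or `dmesg`, redirected to a file) could be filtered with a small script. This is only a sketch: the patterns below are my guesses at what storage trouble tends to look like in kernel messages, not an exhaustive or authoritative list, and the names `PATTERNS` and `suspicious_lines` are made up for the example.

```python
import re
import sys

# Assumed patterns, chosen for illustration; extend as needed.
PATTERNS = [
    re.compile(r"nvme.*timeout", re.IGNORECASE),
    re.compile(r"blocked for more than \d+ seconds", re.IGNORECASE),
    re.compile(r"I/O error", re.IGNORECASE),
]


def suspicious_lines(log_text):
    """Return the log lines that match at least one of PATTERNS."""
    hits = []
    for line in log_text.splitlines():
        if any(p.search(line) for p in PATTERNS):
            hits.append(line)
    return hits


if __name__ == "__main__":
    # Usage: python3 scan_log.py kernel.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as fh:
            for hit in suspicious_lines(fh.read()):
                print(hit)
```

An empty result does not prove the drive is healthy; it only means none of these particular patterns appeared in the saved log.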

Is the free space on the btrfs volume > 10%? Btrfs can perform very badly when it is more than 90% full.
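
A quick way to get that number is sketched below. It assumes the generic figures the kernel reports for the mount point are good enough for a first look; on btrfs they are only an estimate, and `btrfs filesystem usage` gives the more detailed picture of data and metadata allocation. The 10% threshold is the rule of thumb from the question above, and the function names are made up for the example.

```python
import shutil


def percent_free(total_bytes, free_bytes):
    """Free space as a percentage of the total size."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes


def check_mount(path="/", min_free_percent=10.0):
    """Return (percent free, True if above the threshold) for a mount point."""
    usage = shutil.disk_usage(path)
    pct = percent_free(usage.total, usage.free)
    return pct, pct > min_free_percent


if __name__ == "__main__":
    pct, ok = check_mount("/")
    print(f"{pct:.1f}% free, above threshold: {ok}")
```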

The btrfs volume has something like 25% free space. I ran a scrub on it and there were no errors.
Also, no kernel errors show up when the slowness is happening…

I think I’ll try to reinstall with an LVM thin pool when I have time…
