[qubes-users] High dom0 CPU usage by qubesd

Hello,
I have a dual-core i7-7500U with hyperthreading disabled. In dom0, total CPU usage is often in the tens of percent (frequently about 50 %, i.e., roughly one fully utilized core). According to htop in dom0, this is caused by qubesd, which uses the vast majority of the CPU during these peaks. The peaks look rather random and I see no relation to any activity, but they are quite frequent.

In the process tree, qubesd has many child processes, probably one for each domU qube, but they use almost no CPU.

The TIME+ column confirms my CPU% observation over the long term.

I am not sure where to find any relevant log. Maybe journalctl, but I have seen nothing suspicious there.

Do you have any idea about the cause, solution or even a suggestion for debugging?

Regards,
Vít Šesták ‘v6ak’

Hi,
I have some further info. I partially know the cause and have a workaround.

Here is my investigation. Some minor inaccuracies might be caused by writing it down retrospectively:

  1. I tried debugging with strace. (Prerequisite: sudo qubes-dom0-update strace) After finding the PID of qubesd, I ran:
    sudo strace -s 256 -p PID_OF_QUBESD -o /tmp/qubesd.log

A few seconds is enough to get a reasonable sample, see below.

  2. I ran sort /tmp/qubesd.log | uniq -c | sort -n (one can also add “ -r | head -n 50”).

I noticed an interesting line that repeats frequently:
sendto(270, "QubesNoSuchPropertyError\0", 25, 0, NULL, 0) = 25

  3. Look closer:

$ grep --before=5 --after=5 QubesNoSuchPropertyError /tmp/qubesd.log

The output contains many repeated occurrences of this, just with a different VM name each time. It seems to iterate over all the VMs (even those that are not running).
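
For anyone who prefers to do the frequency count from step 2 in Python, here is a minimal sketch. It assumes the log was written by strace with -o as above, one syscall per line. The function names are made up for illustration and are not part of any Qubes tool.

```python
from collections import Counter
import re


def most_repeated_lines(lines, top=50):
    """Count identical lines, most frequent first.

    Roughly equivalent to: sort | uniq -c | sort -n -r | head -n 50
    """
    counts = Counter(line.rstrip("\n") for line in lines if line.strip())
    return counts.most_common(top)


def count_error_replies(lines, error_name="QubesNoSuchPropertyError"):
    """Count sendto() calls whose payload starts with the given exception name."""
    pattern = re.compile(r'^sendto\(\d+, "' + re.escape(error_name))
    return sum(1 for line in lines if pattern.match(line))
```

Passing an open file object for /tmp/qubesd.log to either function works, because a file iterates line by line. A high count from count_error_replies relative to the sampling time would point at the same pattern as the grep above.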

This is some really nice tracing work. I'm sure it would be appreciated
as an issue in the qubes-issues repository so it can be tracked properly.

While I haven't gone to the same depth, I can confirm that `qubesd`
jumps to ~25% CPU regularly on my (albeit much beefier) system with i3.
This does correlate with qubes-i3status running on my system as well.

As a temporary workaround, you could modify the script
(/usr/bin/qubes-i3status:123) to run every minute or longer. This would
have the downside of the clock updating slower, but otherwise should not
be a problem.

Alternatively, if the number of running VMs doesn't interest you, you
could comment out line 113 and modify 122 to suit this.
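
If slowing down the clock is not acceptable, the expensive part alone could be throttled instead. The sketch below shows the idea in Python rather than in the shell of the real script: a small cache that re-runs a costly callable at most once per interval, while everything else keeps refreshing at the normal rate. The class name and parameters are hypothetical, and this is not how qubes-i3status is currently written.

```python
import time


class Cached:
    """Re-run an expensive callable at most once per `ttl` seconds."""

    def __init__(self, func, ttl, clock=time.monotonic):
        self._func = func
        self._ttl = ttl
        self._clock = clock      # injectable so the behaviour can be tested
        self._value = None
        self._expires = None     # None means "never computed yet"

    def get(self):
        now = self._clock()
        if self._expires is None or now >= self._expires:
            # Cache is empty or stale: call the expensive function again.
            self._value = self._func()
            self._expires = now + self._ttl
        return self._value
```

With something like Cached(count_qubes, ttl=60), where count_qubes is whatever function ends up calling qvm-ls, the status loop could still tick every few seconds while qubesd is only queried once a minute.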

OK, reported, some optimization attempts included: https://groups.google.com/g/qubes-users/c/uTi3QHuhdy8

I am also tempted to reimplement qubes-i3status as a Python wrapper around the original i3status. That would probably resolve some other problems as well. For example, I had to fix reading of the battery status.

BTW, if you have ~25% CPU load, I guess you just have a quad-core CPU (or maybe dual-core with hyperthreading).

Regards,
Vít Šesták ‘v6ak’

Hi Vit,

* I have many VMs in my computer.
* I use i3 with qubes-i3status
* The script qubes-i3status calls the command qvm-ls --no-spinner --raw-data
--fields NAME,FLAGS quite frequently.
* The command qvm-ls --no-spinner --raw-data --fields NAME,FLAGS seems to
cause high CPU load. Unfortunately, the process that shows the high CPU
usage is qubesd, not qvm-ls.

What can be improved:

a. Don't use qubes-i3status. Problem solved.
b. Optimize qvm-ls. Not sure how hard it is.

This issue is really old (it dates back to at least 3.2) and is caused by each qvm-ls line corresponding to one request to qubesd. It was actually even worse in 3.2.

It should improve with 4.1 though, see [1].

[1] QubesOS/qubes-issues, Issue #3293: "Qubes 4 Admin API is way too inefficient"

c. Optimize qubes-i3status. I am not sure about the ideal way of doing
that, but clearly running qvm-ls --no-spinner --raw-data --fields
NAME,FLAGS just to compute the number of running qubes is far from optimal.
One could add --running. And maybe it could have been written without
flags. The script only counts VMs whose first flag is not “0” (maybe in
order to ignore dom0) and whose second flag is “r” (probably not needed
with --running).
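
To make the filter described in (c) concrete, here is a rough Python sketch of the counting logic. It assumes that --raw-data prints one NAME|FLAGS pair per line, separated by "|", and that the first flag character is the VM type and the second the power state; treat both as assumptions to verify against real output. The function names are invented for this example.

```python
import subprocess


def count_running_qubes(raw_output):
    """Count lines whose flags say 'not dom0' (first char != '0') and 'running' ('r')."""
    count = 0
    for line in raw_output.splitlines():
        name, sep, flags = line.partition("|")
        if not sep or len(flags) < 2:
            continue  # skip blank or malformed lines
        if flags[0] != "0" and flags[1] == "r":
            count += 1
    return count


def query_running_qubes():
    """Run qvm-ls once, restricted to running qubes, and count the result."""
    result = subprocess.run(
        ["qvm-ls", "--no-spinner", "--raw-data", "--running",
         "--fields", "NAME,FLAGS"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return count_running_qubes(result.stdout)
```

Whether --running actually reduces the number of requests to qubesd is something that would need to be measured, for example with the strace approach from earlier in the thread.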

Filtering might work in the meantime, yes.

BR
David

BTW, I’ve started reimplementing qubes-i3status as a Python wrapper around i3status. I am trying to be quite conservative: with the default settings, the only visible differences should be lower CPU load, no periodic freezes, and some bug fixes (battery status).

  • Some indicators (battery, load and time) are already present, they just need some adjustments of the format in order to be a drop-in replacement.
  • Disk status was easy to implement. I just need to verify that it can properly handle the change of default pool.
  • Running qubes: I need to study the events deeper…
  • NetVM status – currently, it is disabled and discouraged. I might decide to reimplement this, but I am not 100% sure right now.
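
For readers unfamiliar with the wrapper approach, here is a minimal sketch of the general technique, not the code from my implementation. When i3status is configured with output_format = "i3bar", it prints a header line, then a line containing "[", and after that one JSON array of blocks per line, with every array after the first prefixed by a comma. A wrapper can parse each such line and add its own block. The function name and the example block are hypothetical.

```python
import json


def inject_block(line, block):
    """Prepend `block` to one status line of the i3bar JSON stream."""
    prefix = ""
    if line.startswith(","):
        # Every status array after the first one is prefixed with a comma.
        prefix, line = ",", line[1:]
    blocks = json.loads(line)
    blocks.insert(0, block)
    return prefix + json.dumps(blocks)
```

A wrapper built on this would pass the first two lines of i3status output through unchanged and call inject_block on each later line, for example with a block like {"name": "qubes", "full_text": "3 qubes"}.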

Regards,
Vít Šesták ‘v6ak’

Although my implementation is not fully complete, I have decided to share my progress: GitHub - v6ak/qubes-i3status-dir: An attempt to make a more efficient and more configurable drop-in replacement for qubes-i3status

It is available under a WTFPL-like license.

Regards,
Vít Šesták ‘v6ak’