emmi
May 7, 2022, 10:37pm
1
Hi,
Since about a week ago, my Whonix gateways have started swapping heavily after resuming from suspend. Does anyone have similar issues, and maybe good ways to investigate this?
Thank you!
I've noticed my sys-whonix becomes unresponsive after resuming from suspend, and I have to qvm-kill sys-whonix. Otherwise the next suspend fails.
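Not a fix, but before killing it, it may be worth confirming that the gateway is really swapping. A rough sketch from dom0, assuming the qube is named sys-whonix and still answers qrexec calls:
```
# dom0: one snapshot of memory use per domain
xentop -b -i 1

# inside the gateway, while qrexec still answers: memory, swap and paging activity
qvm-run -p sys-whonix 'free -h; swapon --show; vmstat 1 5'

# last resort if it stays unresponsive and would block the next suspend
qvm-kill sys-whonix
```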
1 Like
I had the same issue, and it's also described here:
Xen-related performance problems · Issue #7404 · QubesOS/qubes-issues · GitHub
opened 05:09PM - 01 Apr 22 UTC
Labels: T: bug, C: Xen, P: major, r4.0-dom0-cur-test, r4.1-dom0-stable, diagnosed, waiting for upstream

## Qubes OS release
R4.1
### Brief summary
The Xen hypervisor has performance problems on certain compute-intensive workloads
### Steps to reproduce
See @fepitre for details
### Expected behavior
Same (or almost same) performance as bare hardware
### Actual behavior
Less performance than bare hardware
Fixed it by applying this patch:
Xen-related performance problems · Issue #7404 · QubesOS/qubes-issues · GitHub
3 Likes
I also experience the same problem, and it's not just whonix-gw-based qubes like sys-whonix but also whonix-ws-based ones.
Also, systemd-socket-proxyd jumps to 100% CPU usage for me after a while and I have to kill it each time. All Whonix-based qubes become unresponsive after suspend, and shutting them down makes the Qube Manager unresponsive until it throws an error. I reported the latter on GitHub: Error when shutting down whonix templates and Qube Manager becomes unresponsive (Failed to shutdown domain '18' with libxenlight) · Issue #7510 · QubesOS/qubes-issues · GitHub
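No real fix here either, but this is roughly how I track down and kill the runaway process inside the affected qube (just a sketch; pgrep needs -f because the process name is longer than the 15-character comm field):
```
# inside the affected Whonix qube: confirm what is spinning
top -b -n 1 | head -n 20

# which systemd unit the process belongs to (systemctl accepts a PID here)
systemctl status "$(pgrep -of systemd-socket-proxyd)"

# kill it by PID (-f matches against the full command line)
sudo kill "$(pgrep -of systemd-socket-proxyd)"
```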
1 Like
tzwcfq
May 16, 2022, 11:30am
5
You can try to change this file in dom0:
/usr/lib/python3.8/site-packages/qubes/vm/qubesvm.py
according to the patch I've linked above, and see if it fixes your issue.
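A small addition in case it helps anyone: I'd back up the file first and restart qubesd afterwards, since qubesd only reads that code at startup. Roughly, in dom0:
```
# dom0: keep a copy of the stock file
sudo cp /usr/lib/python3.8/site-packages/qubes/vm/qubesvm.py{,.bak}

# apply the hunks from the linked patch by hand
sudoedit /usr/lib/python3.8/site-packages/qubes/vm/qubesvm.py

# make qubesd pick up the change
sudo systemctl restart qubesd
```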
1 Like
Thanks, that did solve it for me!
I will see if it also solves the issue with systemd-socket-proxyd jumping to 100% CPU usage.
Edit: Unfortunately, the systemd-socket-proxyd issue is still present.
Unfortunately, there is still a problem even after applying the latest patch mentioned in: Xen-related performance problems · Issue #7404 · QubesOS/qubes-issues · GitHub
After waking up from suspend, all Whonix windows are frozen in this sense: for instance, in Tor Browser, if I use keyboard shortcuts to move from one tab to another, the title of the window changes, but that's about it; nothing else changes and the UI is frozen. I can still open further applications, but they're all frozen too: they accept and react to keyboard input, but graphically nothing changes beyond the initial display. And this doesn't apply to Whonix only; it also happened with a qube based on the debian-11 template, but only Firefox was affected, and it stayed affected even after restarting it. I don't know what could be the cause behind this or how I can even debug it, given that I can't get any graphical update.
Does this happen every time after waking up from suspend?
It happens to at least one VM every time, especially disposable whonix-ws ones. Most of the time some VMs are spared while others aren't.
tzwcfq
May 20, 2022, 5:50pm
10
Does it affect only browsers, or terminals / file managers / etc. as well?
When you first applied the patch, did it not happen for some time?
Yes. I also noticed that it mostly affected Xfce apps and didn't affect, for example, the GNOME file manager nautilus.
Yes.
tzwcfq
May 20, 2022, 6:09pm
12
resulin:
Yes.
Maybe you've updated dom0 and the patched file was overwritten by the updated package file, without the patch?
Did you check if the patch is still there?
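A quick way to check this in dom0: rpm can report whether the file still differs from what the package shipped. If the line for qubesvm.py is missing from the output, the file is back to stock and the patch is gone. (The diff assumes a .bak copy was kept before patching.)
```
# dom0: a "5" flag means the file's checksum differs from the packaged version,
# i.e. a local change (the patch) is still in place
rpm -Vf /usr/lib/python3.8/site-packages/qubes/vm/qubesvm.py | grep qubesvm.py

# or compare against a backup made before patching
diff -u /usr/lib/python3.8/site-packages/qubes/vm/qubesvm.py.bak \
        /usr/lib/python3.8/site-packages/qubes/vm/qubesvm.py
```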
1 Like
You're right! That did not cross my mind, thanks a lot for the help! Hopefully no issue will pop up afterwards.
1 Like
Unfortunately, the issue sometimes reappears with a disposable whonix-ws VM after suspend.
Also, sometimes the whonix-gw's CPU usage goes through the roof for no apparent reason and then goes back to normal.
I made sure this time that the patch wasn't overwritten.
tzwcfq
May 28, 2022, 3:38am
15
It hasn't happened for me anymore after applying the patch. Maybe there is some relation to the hardware that can trigger this issue, or maybe it's a different issue. I think it'd be better to reopen the GitHub issue for this.
1 Like
@tzwcfq
I think I finally nailed it by finding a way to reliably reproduce this problem: open some VMs (not necessarily Whonix ones), then pause them and click on Suspend. After waking up, the problem I explained in Suspend, swap, and whonix-gateway performance issues - #7 by resulin appears. I'm interested to see if anyone else can reproduce it. (Note: the earlier patch was applied.)
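For anyone who prefers the terminal, the same reproduction can be driven from dom0; a sketch, with qube names as placeholders:
```
# dom0: pause one or more running qubes (names are just examples)
qvm-pause sys-whonix
qvm-pause personal

# suspend the host, then wake it up again
systemctl suspend

# after resume the previously paused qubes come back unpaused but partly
# unresponsive here; unpause explicitly just in case
qvm-unpause sys-whonix
qvm-unpause personal
```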
1 Like
tzwcfq
June 5, 2022, 4:08pm
17
Yes, I have the same issue: if I pause sys-whonix and then enter suspend, the paused sys-whonix will be unpaused after resume but partly unresponsive.
I guess it's a bug in the resume-from-suspend procedure that doesn't take into account that some VMs were paused.
I guess you can reopen the GitHub issue with these details.
The issue happens inside Qubes-Whonix, but the root cause is not in Qubes-Whonix.
Thank you for attempting, and succeeding at, reproducing this issue independently of Whonix, and for the Qubes bug report!
Also discussed here:
Hello there, Qubes 4.1. sdwdate: 3:19.7-1. With all updates installed under whonix-gw-16, when the system wakes up from suspend, sys-whonix keeps a whole core at 100% CPU usage and terminals opened prior to going into suspend are unresponsive....
@tlaurion reported:
opened 09:21PM - 10 Jun 22 UTC
closed 06:02PM - 16 Jun 22 UTC
r4.1-dom0-cur-test
Update of linux-kernel to v5.10.121-1 for Qubes r4.1, see comments below for details.
Built from: https://github.com/QubesOS/qubes-linux-kernel/commit/c5104766963d4bdbaa9ec329a521a51bc41be289
[Changes since previous version](https://github.com/QubesOS/qubes-linux-kernel/compare/v5.10.112-1...v5.10.121-1):
QubesOS/qubes-linux-kernel@c510476 version 5.10.121-1
QubesOS/qubes-linux-kernel@a18f839 Merge remote-tracking branch 'origin/pr/594' into stable-5.10
QubesOS/qubes-linux-kernel@3a84c17 Revert "Switch default clocksource to 'tsc'"
QubesOS/qubes-linux-kernel@ac59b28 Update to kernel-5.10.119
QubesOS/qubes-linux-kernel@10df8ee ci: use default tags
Referenced issues:
QubesOS/qubes-issues#7404
QubesOS:master ← marmarek:suspend-all
opened 03:06PM - 03 May 22 UTC
Just pausing a VM for a host suspend breaks 'tsc' clocksource.
QubesOS/qubes-issues#7404
QubesOS/qubes-issues#2044
TODO:
- [x] test with mirage PVH
- [x] suspend (instead of pause) stubdomain too
- [x] Fix QWT to not remove xenstore entry (https://github.com/QubesOS/qubes-issues/issues/7404#issuecomment-1117611495)
- [x] https://github.com/QubesOS/qubes-issues/issues/4657
Commit which introduced the issue:
committed 10:31AM - 22 Apr 22 UTC
The 'tsc' clocksource is significantly faster than 'xen', and since Qubes OS does not support VM migration, it is safe to use.
In synthetic benchmarks, it improves performance over tenfold, but significant improvements in real-world use cases are visible too.
Fixes QubesOS/qubes-issues#7404
(cherry picked from commit 777f65244ae32ce15f7198f3292d75eb5184df30)
Which happened in context of:
Xen-related performance problems · Issue #7404 · QubesOS/qubes-issues · GitHub
Commit reported to fix this:
committed 07:30PM - 10 Jun 22 UTC
This causes issues with Whonix on resume.
Report from @tlaurion:
https://forums.whonix.org/t/sys-whonix-sdwdate-not-properly-resuming-from-suspend/13759/4
And from @jevank:
https://github.com/QubesOS/qubes-issues/issues/7404#issuecomment-1113299850
This reverts commit d0a6fe5c5b7b0e4191f7c5d463adfcba9862242f.
Which also refers to: https://github.com/QubesOS/qubes-issues/issues/7404#issuecomment-1113299850
While we're at it… More on clocksource tsc vs clocksource xen:
https://phabricator.whonix.org/T389
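Since all of this revolves around the 'tsc' vs 'xen' clocksource, it is easy to check which one a given VM (or dom0) is actually running with:
```
# inside a VM or dom0: the clocksource currently in use (e.g. "tsc" or "xen")
cat /sys/devices/system/clocksource/clocksource0/current_clocksource

# and the ones the kernel considers usable
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
```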
I'm not sure that's a different Qubes bug causing this. It could be the following:
opened 10:40PM - 01 Jun 22 UTC
closed 02:23AM - 20 Jul 22 UTC
T: bug
C: kernel
P: default
r4.1-buster-stable
r4.1-bullseye-stable
r4.1-dom0-stable
diagnosed
r4.1-fc34-stable
r4.1-centos-stream8-stable
r4.1-fc35-stable
r4.1-bookworm-stable
r4.1-fc36-stable
affects-4.1
### Qubes OS release
4.1
### Brief summary
Usually I keep Qubes running for several days and turn it off on weekends. I'm discovering that the longer a VM runs, the more duplicated messages accumulate in the logs.
### Steps to reproduce
- Open any VM and keep it open for several days (preferably with an app open as well, like Thunderbird or HexChat, etc.)
- Open Terminal and write:
`sudo journalctl -b `
- See the output
### Expected behavior
No duplication to this level, at least.
### Actual behavior
E.g log:
```
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x735 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x149 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x1bb4 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x12e still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x1980 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x17a1 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0xc1 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x18a8 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x18b3 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x14c still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x1c21 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x198c still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0xdc still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x720 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x15d still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x189f still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x177e still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x1d3c still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x1881 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x732 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x207 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x17c7 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x1cc1 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x121 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x174a still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x1780 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x162 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x1833 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x19e3 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x18a0 still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x179d still pending
May 31 15:41:09 host kernel: xen:grant_table: g.e. 0x730 still pending
```
Full log:
[journalctl-b.txt](https://github.com/QubesOS/qubes-issues/files/8819146/journalctl-b.txt)
There the symptom was:
systemd-socket-proxyd: Failed to get remote socket: Too many open files
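To check whether you are hitting the same descriptor exhaustion, the proxy's open-fd count can be compared with its limit inside the affected qube (a sketch; pgrep -of picks the oldest matching process):
```
# inside the affected qube
pid="$(pgrep -of systemd-socket-proxyd)"

# how many file descriptors it currently holds ...
sudo ls "/proc/$pid/fd" | wc -l

# ... versus its soft/hard limit
sudo grep 'open files' "/proc/$pid/limits"
```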
4 Likes