I’m rather desperate… I’m experiencing random freezes with QubesOS several times a day and, as you can imagine, this is rather frustrating when you’re relying on a system as a daily driver.
Over the past weeks I searched online for ideas on what to do and how to debug it, and I looked for patterns, but I couldn’t find anything useful.
(latest kernel didn’t help; reinstallation with BTRFS didn’t help; freeze still occurs even if only Dom0 is running; no errors showing up in journalctl around the time of the freeze;…)
I really hope someone can help me!
P.S.: I already asked for help here and filed a bug report here but unfortunately haven’t received any useful replies yet.
Did you read the post? The issue is quite different from yours (only the mouse and keyboard freeze, on both USB and PS/2; the Xfce clock still works; no Xorg crash).
I was indeed considering this but didn’t go through with it because I assumed 4.0 to be more stable than 4.1.
That’s why I did the Dom0 kernel update to 5.13.6-1 instead in the hope that this might show some improvement. But unfortunately I didn’t see any change there.
Thanks for the suggestion. I have an AMD Ryzen 7 5800X, so not an APU with integrated graphics.
Most of the freezes that I observed were indeed like that. But in the last few days I also observed freezes where the clock was frozen as well (I’ll keep monitoring this). I already updated the issue on GitHub accordingly.
In all freezing cases, the PS2 keyboard LED didn’t work anymore.
@wind.gmbh: Which log files did you monitor for observing your issue? So far I mainly looked at journalctl -p 0..4 -x, assuming that such issues would be shown there.
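A saved journal dump can also be scanned for storage-related kernel messages, which are easy to miss in a priority-filtered view. The following is only a minimal sketch: the pattern list is an assumption based on typical libata and block-layer message wording, not an exhaustive or authoritative set, and the file name in the usage comment is hypothetical.

```python
import re

# Assumed patterns for storage-related kernel messages (illustrative only).
STORAGE_PATTERNS = re.compile(
    r"\bata\d+(\.\d+)?:|I/O error|hard resetting link|failed command|blk_update_request",
    re.IGNORECASE,
)

def storage_lines(lines):
    """Return the log lines that look storage-related, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if STORAGE_PATTERNS.search(line)]

# Example use on a saved dump (the file name is hypothetical):
# with open("journal-prev-boot.txt") as f:
#     print("\n".join(storage_lines(f)))
```

Such a dump could be produced with something like `journalctl -k -b -1 > journal-prev-boot.txt`, assuming the journal is kept across reboots. If the freeze prevents the log from reaching the disk, nothing will show up here either.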
If you’re not that serious about security, it’s fine to use; it doesn’t have too many bugs. (I’m very serious about security, so I don’t use it, but I can install it on my computer without workarounds.)
Edit: after an in-depth check of the logs, the only thing I found that seems somewhat related is:
Sep 23 16:09:25 dom0 kernel: i8042: PNP: PS/2 appears to have AUX port disabled, if this is incorrect please boot with i8042.nopnp
When I search for this, it often comes up together with problems that make the PS/2 keyboard and/or mouse stop responding.
Update: I now noticed that my Qubes 4.0 installation on another SSD doesn’t seem to be affected by those freezes (at least I didn’t get one all day). Both setups are basically identical (BTRFS file system; running on exactly the same PC; …). The only difference seems to be the type of SSD. I’m not aware that anything else is different.
With this SSD I’m experiencing the freezes: Samsung 870 EVO 4TB
With this SSD I didn’t experience the freezes yet: Samsung 860 EVO 2TB
I am using a Samsung 870 QVO 2TB SSD. Turns out I should have been doing some reading instead of just taking one off the BestBuy shelf. Looks like I bought cheap crap and have no reason to complain.
Will get a better SSD, not only for this issue but also because the 1,000 write cycles scare the crap out of me.
As for the debugging question:
Debugging hard drive issues is usually a problem because, by the time the error message is generated, the log no longer gets written to the disk.
The solution I normally use is to forward a copy of the logs over the network to an external system, so that after the crash I can see the error message (and hopefully get useful debugging information).
While incomplete, the beginning of talking about logging is here:
In your case you would need to log dom0, which has not gotten addressed yet in the document, but it appears rsyslog is installed in dom0, so presumably you could just add the file to /etc/rsyslog.d/ on dom0 and it should get to the logging qube.
Unfortunately the part on forwarding the logs to an external system is not complete yet.
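To illustrate the receiving end of that idea, here is a minimal sketch of a UDP listener that the external system could run to append incoming syslog datagrams to a file. It is a stand-in for illustration, not the rsyslog setup from the linked document. The port (5514), the output file name and the function names are all assumptions. The sending side is not shown, and since dom0 has no network access, the logs would still have to reach a networked logging qube first.

```python
import socketserver
from datetime import datetime

LOG_FILE = "remote-syslog.log"  # hypothetical output path on the external system

def format_record(sender_ip, payload):
    """Turn one raw syslog datagram into a single text line."""
    text = payload.decode("utf-8", errors="replace").strip()
    return f"{sender_ip} {text}"

class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        line = format_record(self.client_address[0], data)
        stamp = datetime.now().isoformat(timespec="seconds")
        # Open and close the file per message so each line is written out promptly.
        with open(LOG_FILE, "a", encoding="utf-8") as f:
            f.write(f"{stamp} {line}\n")

def serve(host="0.0.0.0", port=5514):
    """Listen for syslog datagrams until interrupted (assumed unprivileged port)."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` starts the listener. UDP keeps the sketch short but can drop messages; a real setup would more likely use rsyslog on both ends.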
Sorry @qpost135, my answer had nothing to do with your thread. I just saw BTRFS and thought this was a reply to another thread I am involved with. That thread is about Qubes OS being unresponsive for a few seconds when shutting down a large HVM.
@Plexus: I mixed up several threads and topics and have in the process probably caused some confusion. Let me try to clean up after myself:
Actual “freezes” as in: the entire system froze (incl. mouse cursor / dom0) and never became responsive again. That I have seen on the T430 with i7-3740QM and i7-3840QM but only when using kernel version 5.4.x … never with v4.19.x – there was nothing in the logs of any value. It appears the computer simply stopped and did nothing until it was hard rebooted. I can no longer reproduce this with the latest 5.4 kernel on the T430.
A dramatic slow-down when shutting down my 100GB Windows HVM (corporate install). Here the mouse and switching desktops still work, but not much else. Even keyboard input slows down and comes to a halt eventually … for a few seconds. Then I get the notification that the qube shut down and everything goes back to normal. In that context someone recommended trying BTRFS. I saw this behavior on both the P51 and the T430 and thought it’s related to ‘trim’ on SSDs.
… I saw @qpost135’s post and, because it mentioned BTRFS, I made the mental jump to the second topic, even though this thread is related to the first. So @qpost135: you might want to give kernel 4.19 a try and see if that changes anything for you.
So far, my theory seems to be valid. I only experienced one freeze after several VM upgrades and several changes. Apart from that, my system seems to be stable now. What a relief…
I currently assume that the issue is indeed caused by that SSD. At some point I might try to run a diagnosis on it with a Samsung tool or see if the controller on the SSD can be updated.
Sorry about the freezing. I’m not sure if my experience can help you, but it’s something you may want to consider. I too had freezing on AMD. Then I went the route of getting an x230 i7 following the certification specs. That system just works™, except for the large data problem I described in the other thread, which hopefully btrfs will fix. Also, that data problem seems to be a Qubes problem, not a hardware problem. I also bought a second x230 i7 for emergency migrations should my main system die. They’re so cheap to buy used if you hunt around.
I have 16GB RAM. If that’s low for you, you gotta make use of minimal templates and lower the default assigned RAM on many of your VMs. For example, if you use many disp-whonix VMs simultaneously, there’s likely no reason to have that set to 4GB per disp. Lower it to 1GB or even less if that’s all you need, depending on your browsing situation per disp.
Thanks for that suggestion. As far as I currently understand, the root cause of my problem is the controller on the SSD, which somehow messes up this TRIM feature. So, I would assume that another processor would not help with that. But thanks anyway for that suggestion!
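For anyone comparing drives, whether the kernel advertises discard (TRIM) for a block device can be read from sysfs. This is a minimal sketch, assuming the standard Linux `queue/discard_max_bytes` attribute; the function name and the `sda` device name are only illustrative. It shows whether TRIM is offered at all, not whether the SSD controller handles it correctly.

```python
from pathlib import Path

def discard_supported(device, sysfs_root="/sys/block"):
    """Return True if the kernel reports discard (TRIM) support for a block device.

    A discard_max_bytes value of 0 means the device does not support discard.
    """
    path = Path(sysfs_root) / device / "queue" / "discard_max_bytes"
    return int(path.read_text().strip()) > 0

# Example (device name is an assumption; adjust to your system):
# print(discard_supported("sda"))
```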