Desktop freezes randomly with the last message from journalctl being about unix_chkpwd

RainyDay · December 25, 2024, 3:22am

Just installed Qubes a few days ago. I initially had crash problems (random restarts), but those got solved by removing the NVIDIA card from the machine. However, I am still having freezing problems. Everything but the mouse freezes up. I can move the mouse, but I can’t click on anything, and since the clock time stops changing, I assume the desktop is just totally frozen.

I have started keeping a console running “journalctl -f” on screen at all times. So far, the screen has frozen twice since then, and both times the last two lines I can see in the journalctl logs are about unix_chkpwd.

dom0 kernel: audit: type=1100 audit…
msg='op=PAM:unix_chkpwd acct="MYACCOUNT" exe="/usr/sbin/unix_chkpwd" hostname=? terminal=? res=success

The 2nd to last line is blue and the last line is white. So, neither of them is a red error.

However, in the first freeze, the 3rd and 4th lines from the end were about “PAM unable to dlopen” and “PAM adding faulty module”. I fixed this by doing the following in a dom0 console:
sudo authselect select minimal
After that, those red PAM errors stopped showing up, but the unix_chkpwd logs at the end are the same.

Since I think unix_chkpwd is also PAM related, it makes me think that there’s something fubar’d about PAM that is freezing the machine, but I haven’t been able to find any reference to this anywhere else.

What should I do about this? Since PAM is about authentication, I don’t want to just randomly reinstall it and mess with it since that seems like a great way to lock myself out or create security issues. But, I can’t keep going with a freezing system.

The about page for the qubes manager says:
Qubes release 4.2.3 (R4.2)

What should I try?

UPDATE: A 3rd freeze happened with journalctl running, and this time the last entries weren’t unix_chkpwd. This time, the last thing that happened was a SERVICE_START message for /usr/lib/systemd/systemd and the message “audit: type=1130 audit … msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success”.

However, the 4th freeze once again had the unix_chkpwd stuff as the last lines. I left that overnight to see if it would ever unfreeze, but it didn’t.

RainyDay · December 26, 2024, 3:56am

On an impulse I opened a long-running YouTube video, and so far today the system hasn’t frozen. However, I’ve already disabled all the suspend options I could find. In Power Manager, on the System tab, I have “Never” selected for “When inactive for”. And on the Display tab I have the “Display power management” slider disabled. I just realized that “Blank after” is still active even with that disabled, so I’ve just now slid that to Never as well. Anyway, I don’t see any other options like that anywhere.

I do want power management enabled eventually, but for now I just want to prove this is the issue. I guess I’ll find out after I close the video and wait and see what happens.

FranklyFlawless · December 26, 2024, 4:01am

What hardware did you install Qubes OS on?

RainyDay · December 26, 2024, 4:22pm

MOTHERBOARD: ASUS X670E-PLUS WIFI
CPU: AMD Ryzen 9
RAM: 64GB

RainyDay · December 26, 2024, 4:26pm

Also, the machine didn’t freeze up last night after I’d disabled screen blanking. However, I just re-enabled it and it didn’t freeze up after it blanked once. So, this is proving difficult to nail down.

Is there some sort of verbose logging that might catch whatever is happening?

RainyDay · January 1, 2025, 6:44pm

Well, the problem didn’t recur after that, but sadly I was forced to reinstall, and now I have the same problem again, and doing the same things hasn’t fixed it this time.

What other ways are there to debug this?

FranklyFlawless · January 1, 2025, 7:18pm

Perform a memory test.

RainyDay · January 3, 2025, 4:37pm

I created a memtest USB drive and left it running overnight. It did 4 passes and found no errors, so I guess it’s not the RAM.

There is an “ECO” switch on the power supply that I have sometimes been accidentally switching. I don’t know what it actually does, but I’m experimenting with having it on or off and seeing if that helps.

Is there maybe a CPU tester? The entire machine is built new, so I suppose there could be CPU or motherboard issues of some type. I found CPU benchmark tools, but didn’t find one that tested if there was anything wrong with the CPU.

FranklyFlawless · January 3, 2025, 10:42pm

What is the model of the PSU?

RainyDay · January 5, 2025, 12:22am

The PSU is:
FSP Hydro Ti PRO 1000W 80 PLUS TITANIUM FULL MODULAR ATX 3.0 PCIe Gen 5

Flipping the ECO switch didn’t seem to make a difference. I suppose I could try removing the AMD Radeon RX 7900XT video card as well to see if that helps, since Qubes isn’t using it anyway until I figure out GPU passthrough. But I would like it to work with the video card, so I’d rather just figure it out if that’s the issue.

One thing I figured out was that I can do ctrl-alt-f2 and ctrl-alt-f1 to flip back and forth between a console and the GUI. The GUI stays frozen, but the console is totally usable. I can log in and use journalctl and ps and whatever else. But even doing “journalctl -x -p 0..7 -r” shows no errors from the time when the GUI freezes. I always know the freeze time because the clock is frozen at that moment. But nothing interesting or consistent seems to be showing up.
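
A rough sketch of narrowing the journal to the minutes around a freeze, using the time shown on the frozen clock. It assumes bash and GNU date in dom0; `freeze_window` is a made-up helper name and the timestamp in the usage line is a placeholder, not a real freeze time from this machine.

```shell
#!/usr/bin/env bash
# Sketch only: freeze_window is a hypothetical helper, not part of Qubes or systemd.
# It prints a journalctl command covering N minutes on either side of the freeze.
freeze_window() {
    local freeze_time="$1"     # time read off the frozen clock, e.g. "2025-01-04 23:15:00"
    local minutes="${2:-5}"    # half-width of the window in minutes (default 5)
    local epoch win_start win_end

    # Convert to epoch seconds so the arithmetic is unambiguous.
    epoch=$(date -d "$freeze_time" +%s) || return 1
    win_start=$(date -d "@$((epoch - minutes * 60))" '+%Y-%m-%d %H:%M:%S')
    win_end=$(date -d "@$((epoch + minutes * 60))" '+%Y-%m-%d %H:%M:%S')

    # -b limits output to the current boot; short-precise adds sub-second timestamps.
    printf 'journalctl -b -o short-precise --since "%s" --until "%s"\n' \
        "$win_start" "$win_end"
}

# Usage (placeholder timestamp):
#   freeze_window "2025-01-04 23:15:00" 5
```

The printed command can be run from the ctrl-alt-f2 console while the GUI is still frozen. Adding `-k` would restrict it to kernel messages, which may or may not be where the relevant lines are.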

I’m thinking about installing Debian on another drive in the same system and seeing if it freezes or not.

I really hate that the gui just freezes without any information. Is there some other source of logs I can get for the gui? Or maybe I can do some sort of core dump on a process for the gui that’d show me what it’s up to? I don’t know which ps process to look at though.
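
One possible way to pick out a process to look at, sketched under the assumption that a hung GUI process may be sitting in uninterruptible sleep (state `D` in `ps`). That is only a guess about this freeze, not a diagnosis. `blocked_tasks` is a hypothetical helper that filters `ps` output passed on stdin.

```shell
#!/usr/bin/env bash
# Sketch only: blocked_tasks is a hypothetical helper name.
# Reads "PID STAT COMMAND" rows (with a header line) on stdin and prints
# the PID and command name of every task whose state starts with D
# (uninterruptible sleep, usually waiting on I/O or a driver).
blocked_tasks() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# Usage from the text console while the GUI is frozen:
#   ps -eo pid,stat,comm | blocked_tasks
```

If the X server or a GUI daemon shows up in that list during every freeze, that would at least point to which process to investigate. An empty list would suggest the hang is somewhere else.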

FranklyFlawless · January 5, 2025, 1:21am

Either try removing the GPU or use Debian then.