I had mostly fixed this issue a long time ago by upping the RAM in dom0, but an update seems to have brought it back.
Basically, my Qubes PC can’t run for longer than a couple of days without randomly crashing. When it crashes, it stops responding to all user input; however, the machine is still working under the hood, because all the network services it provides keep working until I manually reboot.
Sometimes it crashes while I’m using it; sometimes it’s already dead when I sit down to use it. Depending on whether the screensaver was on when it crashed, the screen either stays black or shows the last rendered frame.
When I upped the RAM in dom0, the issue got much better and I only had to deal with it every few weeks, but now it’s literally every 1-2 days. Can somebody help me with this? At this point I don’t know whether it’s a software or a hardware issue.
Note that I don’t use sys-usb, so all USB is in dom0.
Edit: another interesting detail: when this happens, the keyboard lights (Caps/Scroll/Num Lock) don’t respond either.
Edit 2: as a last resort, I could use a null-modem cable to pipe the output of journalctl to another terminal, but perhaps you folks have a more 21st-century way to check the logs of a system that crashes.
No. Keyboard, mouse, and screen are all completely frozen.
I can’t without rebooting the system, and if I do that, the log will be flooded with entries from the new boot. Is there a way to check specifically what happened during the previous boot?
I did that and scrolled down to the bottom, but didn’t see anything out of the ordinary, just more of the same messages it had been logging for hours.
Assuming the log timestamps are correct, the last logged message happened 25 minutes before I rebooted the machine just now, which is long after the machine entered the crashed state. I have a vague time frame of when it went down, so I checked that part of the log too.
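To pin down how long the log kept running after the freeze, a saved copy of the previous boot's journal can be scanned for its last timestamp. This is a minimal Python sketch, assuming the journal is persistent and was exported in the default short format (for example with `journalctl -b -1 --no-pager > prevboot.txt`); the function names, file name, and year are hypothetical, and the year has to be supplied because the short format omits it.

```python
from datetime import datetime
from typing import Iterable, Optional


def parse_stamp(line: str, year: int) -> Optional[datetime]:
    """Parse the leading 'Mon DD HH:MM:SS' of a journalctl short-format line.

    The short format has no year, so the caller supplies one (assumption).
    Returns None for lines without a timestamp, e.g. '-- Boot ... --'.
    """
    parts = line.split(maxsplit=3)
    if len(parts) < 3:
        return None
    stamp = " ".join(parts[:3])
    try:
        return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def last_timestamp(lines: Iterable[str], year: int) -> Optional[datetime]:
    """Return the timestamp of the last entry that has one, or None."""
    last = None
    for line in lines:
        ts = parse_stamp(line, year)
        if ts is not None:
            last = ts
    return last


def silent_gap_minutes(lines: Iterable[str], year: int,
                       reboot: datetime) -> Optional[float]:
    """Minutes between the last logged entry and a known reboot time."""
    last = last_timestamp(lines, year)
    if last is None:
        return None
    return (reboot - last).total_seconds() / 60
```

Passing an open file object such as `open("prevboot.txt")` as `lines`, together with the time you actually hit the reset button, gives the length of the window in which the machine was frozen but the journal had gone quiet.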
Throughout the entire log there are instances of the following errors, but I don’t think they’re related, because they show up all the time:
PAM unable to dlopen(/usr/lib64/security/pam_sss.so) …
Aug 12 12:40:30 dom0 kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
Aug 12 12:40:30 dom0 kernel: i915 0000:00:02.0: [drm] Xorg[5133] context reset due to GPU hang
Aug 12 12:40:30 dom0 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:85dffffb, in Xorg [5133]
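Because messages like these are easy to miss among hours of routine output, a small filter can pull them out of a saved previous-boot log. This is a minimal Python sketch; the pattern list is an assumption. The first two patterns come from the i915 lines quoted above, while matching every `nouveau` line is only a guess for a GeForce card, since nouveau's exact wording isn't shown in this thread. The name `gpu_suspects` is hypothetical.

```python
import re
from typing import Iterable, List

# Patterns to flag. The first two are taken from the i915 messages above;
# catching every nouveau line is an assumption for NVIDIA cards.
GPU_PATTERNS = [
    re.compile(r"GPU hang", re.IGNORECASE),
    re.compile(r"preemption time ?out", re.IGNORECASE),
    re.compile(r"\bnouveau\b", re.IGNORECASE),
]


def gpu_suspects(lines: Iterable[str]) -> List[str]:
    """Return the log lines (without trailing newline) matching any pattern."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(p.search(line) for p in GPU_PATTERNS)
    ]
```

Given an open file of the exported journal, it returns only the GPU-related lines, so recurring noise like the PAM error drops out.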
Hmm, I didn’t see anything like that. I have a GeForce though.
I went ahead and stripped the machine down, because it had some old hardware I wasn’t using anymore. Hopefully some of that was causing the issue. If it’s not that, and the PS/2 keyboard doesn’t save me, and you guys don’t have any other theories, then it may be time for me to buy a new Qubes PC. Tbh, this smells more like a hardware issue than anything I’ve ever seen in software, especially on Linux. Now, hopefully it doesn’t crash again.
Welp, it crashed again. This is definitely more frequent than last time, and I had it on a lighter load than usual, too.
Does anybody know whether the fact that the machine keeps running in the background is enough to rule out a RAM issue? I’m tempted to play trial and error with the RAM sticks, but how could it possibly be the RAM if everything except user I/O works fine?
I’ve run memtest overnight a couple of times since this issue started. It never reported any issues.
On another note, the strangest thing just happened. I went to the bathroom, and when I came back I noticed the mouse pointer had reappeared on what had been a completely black screen. And it moves! Both keyboards are still dead, though (with Caps Lock stuck on, interestingly). I can only move the mouse. At this point I would expect the screensaver password prompt to appear, but it doesn’t. Now I’m even more confused: for as long as I’ve had this issue, I had never seen the mouse come back to life until now.
Try disconnecting and reconnecting your keyboard.
Try to switch to another TTY with Ctrl+Alt+F2.
If it still doesn’t work, reboot the system and check the dom0 log to see whether the keyboard disconnect/connect events were logged.
I walked out again (to give it enough time to log the keyboard reconnection), and when I came back there was an actual dom0 login prompt. I did the Ctrl+Alt+F2 thing, but it had no immediate effect. I guess I just had to give it a moment? I seem to have lost my desktop, but I’m logged in on a TTY now.
WHAT THE HELL IS GOING ON?!?! I was getting ready to reseat hardware, but with this development, this has got to be a software problem.
Yes, it seems to be a GPU issue.
Maybe you can try installing kernel-latest to see whether it’s fixed in newer kernels.
Or use proprietary NVIDIA drivers instead of nouveau.
I’m allergic to proprietary drivers, so I’ll try the new kernel next. If that doesn’t work, at least I know what I need to replace. Since Qubes is CPU-rendered, I can get by with a cheap AMD card off Amazon. Thanks a lot for walking me through this!
Funny that, of all my computers, this is the one without integrated graphics to fall back on. My pristine luck shines again.
Amazon was too slow lol, so I found an RX 580 locally for cheap instead, which makes this my first fully red machine.
Qubes booted with the new GPU and all seems well so far (except they screwed me over with a port that looks like a DVI port but isn’t one; I’ll let it slide since I’m desperate and just want to use the damn computer at this point). If it still crashes after this, my next thread will be “semi-working Qubes computer giveaway thread”.