Random system freezes after 4.1 clean install

After the last freeze yesterday, I downgraded my kernel from 5.15.57-1.fc32.qubes.x86_64 to 5.15.52-1.fc32.qubes.x86_64, and so far I have had no issues.

These are my current specs:

---
layout:
  'hcl'
type:
  'notebook
docking station'
hvm:
  'yes'
iommu:
  'yes'
slat:
  'yes'
tpm:
  'unknown'
remap:
  'yes'
brand: |
  Dell Inc.
model: |
  Vostro 3459
bios: |
  1.4.1
cpu: |
  Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
cpu-short: |
  FIXME
chipset: |
  Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [8086:1904] (rev 08)
chipset-short: |
  FIXME
gpu: |
  Intel Corporation Skylake GT2 [HD Graphics 520] [8086:1916] (rev 07) (prog-if 00 [VGA controller])
gpu-short: |
  FIXME
network: |
  Intel Corporation Wireless 3160 (rev 83)
  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
memory: |
  16275
scsi: |
  Samsung SSD 860  Rev: 4B6Q
usb: |
  1
versions:

- works:
    'FIXME:yes|no|partial'
  qubes: |
    R4.1
  xen: |
    4.14.5
  kernel: |
    5.15.52-1
  remark: |
    FIXME
  credit: |
    FIXAUTHOR
  link: |
    FIXLINK

---

Very nice hardware specs, by the way. I'm a little jealous. That baby must purr nicely :wink:

Interesting….

That's great that the previous kernel doesn't have the same issues, but I'm very curious why.

---

Any chance you can edit your posts with the logs in them and put “```” on its own line at the top and bottom of the journal log? :slightly_smiling_face:

I'm not being picky. Without it, Markdown may interpret some symbols as formatting and hide valuable information.

It’ll also make finding a solution for you much quicker.

Thanks in advance!

---

The xenstore daemon and the Qubes memory management daemon seem to be the culprit, so these logs are definitely a good start :slight_smile:

Thanks for the “```” tip! I wondered how I could do that earlier.

The xenstore daemon and the Qubes memory management daemon seem to be the culprit, so these logs are definitely a good start :slight_smile:

Is there anything I can do?

I have had the same problem since the last dom0 update. I will attempt downgrading the kernel as well.

I've had a problem too since the last two dom0 updates. I don't know if it's the same problem; here's mine:

The upgrade from kernel 5.15.81-1 to 5.15.89-1 made the disks in the VMs become “read-only” shortly after reboot (within a minute or a few). There were errors like “Error: Read-only file system”, applications started crashing, then the VMs crashed too, and dom0 tried to write a core dump (which failed, because the disk was read-only).

Reverting to kernel 5.15.81 immediately solved this. Now, though, after the upgrade to 5.15.94-1 the problem is back, and reverting to 5.15.81 no longer helps (the problem still happens).

If I run journalctl in dom0, I see this error hundreds of times:

device-mapper: thin: dm_thin_find_block failed error=-5

Edit: Those messages are gone after a reboot, which perhaps indicates that it wasn't possible to write them to disk? This returns no rows:

journalctl | grep dm_thin_find_block

(Does anyone have any ideas?)
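For anyone hitting the same dm_thin_find_block messages, a small sketch like the one below could help quantify them. It assumes you first save the kernel messages from the previous boot to a text file with journalctl -k -b -1 (-k limits output to kernel messages, -b -1 selects the previous boot); the script name thin_errors.py and its output format are made up for illustration. It also decodes the logged number: error=-5 is errno 5, EIO (Input/output error), which is consistent with, though not proof of, an I/O problem in the layers below the thin pool, such as the SSD. If the journal was never written because the disk had gone read-only, the saved file will simply contain nothing to count.

```python
#!/usr/bin/env python3
"""Count "dm_thin_find_block failed" errors in saved journal output.

Hypothetical helper (thin_errors.py); it only reads the text files you pass in.
"""
import errno
import os
import re
import sys
from collections import Counter

# Matches e.g. "device-mapper: thin: dm_thin_find_block failed error=-5"
PATTERN = re.compile(r"dm_thin_find_block failed error=(-?\d+)")


def count_thin_errors(lines):
    """Return a Counter mapping each errno number to how often it was logged."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            # The kernel prints negative errno values; keep the positive number.
            counts[abs(int(match.group(1)))] += 1
    return counts


def describe(code):
    """Turn an errno number into a label such as 'EIO (Input/output error)'."""
    return f"{errno.errorcode.get(code, 'UNKNOWN')} ({os.strerror(code)})"


def main(paths):
    if not paths:
        print("usage: thin_errors.py JOURNAL_DUMP...", file=sys.stderr)
        return
    counts = Counter()
    for path in paths:
        # errors="replace" so odd bytes in the journal dump don't abort the scan
        with open(path, encoding="utf-8", errors="replace") as handle:
            counts.update(count_thin_errors(handle))
    if not counts:
        print("No dm_thin_find_block errors found.")
    for code, hits in counts.most_common():
        print(f"{hits} x error=-{code}: {describe(code)}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Example use in dom0 after rebooting:

journalctl -k -b -1 > /tmp/prev-boot.txt
python3 thin_errors.py /tmp/prev-boot.txt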

@qubesISNOTstable

I will attempt downgrading the kernel as well.

Did it work? What happened for you? Which version did you downgrade from, and to which version?

I haven't done it yet, but today I got two OS freezes (I had to shut down the PC manually, losing all the work in the VMs!!!).
Now I see new dom0 updates; perhaps they will fix the bug, since it appeared with the last update (probably not).
I’d also like to see the logs for dom0 upgrades so that I can better identify what changed.

All in all, this is extremely frustrating. Why am I having to debug Critical level bugs alone on a relatively clean Qubes installation that was previously working perfectly (apart from occasional random reboots, although rare)??? :rage:


Hi, I had OS freezes at some point and it turned out to be a hardware error. The laptop even froze with a different OS than Qubes. It could be faulty RAM or something similar.


@qubesISNOTstable

It seems I got my SSD working again; I've created a separate post about that:
Solving (?) one type of frozen disk problem

why am I having to debug Critical level bugs alone on a relatively clean Qubes installation

That's annoying indeed. At the same time, I think this is not because of Qubes, instead, it's the Linux kernel; had you been using CentOS or Debian, you'd have run into the same problems once you upgraded to the problematic kernel. From what I've read, these problems happen more often with Linux than with Windows. I suppose more people are paid to work on fixing Windows laptop problems than Linux laptop problems.

Anyway, after having seen WiFi break (that was in Ubuntu), problems with external monitors, and now an SSD, all following minor kernel patch upgrades, I'm not going to recommend a Linux laptop to anyone unless they have specific needs (e.g. Qubes and security).

@leo

turned out to be a hardware error

OK, so it seems that to find out and verify that it was indeed the hardware, you had to wipe the disk and install another OS?
I hope it'll be a long time before I see an actual hardware error and have to do that :- /


Yeah, I found out it was a hardware issue when it kept happening even after I had installed another OS on the laptop. The laptop was new, though, so it wasn't a big deal: it was still under warranty, so I just had to send it back and they fixed it.

Moreover, to troubleshoot hardware issues, you can boot an OS directly from a USB stick without wiping the computer and run hardware tests from there. There are live USB images made specifically to test for hardware errors, such as MediCat USB.

@KajMagnus

I think this is not because of Qubes, instead, it’s the Linux kernel

For the last two days I've been booting the previous kernel version, 5.15.89-1, instead of the current default, 5.15.94-1 (both 64-bit). So far the OS is running great, with no reboots or freezes. I'll make 5.15.89 the default and see whether the problems stay away; I'll update the thread if they come back.

@leo Ok, good that the problem happened early then :- )

MediCat looks interesting! At the same time, I wonder whether tools like this could theoretically install a UEFI rootkit, hmm. But it's a popular open-source project, so I guess it's reasonably safe to use.

Can a USB stick that is being booted from automatically update the BIOS/UEFI? Or would I need to enter the UEFI setup during POST (power-on self-test), rather than booting the OS on the USB, to update UEFI from its own menus?

@qubesISNOTstable

Interesting that 5.15.89 works for you; in my case, that version is the one that introduced the problem :- )
I suppose we're using different laptop and SSD brands.

(I wonder if it matters, in cases like this, what kernel versions the VMs (e.g. AppVMs) use. I guess not (?))

Yes I suppose it could install a rootkit. :slight_smile:

I think there are different ways to update UEFI depending on the device. Many manufacturers offer a way to do it from inside UEFI's own menus, and some devices can also be updated from Linux.


I can confirm that switching back from the new 5.15.94-1 to the old 5.15.89-1 permanently fixed the problem.
Note that I still use the new kernel in the VMs, and only the old one in dom0.

I updated dom0 yesterday, and today, under heavy stress (lots of open VMs, intense usage), the freeze appeared again and I had to power off using the physical button on the PC. And this happened with the reverted kernel.

QUBES IS A SHITTY OS, where are all the funds going?!

I want to reiterate the point above that this could be a hardware issue. Qubes OS and other resource-intensive tasks tend to trigger hardware faults.

Have you run MemTest86 or tried different RAM configurations (e.g., if you have two sticks of RAM, remove one, then see if the system is stable or buggy)?

Triaging other components is more difficult.

If you have random freezes after installing with the default configuration (especially under heavy I/O loads, e.g. during template installation or restoring VMs), run sudo swapoff -a in dom0 to temporarily disable swap and see what happens.
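To check whether swap is actually active before and after running sudo swapoff -a, you can look at /proc/swaps, where the Size and Used columns are in KiB. The following is a minimal sketch of reading it from Python; the function name parse_swaps is hypothetical, and it assumes the standard five-column layout of /proc/swaps. Keep in mind that swapoff -a only lasts until the next reboot.

```python
#!/usr/bin/env python3
"""List active swap areas by parsing /proc/swaps (Linux only).

Run it before and after `sudo swapoff -a`; afterwards the list should be empty.
"""
from pathlib import Path


def parse_swaps(text):
    """Parse /proc/swaps content into a list of dicts (sizes in KiB)."""
    entries = []
    lines = text.strip().splitlines()
    for line in lines[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue  # skip anything that doesn't look like a swap entry
        device, kind, size, used, priority = fields[:5]
        entries.append({
            "device": device,
            "type": kind,
            "size_kib": int(size),
            "used_kib": int(used),
            "priority": int(priority),
        })
    return entries


if __name__ == "__main__":
    swaps_file = Path("/proc/swaps")
    if not swaps_file.exists():
        print("/proc/swaps not found (not running on Linux?)")
    else:
        active = parse_swaps(swaps_file.read_text())
        if not active:
            print("No active swap.")
        for entry in active:
            print(f"{entry['device']}: {entry['used_kib']} of {entry['size_kib']} KiB used")
```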

Swap on an LVM partition on LUKS (Qubes OS uses this configuration by default) has a bug that causes random system freezes (search for “lvm luks swap freeze” to see people's confusion).

Unfortunately, in my testing, even R4.2-alpha with kernel-latest 6.1.12 had that problem. But after configuring it not to use swap, it is quite stable.

I don’t believe that disabling swap is the silver bullet that fixes all cases of system freezes, but I hope it helps someone.


Thank you both, @sm95 and @kommuni, for the input.
I have recently narrowed down the trigger for the freeze, which I just explained in another thread and will paste here:

I can more or less reproduce the bug: I just need to stress my hardware very heavily (lots of open VMs, around 14 or 15, doing resource-intensive tasks), then open or interact with my KeePass2 instance in the vault VM. Right then and there, the OS freezes with the bug I described at the beginning.
If I do not interact with KeePass2, this doesn't happen, no matter what I'm doing.
I will soon switch from KeePass2 to another password manager and see whether the problem persists.

It might be that KeePass's encryption has something to do with this; perhaps it overstresses the CPU or something.

I have 64 GB of DDR5, but at this point, even if it is hardware, it's definitely not a random freeze, and I can replicate the issue. The good thing is that now that I know the trigger, I can avoid it.
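Since the freeze seems to need many running qubes, and the Qubes memory management daemon was mentioned earlier as a suspect, one possibly useful data point is how much memory each domain holds right before the KeePass2 trigger. Below is a rough sketch, assuming you first save the output of sudo xl list in dom0 to a file; the Mem column of xl list is in MiB. The script name vm_memory.py and its functions are hypothetical, and it only summarizes a single snapshot; it does not diagnose the freeze by itself.

```python
#!/usr/bin/env python3
"""Summarize per-domain memory from saved `xl list` output.

Hypothetical helper (vm_memory.py). Example in dom0, right before the trigger:
    sudo xl list > /tmp/xl-list.txt
    python3 vm_memory.py /tmp/xl-list.txt
"""
import sys


def parse_xl_list(text):
    """Return {domain_name: memory_in_MiB} from `xl list` output."""
    memory = {}
    for line in text.splitlines()[1:]:  # skip the "Name ID Mem VCPUs ..." header
        fields = line.split()
        # Columns: Name, ID, Mem, VCPUs, State, Time(s)
        if len(fields) >= 3 and fields[2].isdigit():
            memory[fields[0]] = int(fields[2])
    return memory


def summarize(memory):
    """Return (number of domains, total MiB, domains sorted largest first)."""
    ranked = sorted(memory.items(), key=lambda item: item[1], reverse=True)
    return len(memory), sum(memory.values()), ranked


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            count, total, ranked = summarize(parse_xl_list(handle.read()))
        print(f"{path}: {count} domains, {total} MiB assigned")
        for name, mib in ranked[:5]:  # show the five biggest consumers
            print(f"  {name}: {mib} MiB")
```

Taking one snapshot under normal load and another just before reproducing the freeze would show whether the total is approaching the machine's RAM when it happens.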


Hi @leo, thanks for the link about updating UEFI/BIOS from Linux. I didn't know it was (sometimes) possible; quite interesting (thinking about rootkits in general).