[qubes-users] Heads-up (Non-Qubes): thin-provisioned swap via LVM caused machine lockup

Ulrich_Windl · October 16, 2020, 10:01pm

Hi!

Just a note, since Qubes uses thin-provisioned LVs (though not for swap by default): I could reproducibly cause a Linux kernel freeze when multiple processes started paging to a non-encrypted thin-provisioned LV (backed by SSD, actually two SSDs behind a hardware RAID1 controller).
As the machine had a rather large amount of RAM (>512GB), I had used two swap devices: a smaller first one of 5GB, being a plain partition on the RAID device, and a huge thin-provisioned one with lower priority.
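
For anyone wanting to check which swap areas the kernel will use first on their own machine, here is a minimal Python sketch that parses the text of /proc/swaps and sorts the areas by priority (highest first, which is the order the kernel fills them). The names SwapArea and parse_proc_swaps are hypothetical helpers made up for this illustration, and the sketch assumes the usual five-column layout (Filename, Type, Size, Used, Priority) with sizes in KiB.

```python
from typing import List, NamedTuple


class SwapArea(NamedTuple):
    """One row of /proc/swaps (hypothetical helper type for this sketch)."""
    device: str
    kind: str
    size_kib: int
    used_kib: int
    priority: int


def parse_proc_swaps(text: str) -> List[SwapArea]:
    """Parse /proc/swaps content; return areas sorted by priority, highest first."""
    areas = []
    # The first line is the column header, so skip it.
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) != 5:
            # Ignore blank or unexpected lines rather than guessing.
            continue
        device, kind, size, used, prio = fields
        areas.append(SwapArea(device, kind, int(size), int(used), int(prio)))
    # Higher priority is used first by the kernel.
    return sorted(areas, key=lambda a: a.priority, reverse=True)


if __name__ == "__main__":
    with open("/proc/swaps") as f:
        for area in parse_proc_swaps(f.read()):
            print(f"{area.device}: prio={area.priority} "
                  f"used={area.used_kib}/{area.size_kib} KiB")
```

With a setup like the one described here, the plain partition would be listed first and the thin-provisioned LV second.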

The first swap device filled up, but soon after the second device came into use (more precisely, when new blocks were allocated from the pool), the kernel had several pauses that got longer and longer, until eventually nothing more happened for minutes or hours (I had top running in a PuTTY terminal).

I could also reproduce that the problem did not occur when the swap device already had enough blocks allocated; but when I used discard to return them to the pool, the freeze happened again as soon as new blocks had to be allocated.

For those interested, I filed a bug at kernel.org some days ago...

As PuTTY could no longer exchange any data with the host (while ping was still answered) and did not detect that the connection was dead, I also filed a bug report for PuTTY some days ago. The problem was confirmed, but was considered "not common enough" to be taken seriously ("patches welcome").

Regards,
Ulrich