High IO wait on qube start

Try this:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing --action=upgrade kernel-latest kernel-latest-qubes-vm

Thanks, that’s what I was missing.

Same problem with the new kernels installed.

No interesting results from benchmarking disk writes. Next I’m going to try installing Qubes on an external drive, and look up how to run NVMe diagnostics.
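A common first NVMe diagnostic is to read the drive’s own health counters and error log. This is a minimal sketch, assuming nvme-cli (or smartmontools as a fallback) is installed in dom0 (it may need installing with `sudo qubes-dom0-update nvme-cli`) and that the controller is /dev/nvme0. `nvme_health` is a hypothetical helper name.

```shell
#!/bin/sh
# Dump the drive's SMART/health counters and error log with nvme-cli,
# falling back to smartmontools if nvme-cli isn't installed.
# Assumes the controller is /dev/nvme0 unless another device is given;
# run as root in dom0, which owns the NVMe controller in a default install.
nvme_health() {
    dev=${1:-/dev/nvme0}
    if command -v nvme >/dev/null 2>&1; then
        nvme smart-log "$dev"
        nvme error-log "$dev"
    elif command -v smartctl >/dev/null 2>&1; then
        smartctl -a "$dev"
    else
        echo "install nvme-cli or smartmontools first" >&2
        return 1
    fi
}
```

Running `sudo sh -c '. ./nvme_health.sh; nvme_health'` (file name hypothetical) before and after a slow qube start makes it easy to compare the error-log entries and media error counts.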

With Qubes on an external USB drive, that install had its own high iowait, so I couldn’t tell whether the same issue was there.

I’ve reinstalled Qubes on the internal drive and thought I’d try btrfs instead of the LVM setup I had previously. So far I haven’t reproduced the issue over about 10 hard reboots. I’ve seen high iowait when starting sys-usb a couple of times, but each time it resolved within about 30s.

It’s now reproducing reasonably reliably on btrfs after a couple more hard restarts, so the filesystem looks like a red herring.

I also disabled the Bluetooth systemd service, since I often saw routine Bluetooth log messages just as the iowait dropped, but the behaviour didn’t change.

I enabled NVMe tracing, checked the configuration, and found transactions timing out at that level, with a 30s timeout. Not every timeout produces a journal message, but it doesn’t look rate-limited (I’ve seen two messages appear together), so I’m not sure why. I’m going to try the hack suggested here and set the timeouts low.
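The configuration check and the tracing can be sketched roughly as below, assuming the relevant timeouts are the `nvme_core` module parameters `io_timeout` and `admin_timeout` (both in seconds) and that tracefs is mounted at /sys/kernel/tracing. The helper names are hypothetical, and the sketch only reads the timeouts; it doesn’t change them.

```shell
#!/bin/sh
# Print the NVMe core timeouts (seconds) from the nvme_core module
# parameters. The optional argument overrides the sysfs directory.
show_nvme_timeouts() {
    dir=${1:-/sys/module/nvme_core/parameters}
    for p in io_timeout admin_timeout; do
        if [ -r "$dir/$p" ]; then
            printf '%s=%s\n' "$p" "$(cat "$dir/$p")"
        else
            printf '%s=unavailable\n' "$p"
        fi
    done
}

# Enable every event in the nvme trace group and stream the trace
# (root only; blocks until interrupted). Assumes tracefs is mounted at
# /sys/kernel/tracing; older setups use /sys/kernel/debug/tracing.
trace_nvme() {
    tracing=${1:-/sys/kernel/tracing}
    echo 1 > "$tracing/events/nvme/enable" || return 1
    cat "$tracing/trace_pipe"
}
```

Run `trace_nvme` as root in dom0 while starting a qube and stop it with Ctrl-C. Tracing stays enabled until `0` is written to the same `enable` file.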

I followed a Framework forum post that advises setting nvme.noacpi=1 as a kernel parameter, but I’m still getting the problem. I also found this, which is the closest description of my issue so far. Unfortunately the support advice there is to use a supported distro, which suggests it isn’t enough evidence of a hardware issue, so I’m now also looking for ways to reproduce this on Ubuntu.
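To confirm that a kernel parameter like this actually reached the running kernel, it helps to check both the command line and what the driver reports. This is a sketch, assuming the nvme driver exposes `noacpi` under /sys/module/nvme/parameters (only kernels that have the option will show it). The helper names are hypothetical.

```shell
#!/bin/sh
# True if the exact token $1 appears on the kernel command line.
# $2 optionally points at another file instead of /proc/cmdline.
has_cmdline_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Show the value the nvme driver reports for noacpi (Y or N), if the
# running kernel exposes that parameter at all.
show_noacpi() {
    f=/sys/module/nvme/parameters/noacpi
    if [ -r "$f" ]; then
        echo "nvme noacpi=$(cat "$f")"
    else
        echo "nvme noacpi not exposed by this kernel"
    fi
}
```

In dom0, `has_cmdline_param nvme.noacpi=1 && show_noacpi` should print `nvme noacpi=Y` if the setting took effect.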

I wonder if a damaged Wi-Fi M.2 card could somehow cause this. Besides being connected directly to PCIe, it’s also connected to USB (to provide Bluetooth). Maybe try swapping it out? An AX210 is pretty cheap.

That looks like a great shout - I disconnected my AX210 and couldn’t reproduce the problem.

I wondered if it could also account for an earlier issue I’d attributed to the DisplayPort module (which I could also see on Ubuntu), but that one is still reproducible with the Wi-Fi card disconnected, so never mind.

Time to have a look at kernel options for the AX210 and see if I can isolate this further.

I might just order a replacement. I’m also trying to work out whether it’s possible (and simple) to do the same sort of PCI passthrough on Ubuntu without Xen.

If you have some Kapton tape (normal electrical tape is too thick), you could disconnect just the Bluetooth-over-USB part by covering the M.2 card’s USB 2.0 D+ and D- data pins with a narrow sliver of tape, long enough to hold on to when pulling the card back out of the socket. Disclaimer: I’ve only ever done this with an mPCIe card, and M.2 pins are even more annoyingly tiny.

In case you want to try it, please double-check my shape rotation, but I think they’re the middle two pins of the four-pin nub in the bottom right corner, marked “A” in this photo. (I don’t know whether the two surrounding GND pins are redundant with the card’s other GND pins and not needed for anything besides USB; that would be worth knowing, because covering all four would be a lot easier.)

I’d love to see if this works, but don’t have much confidence working at that scale (it looked okay on screen, but of course the chip’s smaller!) and probably don’t know enough to avoid doing bad things to the circuitry.

I’ve blacklisted the btusb kernel module in the template backing sys-usb. Looks promising, need to test more.
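A minimal sketch of that blacklist, assuming a udev-based template that reads /etc/modprobe.d/*.conf. Doing it in the template rather than in sys-usb itself matters because changes to /etc in an AppVM or disposable are not kept across restarts. The helper name and the file name are arbitrary choices.

```shell
#!/bin/sh
# Write a modprobe.d fragment so udev no longer auto-loads btusb,
# the driver for the card's Bluetooth-over-USB interface.
# Run as root in the template behind sys-usb; the file name is arbitrary,
# but modprobe only reads files in this directory ending in .conf.
blacklist_btusb() {
    dir=${1:-/etc/modprobe.d}
    printf 'blacklist btusb\n' > "$dir/blacklist-btusb.conf"
}
```

After shutting the template down and restarting sys-usb, `lsmod | grep btusb` inside sys-usb should print nothing. Note that `blacklist` only stops automatic loading by alias; an explicit `modprobe btusb` would still load the module.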

I haven’t seen the problem since, so marking this as solved. Thanks everyone.