4.1-rc2 sys-net panic

The sys-net qube on my 4.1-rc2 system panicked. It happened after a wake-from-suspend, if that makes any difference.

sys-net.log (455.5 KB)

Wow.

Have you tried resetting the PCI wifi card?

# echo 1 > /sys/bus/pci/drivers/pciback/<PCI Device>/d3cold_allowed
# echo 1 > /sys/bus/pci/drivers/pciback/<PCI Device>/reset
# echo 1 > /sys/bus/pci/drivers/pciback/<PCI Device>/rescan

Just to be clear:

  1. You woke up your machine
  2. sys-net was not running
  3. You then started sys-net
  4. You got the kernel panic

Right?

When I suspend my machine, I do not shut down sys-net. You cannot readily stop sys-net, as sys-firewall uses it. Similarly, you cannot readily stop sys-firewall as all sorts of AppVMs use it.

One can use the “kill” command via the Qube Manager app to force a panicked sys-net to go away, ignoring any dependencies.

Anyway, when a system panics, it is not usually possible to get a shell prompt in order to try resetting the WiFi. sys-net had panicked, per the log, but had not reset or rebooted, and was still ‘running’, per Qubes.

Well, in that case, none of what I said earlier applies, because you suspended your machine while sys-net was running…

Has it done anything like this before?

As I am not using wifi, I removed the PCI device from the sys-net VM, so I have a workaround. I am just reporting the panic here as a ‘heads up’ in case anyone else has their network go catatonic and sys-net become unresponsive, and in case any devs want to use the console stack tracebacks in sys-net.log (a lot of errors were logged prior to the panic, just after a resume from suspend).

I’m also willing to add the wifi PCI-device back in order to either further debug this issue or to test/verify any fixes.

@alzer89: No.

I’ve had quite a few issues with Broadcom hardware on quite a few machines, and some of them even ended up causing a dom0 kernel panic.

To tell you the truth, I haven’t had a single machine that runs Qubes OS that has been able to wake from suspend without some sort of issue (delay before becoming responsive, display/backlight not fully turning on, wifi card not waking up, keyboard not being recognised). I have been longing for the day when I can close the lid, open it up, and everything just works…

Thankfully, boot and shutdown times aren’t as long as they used to be, so that’s what I’ve been doing as a workaround…

Granted, I run Qubes on some pretty niche hardware, though…

Good on you for reporting it. Every bit helps.

Try adding iwlmvm and iwlwifi to /rw/config/suspend-module-blacklist in sys-net.
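
A minimal sketch of that suggestion, for anyone who wants to script it. The helper name `blacklist_module` is made up for illustration; the file path and module names are the ones given above. It assumes the file holds one module name per line, and it skips names that are already listed, so it is safe to run more than once.

```shell
# Hypothetical helper: append a module name to a blacklist file,
# one name per line, unless that exact name is already present.
#   $1 = blacklist file, $2 = module name
blacklist_module() {
    touch "$1" || return 1
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Intended use, as root inside sys-net (path and order as suggested above):
#   blacklist_module /rw/config/suspend-module-blacklist iwlmvm
#   blacklist_module /rw/config/suspend-module-blacklist iwlwifi
```

Since /rw is the persistent part of the qube, the change should survive a restart of sys-net.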

That is good to know. It’s now ‘cat appreciation hour’, but after I perform the necessary adulations (i.e., tomorrow) I might add back the PCI device and try poking at it a bit to see if this would permit the use of the device both for networking and with suspend/resume.

This is an Intel wifi card. I attach the dom0 lspci.log (63.6 KB), which includes the wifi.

Suspend/resume usually works quite well for me on an Intel NUC 10 and NUC 7 as well as an HP Z8 workstation. Mainly just the HiDPI issues. But I agree: I would like to identify something for which pretty much everything works out of the box. I’ve never had the faith/courage to try hibernate.

I know. My issues with Broadcom weren’t related to your Intel card. I was just saying :smiley:

You have every right to flex, and apologies in advance if any of my drool got on the floor. DAMN that is a nice machine! :drooling_face:

Intel wifi usually works flawlessly. I’ve got an AX200 in one of my machines, and the worst thing it did was refuse to be detected after suspend/resume, but it was fixed by rebooting sys-net.

Your kernel panic is very curious…

…and isn’t a ThinkPad (yes, they’re nice, but you want a bit of variety, you know…)

This is the key part (near the beginning of the whole trace):

[2021-11-21 16:55:29] [ 8698.725354] kworker/1:1: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0

“page allocation failure” → out of memory. Try increasing the memory assigned to sys-net (maybe add 50 MB or so). You can also run top there to see what uses the most RAM.
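
For what it’s worth, here is a rough sketch for reading that log line. An order:N request asks the kernel for 2^N physically contiguous pages, so order:5 means 32 pages, which is 128 KiB if the page size is 4 KiB (an assumption, though the usual one on x86-64). The helper names `alloc_failure_orders` and `order_to_kib` are invented for this example.

```shell
# Hypothetical helper: print the order of every page allocation failure
# found in kernel log text read from stdin.
alloc_failure_orders() {
    sed -n 's/.*page allocation failure: order:\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: size in KiB of an order-N allocation, i.e. 2^N
# contiguous pages. Assumes 4 KiB pages unless a page size in KiB is
# passed as the second argument.
order_to_kib() {
    echo $(( (1 << $1) * ${2:-4} ))
}

order_to_kib 5   # order:5 is the value in the log line above
```

To tally the failures in the attached log, something like `alloc_failure_orders < sys-net.log | sort -n | uniq -c` should do. The memory itself can be raised from dom0 in the qube’s settings, or (assuming the standard qvm-prefs tool) with `qvm-prefs sys-net memory <MiB>`.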

I added the wifi device back to sys-net to see whether the panic is reproducible before trying the ‘add more memory’ suggestion.
My dmesg is full of other iwlwifi errors and stack traces, but so far it has not panic()ed.

I’ll see if I can find the source for iwlwifi matching what is on sys-net and look for any obvious place where it needs to be fixed so as not to panic after having a few page allocation failures…