4.1-rc2 sys-net panic

My 4.1-rc2 system's sys-net panicked. It happened after a wake-from-suspend, if that makes any difference.

sys-net.log (455.5 KB)

Wow.

Have you tried resetting the PCI wifi card?

# echo 1 > /sys/bus/pci/drivers/pciback/<PCI Device>/d3cold_allowed
# echo 1 > /sys/bus/pci/drivers/pciback/<PCI Device>/reset
# echo 1 > /sys/bus/pci/drivers/pciback/<PCI Device>/rescan

Just to be clear:

  1. You woke up your machine
  2. sys-net was not running
  3. You then started sys-net
  4. You got the kernel panic

Right?

When I suspend my machine, I do not shut down sys-net. You cannot readily stop sys-net, as sys-firewall uses it. Similarly, you cannot readily stop sys-firewall as all sorts of AppVMs use it.

One can use the “kill” command in the Qube Manager app to force a panicked sys-net to go away, ignoring any dependencies.

Anyway, when a system panics, it is not usually possible to get a shell prompt in order to try resetting the WiFi. sys-net had panicked, per the log, but had not reset or rebooted, and was still ‘running’, per Qubes.

Well, in that case, none of what I said earlier applies, because you suspended your machine while sys-net was running…

Has it done anything like this before?

As I am not using wifi, I removed the PCI device from the sys-net VM, so I have a workaround. I am just reporting the panic here as a ‘heads up’ in case anyone else has their network go catatonic and sys-net become unresponsive, and in case any devs want to use the console stack tracebacks in sys-net.log (a lot of errors were logged prior to the panic, just after a resume from suspend).

I’m also willing to add the wifi PCI device back in order to either further debug this issue or to test/verify any fixes.


@alzer89: No.

I’ve had quite a few issues with Broadcom hardware on quite a few machines, and some of them even ended up causing a dom0 kernel panic.

To tell you the truth, I haven’t had a single machine that runs Qubes OS that has been able to wake from suspend without some sort of issue (delay before becoming responsive, display/backlight not fully turning on, wifi card not waking up, keyboard not being recognised). I have been longing for the day when I can close the lid, open it up, and everything just works…

Thankfully, boot and shutdown times aren’t as long as they used to be, so that’s what I’ve been doing as a workaround…

Granted, I run Qubes on some pretty niche hardware, though…

Good on you for reporting it. Every bit helps.

Try adding iwlmvm and iwlwifi to /rw/config/suspend-module-blacklist in sys-net.
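A minimal sketch of that suggestion, to be run as root inside sys-net. The add_to_blacklist helper is hypothetical; the file path and module names are the ones given above. iwlmvm is listed first on the assumption that modules are unloaded in file order, since iwlmvm depends on iwlwifi.

```shell
# Hypothetical helper: append a module name to a blacklist file,
# but only if that exact name is not already listed there.
add_to_blacklist() {
    file=$1
    module=$2
    grep -qxF "$module" "$file" 2>/dev/null || printf '%s\n' "$module" >> "$file"
}

# Intended use inside sys-net (as root), left commented out here:
# add_to_blacklist /rw/config/suspend-module-blacklist iwlmvm
# add_to_blacklist /rw/config/suspend-module-blacklist iwlwifi
```

Running it twice is harmless, because a name that is already present is not appended again.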

That is good to know. It’s now ‘cat appreciation hour’, but after I perform the necessary adulations (i.e. tomorrow) I might add back the PCI device and try poking at it a bit to see if this would permit the use of the device both for networking and with suspend/resume.

This is an Intel wifi card. I attach the dom0 lspci.log (63.6 KB), which includes the wifi.

Suspend/resume usually works quite well for me on Intel NUC 10 and NUC 7 as well as an HP Z8 workstation. Mainly just the HiDPI issues. But I agree: I would like to identify something for which pretty much everything works out of the box. I’ve never had the faith/courage to try hibernate.

I know. My issues with Broadcom weren’t related to your Intel card. I was just saying :smiley:

You have every right to flex, and apologies in advance if any of my drool got on the floor. DAMN that is a nice machine! :drooling_face:

Intel wifi usually works flawlessly. I’ve got an AX200 in one of my machines, and the worst thing it did was refuse to be detected after suspend/resume, but it was fixed by rebooting sys-net.

Your kernel panic is very curious…

…and isn’t a ThinkPad (yes, they’re nice, but you want a bit of variety, you know…)

This is the key part (near the beginning of the whole trace):

[2021-11-21 16:55:29] [ 8698.725354] kworker/1:1: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0

“page allocation failure” → out of memory. Try increasing the memory assigned to sys-net (maybe add 50 MB or so). You can also watch top there to see what uses the most RAM.
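For a sense of scale, the order in that log line is a power of two: an order:5 request asks the kernel for 2^5 = 32 physically contiguous pages. A minimal sketch of the arithmetic, assuming the usual 4 KiB x86_64 page size (order_to_bytes is a hypothetical helper name, not an existing tool):

```shell
# Hypothetical helper: size in bytes of a page allocation of a given order.
# Second argument is the page size; 4096 is an assumption (typical on x86_64).
order_to_bytes() {
    order=$1
    page_size=${2:-4096}
    echo $(( (1 << order) * page_size ))
}

order_to_bytes 5   # prints 131072, i.e. 128 KiB
```

A request that large can also fail when free memory is fragmented rather than exhausted, so it may be worth looking at `cat /proc/buddyinfo` inside sys-net, which lists the free blocks available at each order.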

I added the wifi device back to sys-net to see whether the panic was reproducible before trying the ‘add more memory’ suggestion.
My dmesg is full of other iwlwifi errors and stack traces, but so far it has not panic()ed.

I’ll see if I can find the source for iwlwifi matching what is on sys-net and look for any obvious place where it needs to be fixed so as not to panic after having a few page allocation failures…

Happened again.

[2022-02-02 11:56:35] [64301.310535] iwlwifi 0000:00:06.0: api flags index 2 larger than supported by driver
[2022-02-02 11:56:35] [64301.310565] iwlwifi 0000:00:06.0: TLV_FW_FSEQ_VERSION: FSEQ Version: 65.3.35.22
[2022-02-02 11:56:35] [64301.310842] iwlwifi 0000:00:06.0: loaded firmware version 59.601f3a66.0 QuZ-a0-hr-b0-59.ucode op_mode iwlmvm
[2022-02-02 11:56:35] [64301.310914] iwlwifi 0000:00:06.0: Detected Intel(R) Wi-Fi 6 AX201 160MHz, REV=0x354
[2022-02-02 11:56:35] [64301.326599] kworker/1:1: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[2022-02-02 11:56:35] [64301.326633] CPU: 1 PID: 8049 Comm: kworker/1:1 Not tainted 5.10.76-1.fc32.qubes.x86_64 #1
[2022-02-02 11:56:35] [64301.326649] Hardware name: Xen HVM domU, BIOS 4.14.3 11/11/2021
[2022-02-02 11:56:35] [64301.326667] Workqueue: events request_firmware_work_func
[2022-02-02 11:56:35] [64301.326679] Call Trace:
[2022-02-02 11:56:35] [64301.326693]  dump_stack+0x6b/0x83
[2022-02-02 11:56:35] [64301.326704]  warn_alloc.cold+0x7b/0xdf
[2022-02-02 11:56:35] [64301.326714]  ? __alloc_pages_direct_compact+0x171/0x180
[2022-02-02 11:56:35] [64301.326726]  __alloc_pages_slowpath.constprop.0+0xa13/0xa50
[2022-02-02 11:56:35] [64301.326737]  ? get_page_from_freelist+0x1ac/0x370
[2022-02-02 11:56:35] [64301.326748]  __alloc_pages_nodemask+0x34c/0x380
[2022-02-02 11:56:35] [64301.326760]  kmalloc_order+0x28/0x100
[2022-02-02 11:56:35] [64301.326769]  kmalloc_order_trace+0x19/0x80
[2022-02-02 11:56:35] [64301.326779]  __kmalloc+0x227/0x260
[2022-02-02 11:56:35] [64301.326796]  iwl_pcie_rx_alloc+0x6b/0x1e0 [iwlwifi]
[2022-02-02 11:56:35] [64301.326811]  _iwl_pcie_rx_init+0x331/0x350 [iwlwifi]
[2022-02-02 11:56:35] [64301.326827]  ? iwl_mvm_nic_config+0x124/0x1e0 [iwlmvm]
[2022-02-02 11:56:35] [64301.326856]  iwl_pcie_gen2_nic_init+0x69/0xe0 [iwlwifi]
[2022-02-02 11:56:35] [64301.326871]  iwl_trans_pcie_gen2_start_fw+0x1af/0x360 [iwlwifi]
[2022-02-02 11:56:35] [64301.326887]  iwl_mvm_load_ucode_wait_alive+0x104/0x420 [iwlmvm]
[2022-02-02 11:56:35] [64301.326902]  ? 0xffffffffc0c4e000
[2022-02-02 11:56:35] [64301.326913]  iwl_run_unified_mvm_ucode+0xaf/0x230 [iwlmvm]
[2022-02-02 11:56:35] [64301.326927]  ? iwl_mvm_get_ppag_table+0x210/0x210 [iwlmvm]
[2022-02-02 11:56:35] [64301.326941]  iwl_run_init_mvm_ucode+0x1bb/0x380 [iwlmvm]
[2022-02-02 11:56:35] [64301.326955]  ? iwl_trans_pcie_set_bits_mask+0x48/0x60 [iwlwifi]
[2022-02-02 11:56:35] [64301.326972]  iwl_op_mode_mvm_start+0x689/0xa60 [iwlmvm]
[2022-02-02 11:56:35] [64301.326987]  _iwl_op_mode_start.isra.0+0x42/0x80 [iwlwifi]
[2022-02-02 11:56:35] [64301.327001]  iwl_req_fw_callback+0x609/0x6e0 [iwlwifi]
[2022-02-02 11:56:35] [64301.327015]  request_firmware_work_func+0x55/0xa0
[2022-02-02 11:56:35] [64301.327026]  process_one_work+0x1b6/0x350
[2022-02-02 11:56:35] [64301.327035]  worker_thread+0x4c/0x310
[2022-02-02 11:56:35] [64301.327044]  ? process_one_work+0x350/0x350
[2022-02-02 11:56:35] [64301.327053]  kthread+0x11b/0x140
[2022-02-02 11:56:35] [64301.327063]  ? __kthread_bind_mask+0x60/0x60
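To check how often this happens in a saved console log such as the attached sys-net.log, something like the following could tally the failures by order. It is only a sketch: alloc_failures is a hypothetical helper, and the pattern is taken from the log lines above.

```shell
# Hypothetical helper: count "page allocation failure" entries in a log file,
# grouped by allocation order.
alloc_failures() {
    grep -o 'page allocation failure: order:[0-9]*' "$1" | sort | uniq -c
}

# Example, with the log file name as an assumption:
# alloc_failures sys-net.log
```

Each output line is a count followed by the matched text, so a run of order:5 failures right after each resume would show up as a single line with a large count.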

This is from an earlier resume:

(non-iwlwifi lines deleted)
[2022-01-31 12:21:46] [23591.845376] iwlwifi 0000:00:06.0: api flags index 2 larger than supported by driver
[2022-01-31 12:21:46] [23591.845405] iwlwifi 0000:00:06.0: TLV_FW_FSEQ_VERSION: FSEQ Version: 65.3.35.22
[2022-01-31 12:21:46] [23591.845671] iwlwifi 0000:00:06.0: loaded firmware version 59.601f3a66.0 QuZ-a0-hr-b0-59.ucode op_mode iwlmvm
[2022-01-31 12:21:46] [23591.845734] iwlwifi 0000:00:06.0: Detected Intel(R) Wi-Fi 6 AX201 160MHz, REV=0x354
[2022-01-31 12:21:46] [23592.040063] iwlwifi 0000:00:06.0: base HW address: f8:ac:65:d9:61:65
[2022-01-31 12:21:46] [23592.421080] iwlwifi 0000:00:06.0 wls6f0: renamed from wlan0

… as if. Not sure what I spend my time on, but it does not seem to include much kernel debugging these days.