My 4.1-rc2 system's sys-net panicked. It was after a wake-from-suspend, if that makes any difference.
sys-net.log (455.5 KB)
Wow.
Have you tried resetting the PCI wifi card?
# echo 1 > /sys/bus/pci/drivers/pciback/<PCI Device>/d3cold_allowed
# echo 1 > /sys/bus/pci/drivers/pciback/<PCI Device>/reset
# echo 1 > /sys/bus/pci/drivers/pciback/<PCI Device>/rescan
Just to be clear: sys-net was not running, right?
When I suspend my machine, I do not shut down sys-net. You cannot readily stop sys-net, as sys-firewall uses it. Similarly, you cannot readily stop sys-firewall as all sorts of AppVMs use it.
One can use the "kill" command via the Qube Manager app to force a panicked sys-net to go away, ignoring any dependencies.
Anyway, when a system panics, it is not usually possible to get a shell prompt in order to try resetting the WiFi. sys-net had panicked, per the log, but had not reset/rebooted, and was still "running", per Qubes.
Well, in that case, none of what I said earlier applies, because you suspended your machine while sys-net was running…
Has it done anything like this before?
As I am not using wifi, I removed the PCI device from the sys-net VM, so I have a work-around. I am just reporting the panic here as a "heads up" in case anyone else has their network go catatonic and sys-net become unresponsive, plus in case any devs want to use the console stack tracebacks in sys-net.log (there were a lot of errors logged prior to the panic, just after a resume from suspend).
I'm also willing to add the wifi PCI device back in order to either further debug this issue or to test/verify any fixes.
I've had quite a few issues with Broadcom hardware on quite a few machines, and some of them even ended up causing a dom0 kernel panic.
To tell you the truth, I haven't had a single machine that runs Qubes OS that has been able to wake from suspend without some sort of issue (delay before becoming responsive, display/backlight not fully turning on, wifi card not waking up, keyboard not being recognised). I have been longing for the day when I can close the lid, open it up, and everything just works…
Thankfully, boot and shutdown times aren't as long as they used to be, so that's what I've been doing as a workaround…
Granted, I run Qubes on some pretty niche hardware, though…
Good on you for reporting it. Every bit helps.
Try to add iwlmvm and iwlwifi to /rw/config/suspend-module-blacklist in sys-net
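A minimal sketch of that edit, run as root inside sys-net. It assumes the file takes one module name per line; the function name add_suspend_blacklist is made up for illustration. Only the path and the two module names come from the post above.

```shell
#!/bin/sh
# Append kernel module names to a suspend blacklist file, skipping any
# name that is already listed. The function name is hypothetical.
add_suspend_blacklist() {
    file="$1"; shift
    touch "$file" || return 1
    for mod in "$@"; do
        # -x matches whole lines only, -F treats the name as a literal string
        grep -qxF "$mod" "$file" || printf '%s\n' "$mod" >> "$file"
    done
}

# Inside sys-net this would be called as:
#   add_suspend_blacklist /rw/config/suspend-module-blacklist iwlmvm iwlwifi
```

Running it twice leaves the file unchanged the second time, so it is safe to repeat after a template update or while experimenting.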
That is good to know. It's now "cat appreciation hour", but after I perform the necessary adulations (i.e. tomorrow) I might add back the PCI dev and try poking at it a bit to see if this would permit the use of the device both for networking and with suspend/resume.
This is an Intel wifi. I've attached the dom0 lspci.log (63.6 KB), which includes the wifi.
Suspend/resume usually works quite well for me on Intel NUC 10 and NUC 7 as well as an HP Z8 workstation. Mainly just the HiDPI issues. But I agree: I would like to identify something for which pretty much everything works out of the box. I've never had the faith/courage to try hibernate.
I know. My issues with Broadcom weren't related to your Intel card. I was just saying
You have every right to flex, and apologies in advance if any of my drool got on the floor. DAMN that is a nice machine!
Intel wifi usually works flawlessly. I've got an AX200 in one of my machines, and the worst thing it did was refuse to be detected after suspend/resume, but it was fixed by rebooting sys-net.
Your kernel panic is very curious…
…and isn't a ThinkPad (yes, they're nice, but you want a bit of variety, you know…)
This is the key part (near the beginning of the whole trace):
[2021-11-21 16:55:29] [ 8698.725354] kworker/1:1: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
"page allocation failure" means out of memory. Try increasing the memory assigned to sys-net (maybe add 50MB or so). You can also watch top there to see what uses the most RAM.
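For context on the "order:5" in that log line: the kernel was asking for 2^5 physically contiguous pages. A request like that can fail when free memory is fragmented, not only when it is exhausted, and giving the VM more memory makes a large contiguous block more likely to be available. The sketch below works out the request size and, where the file exists, prints /proc/buddyinfo so the free blocks per order can be inspected inside sys-net. The function name order_to_kib is made up for illustration, and the 4 KiB page size is the usual x86_64 value, assumed here.

```shell
#!/bin/sh
# Size in KiB of a page-allocator request of the given order.
# Order N means 2^N contiguous pages. The function name is hypothetical.
order_to_kib() {
    order="$1"
    page_size="${2:-$(getconf PAGESIZE)}"   # bytes per page; defaults to this system's
    echo $(( (1 << order) * page_size / 1024 ))
}

# The order:5 request from the log, assuming 4 KiB pages: prints 128
order_to_kib 5 4096

# Free blocks per order (columns are order 0, 1, 2, ...). Very small numbers
# in the higher-order columns point at fragmentation.
if [ -r /proc/buddyinfo ]; then
    cat /proc/buddyinfo
fi

# To raise the VM's memory from dom0 (the value is illustrative only):
#   qvm-prefs sys-net memory 450
```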
I added the wifi device back to sys-net to see if the panic was reproducible before trying the "add more memory" suggestion.
My dmesg is full of other iwlwifi errors and stack traces, but so far, it has not panic()ed.
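When a log is that noisy, a quick way to see whether the allocation failures are recurring is to count them and pull out just the driver's lines. This is only a sketch: the function names are made up for illustration, and the log path is whatever file the console output or dmesg was saved to.

```shell
#!/bin/sh
# Helpers for scanning a saved console log or dmesg dump.
# Both function names are hypothetical.

# Print how many "page allocation failure" lines the log contains.
count_alloc_failures() {
    grep -c 'page allocation failure' "$1"
}

# Print only the lines mentioning the iwlwifi driver.
iwlwifi_lines() {
    grep 'iwlwifi' "$1"
}

# Example, inside sys-net:
#   dmesg > /tmp/dmesg.txt
#   count_alloc_failures /tmp/dmesg.txt
#   iwlwifi_lines /tmp/dmesg.txt
```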
I'll see if I can find the source for iwlwifi matching what is on sys-net and look for any obvious place where it needs to be fixed so as not to panic after having a few page allocation failures…
Happened again.
[2022-02-02 11:56:35] [64301.310535] iwlwifi 0000:00:06.0: api flags index 2 larger than supported by driver
[2022-02-02 11:56:35] [64301.310565] iwlwifi 0000:00:06.0: TLV_FW_FSEQ_VERSION: FSEQ Version: 65.3.35.22
[2022-02-02 11:56:35] [64301.310842] iwlwifi 0000:00:06.0: loaded firmware version 59.601f3a66.0 QuZ-a0-hr-b0-59.ucode op_mode iwlmvm
[2022-02-02 11:56:35] [64301.310914] iwlwifi 0000:00:06.0: Detected Intel(R) Wi-Fi 6 AX201 160MHz, REV=0x354
[2022-02-02 11:56:35] [64301.326599] kworker/1:1: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[2022-02-02 11:56:35] [64301.326633] CPU: 1 PID: 8049 Comm: kworker/1:1 Not tainted 5.10.76-1.fc32.qubes.x86_64 #1
[2022-02-02 11:56:35] [64301.326649] Hardware name: Xen HVM domU, BIOS 4.14.3 11/11/2021
[2022-02-02 11:56:35] [64301.326667] Workqueue: events request_firmware_work_func
[2022-02-02 11:56:35] [64301.326679] Call Trace:
[2022-02-02 11:56:35] [64301.326693] dump_stack+0x6b/0x83
[2022-02-02 11:56:35] [64301.326704] warn_alloc.cold+0x7b/0xdf
[2022-02-02 11:56:35] [64301.326714] ? __alloc_pages_direct_compact+0x171/0x180
[2022-02-02 11:56:35] [64301.326726] __alloc_pages_slowpath.constprop.0+0xa13/0xa50
[2022-02-02 11:56:35] [64301.326737] ? get_page_from_freelist+0x1ac/0x370
[2022-02-02 11:56:35] [64301.326748] __alloc_pages_nodemask+0x34c/0x380
[2022-02-02 11:56:35] [64301.326760] kmalloc_order+0x28/0x100
[2022-02-02 11:56:35] [64301.326769] kmalloc_order_trace+0x19/0x80
[2022-02-02 11:56:35] [64301.326779] __kmalloc+0x227/0x260
[2022-02-02 11:56:35] [64301.326796] iwl_pcie_rx_alloc+0x6b/0x1e0 [iwlwifi]
[2022-02-02 11:56:35] [64301.326811] _iwl_pcie_rx_init+0x331/0x350 [iwlwifi]
[2022-02-02 11:56:35] [64301.326827] ? iwl_mvm_nic_config+0x124/0x1e0 [iwlmvm]
[2022-02-02 11:56:35] [64301.326856] iwl_pcie_gen2_nic_init+0x69/0xe0 [iwlwifi]
[2022-02-02 11:56:35] [64301.326871] iwl_trans_pcie_gen2_start_fw+0x1af/0x360 [iwlwifi]
[2022-02-02 11:56:35] [64301.326887] iwl_mvm_load_ucode_wait_alive+0x104/0x420 [iwlmvm]
[2022-02-02 11:56:35] [64301.326902] ? 0xffffffffc0c4e000
[2022-02-02 11:56:35] [64301.326913] iwl_run_unified_mvm_ucode+0xaf/0x230 [iwlmvm]
[2022-02-02 11:56:35] [64301.326927] ? iwl_mvm_get_ppag_table+0x210/0x210 [iwlmvm]
[2022-02-02 11:56:35] [64301.326941] iwl_run_init_mvm_ucode+0x1bb/0x380 [iwlmvm]
[2022-02-02 11:56:35] [64301.326955] ? iwl_trans_pcie_set_bits_mask+0x48/0x60 [iwlwifi]
[2022-02-02 11:56:35] [64301.326972] iwl_op_mode_mvm_start+0x689/0xa60 [iwlmvm]
[2022-02-02 11:56:35] [64301.326987] _iwl_op_mode_start.isra.0+0x42/0x80 [iwlwifi]
[2022-02-02 11:56:35] [64301.327001] iwl_req_fw_callback+0x609/0x6e0 [iwlwifi]
[2022-02-02 11:56:35] [64301.327015] request_firmware_work_func+0x55/0xa0
[2022-02-02 11:56:35] [64301.327026] process_one_work+0x1b6/0x350
[2022-02-02 11:56:35] [64301.327035] worker_thread+0x4c/0x310
[2022-02-02 11:56:35] [64301.327044] ? process_one_work+0x350/0x350
[2022-02-02 11:56:35] [64301.327053] kthread+0x11b/0x140
[2022-02-02 11:56:35] [64301.327063] ? __kthread_bind_mask+0x60/0x60
This is from an earlier resume:
(non-iwlwifi deleted)
[2022-01-31 12:21:46] [23591.845376] iwlwifi 0000:00:06.0: api flags index 2 larger than supported by driver
[2022-01-31 12:21:46] [23591.845405] iwlwifi 0000:00:06.0: TLV_FW_FSEQ_VERSION: FSEQ Version: 65.3.35.22
[2022-01-31 12:21:46] [23591.845671] iwlwifi 0000:00:06.0: loaded firmware version 59.601f3a66.0 QuZ-a0-hr-b0-59.ucode op_mode iwlmvm
[2022-01-31 12:21:46] [23591.845734] iwlwifi 0000:00:06.0: Detected Intel(R) Wi-Fi 6 AX201 160MHz, REV=0x354
[2022-01-31 12:21:46] [23592.040063] iwlwifi 0000:00:06.0: base HW address: f8:ac:65:d9:61:65
[2022-01-31 12:21:46] [23592.421080] iwlwifi 0000:00:06.0 wls6f0: renamed from wlan0
… as if. Not sure what I spend my time on, but it does not seem to include much kernel debug these days.