Wifi errors on resume from suspend

Ruckus8997 · July 5, 2022, 12:46am

I know there are a variety of issues with sys-net when resuming from suspend, and I’m in that club. However, the error message from dmesg is something I haven’t seen referenced elsewhere. I get hundreds of “timed out to flush queue” messages (sometimes for queue 1, sometimes for queue 2), followed by the “firmware failed” message.

[ 3304.748896] rtw_8822be 0000:00:07.0: timed out to flush queue 2
[ 3310.980344] rtw_8822be 0000:00:07.0: firmware failed to restore hardware setting

Any help appreciated.
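Rather than scrolling through hundreds of lines, you can count how the flush-queue timeouts split between queues. This is a minimal sketch that assumes it runs inside sys-net and that `dmesg` may need `sudo` to be readable. The function name `count_flush_timeouts` is made up for illustration. Keep in mind that `dmesg` only covers the current boot of sys-net.

```shell
# Hypothetical helper: summarize rtw88 "timed out to flush queue N" messages.
# Reads kernel log text on stdin and prints one "queue N: count" line per queue.
count_flush_timeouts() {
    grep -o 'timed out to flush queue [0-9]*' \
        | awk '{ count[$NF]++ } END { for (q in count) print "queue " q ": " count[q] }' \
        | sort
}
```

Usage inside sys-net: `sudo dmesg | count_flush_timeouts`.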

Ruckus8997 · July 5, 2022, 3:51pm

I’ve been doing some more testing on this, and found that the behavior is not consistent.

sys-net always fails to restart on resume from suspend.
When I start it manually, the wifi consistently starts and connects, and sys-net reaches the internet. (I’m pretty sure wifi failed to connect once, but I can’t reproduce it.)
After that, qubes based on Fedora templates sometimes reconnect to the network and work as normal, but sometimes they don’t.
My standalone Windows qube never reconnects to the network.
If I try to restart any of the qubes (Windows or Fedora) to see whether they reconnect, I get a libxl error. (I thought I had saved the text of the error message and the logs, but now I can’t find them. I’ll try to reproduce it and post here when I get a chance.)

So it seems like the core issue is sys-net failing to start on resume. I need to do a little more testing to confirm if those dmesg errors are being written on resume. That will be my next test.

I’ll keep working on it, but if anybody has any insight, or guidance on what I should be looking for, much appreciated.

Ruckus8997 · July 5, 2022, 4:14pm

Obvious in hindsight, but I didn’t think of it before: dmesg only shows what has happened since the kernel started. So all of the messages I saw in sys-net only cover the time since I manually started it.

Ruckus8997 · July 5, 2022, 5:03pm

Sorry, I’ve been away from Qubes OS for quite a while and am having to relearn how to do some of these things. Hopefully my stumbling is useful to somebody. I got the sys-net console output from an attempted resume from suspend, using this command in dom0:
tail -n 200 /var/log/xen/console/guest-sys-net.log

And here’s the relevant output, ending in a kernel panic. I’ll keep working on this, but as always, any insight appreciated.

[2022-07-05 09:48:11] [ 87.632859] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 0000718ac78ee317
[2022-07-05 09:48:11] [ 87.632880] RDX: 000000000000000d RSI: 00005fbba41bcd70 RDI: 0000000000000001
[2022-07-05 09:48:11] [ 87.632901] RBP: 00005fbba41bcd70 R08: 000000000000000a R09: 0000718ac79844e0
[2022-07-05 09:48:11] [ 87.632922] R10: 0000718ac79843e0 R11: 0000000000000246 R12: 000000000000000d
[2022-07-05 09:48:11] [ 87.632944] R13: 0000718ac79c1520 R14: 000000000000000d R15: 0000718ac79c1700
[2022-07-05 09:48:11] [ 87.632967] Modules linked in: ehci_pci ehci_hcd xt_nat ccm nf_conntrack_netlink nft_reject_ipv4 nft_reject nft_ct nf_tables nfnetlink ip6table_filter ip6table_mangle ip6table_raw ip6_tables ipt_REJECT nf_reject_ipv4 xt_state xt_conntrack iptable_filter joydev iptable_mangle iptable_raw xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel rtw88_8822be rtw88_8822b rtw88_pci rtw88_core mac80211 serio_raw pcspkr cfg80211 rfkill libarc4 e1000e floppy drm_vram_helper drm_ttm_helper ttm drm_kms_helper cec i2c_piix4 ata_generic pata_acpi xen_scsiback target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn ipmi_devintf ipmi_msghandler drm fuse bpf_preload ip_tables overlay xen_blkfront [last unloaded: ehci_hcd]
[2022-07-05 09:48:11] [ 87.633198] ---[ end trace ecdd91f7c2a063b1 ]---
[2022-07-05 09:48:11] [ 87.633213] RIP: 0010:__free_pages+0x83/0x90
[2022-07-05 09:48:11] [ 87.633228] Code: 53 cf ff ff eb d1 85 f6 75 09 5b 5d 41 5c e9 84 fe ff ff 5b 31 d2 5d 41 5c e9 39 cf ff ff 48 c7 c6 98 0f 5a ad e8 6d 30 fd ff <0f> 0b 5b 5d 41 5c c3 66 0f 1f 44 00 00 0f 1f 44 00 00 48 85 ff 75
[2022-07-05 09:48:11] [ 87.645787] e1000e 0000:00:06.0 ens6: renamed from eth0
[2022-07-05 09:48:11] [ 87.766981] RSP: 0018:ffff98ee403e3b88 EFLAGS: 00010296
[2022-07-05 09:48:11] [ 87.767006] RAX: 0000000000000000 RBX: ffff8c87cd1e6e78 RCX: ffff8c87d7120a88
[2022-07-05 09:48:11] [ 87.767024] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8c87d7120a80
[2022-07-05 09:48:11] [ 87.767042] RBP: fffff00380038100 R08: 0000000000000000 R09: ffff98ee403e3890
[2022-07-05 09:48:11] [ 87.767061] R10: ffff98ee403e3888 R11: ffffffffad944548 R12: ffff8c87cedb2000
[2022-07-05 09:48:11] [ 87.767078] R13: ffff8c87c1a3b0b8 R14: 00000000fffffff4 R15: ffff8c87c0e04040
[2022-07-05 09:48:11] [ 87.767096] FS: 0000718ac77fa740(0000) GS:ffff8c87d7100000(0000) knlGS:0000000000000000
[2022-07-05 09:48:11] [ 87.767115] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-07-05 09:48:11] [ 87.767130] CR2: 000077f41d974f08 CR3: 000000000ecf0003 CR4: 00000000003706e0
[2022-07-05 09:48:11] [ 87.767156] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2022-07-05 09:48:11] [ 87.767178] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[2022-07-05 09:48:11] [ 87.767196] Kernel panic - not syncing: Fatal exception
[2022-07-05 09:48:11] [ 87.767847] Kernel Offset: 0x2b000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
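When the console log is long, pulling out only the lines that mark the failure can be easier than paging through `tail` output. This is a minimal sketch that assumes it runs in dom0 against the same console log path used above. The function name `resume_failure_lines` and the list of patterns are illustrative choices, not something Qubes provides.

```shell
# Hypothetical helper: show failure markers from a guest console log, with line numbers.
# Defaults to the sys-net console log in dom0; pass another path as $1 if needed.
resume_failure_lines() {
    local log="${1:-/var/log/xen/console/guest-sys-net.log}"
    grep -n -E 'timed out to flush queue|firmware failed|kernel BUG at|invalid opcode|Kernel panic' "$log"
}
```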

Ruckus8997 · July 6, 2022, 2:36pm

sys-net.log (9.0 KB)

The relevant part starts at line 29 of the attached log:
[2022-07-05 09:48:11] [ 87.632127] kernel BUG at include/linux/mm.h:707!
[2022-07-05 09:48:11] [ 87.632146] invalid opcode: 0000 [#1] SMP PTI

Still working on it.
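An awk range pattern can locate where an oops like this begins in the console log (the “line 29” above) and print the whole trace through to the panic. The helper `extract_oops` is hypothetical. It assumes the trace starts with “kernel BUG at” and ends with “Kernel panic”, as it does in this log. Other oopses can begin with different text.

```shell
# Hypothetical helper: print from the first "kernel BUG at" line through the next
# "Kernel panic" line, each prefixed with its line number in the file.
extract_oops() {
    local log="${1:-/var/log/xen/console/guest-sys-net.log}"
    awk '/kernel BUG at/ { p = 1 } p { print NR ": " $0 } p && /Kernel panic/ { exit }' "$log"
}
```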

Ruckus8997 · July 8, 2022, 12:44am

OK, anticlimactically, I resolved this by increasing sys-net’s memory from 400 MB to 1000 MB. I picked that amount arbitrarily; I don’t know what the minimum that works is.

astoliar · February 10, 2023, 4:48am

I can confirm that setting 600 MB also resolves the issue.
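You can also make the same memory change from dom0 with `qvm-prefs`. This is a minimal sketch. The `memory` property is the qube’s initial memory in MB. The values 1000 and 600 are just the ones reported in this thread, not a measured minimum. The change takes effect the next time sys-net starts.

```shell
# Run in dom0. Sets sys-net's initial memory (MB).
VM=sys-net
MEM_MB=1000   # 1000 and 600 were both reported to work; 400 was the failing value

if command -v qvm-prefs >/dev/null 2>&1; then
    qvm-prefs "$VM" memory              # print the current value
    qvm-prefs "$VM" memory "$MEM_MB"    # applies on the next start of sys-net
else
    echo "qvm-prefs not found: run this in dom0" >&2
fi
```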