I have the same setup and have experienced the same issue. After updating to the 20221109 firmware, the module loads but fails soon after. I suspect it's a Xen issue, since it works fine on vanilla Fedora, and I sometimes get this dmesg output relating to page allocation failures:
I managed to work around the issue by swapping the QCNFA765 card for an Intel AX200, which is cheap to pick up. It worked out of the box after I popped it in.
I'm running into the same problem, but on a laptop, so I can't swap out the card. This WiFi card did work in other Debian-based Linux distros, and my sys-net is Debian, so I know it's not a general Linux problem. I'm on the 6.6.31-1.qubes kernel, the device has an lspci entry in dom0, Qubes Manager shows it passed through to sys-net, and sys-net has the module loaded. But iw list shows nothing!
That won't work for me. It's a work laptop; the hardware doesn't belong to me. I'm just trying to give Qubes a shot rather than some other distro plus VirtualBox.
Then I guess using a USB WiFi adapter is the only way for now.
It's either an issue with passing this device through to a VM in general, or a Xen-specific issue. You could tell them apart by trying to pass it through to a VM under KVM: if it works there, the problem is Xen-specific.
What do you mean by passthrough in KVM? qvm-pci lists the device as passed through to sys-net, and according to the sys-net dmesg, the kernel sees the device and tries to load ath11k_pci; it's just that ath11k_pci gets error -110 on probe. I thought PCI passthrough would be one of those things where it's either passing through or it's not. I tried switching sys-net to Fedora 39 with atheros-firmware package version 20240610. I got this in dmesg:
I've tried installing the Debian Sid firmware-atheros package version 20230623-2 on my Debian template and switching sys-net back to Debian. I didn't notice any difference in the output.
I've noticed that dom0 also loads ath11k_pci. I'm not sure why, or whether that would interfere with the passthrough; I thought Xen requires a PCI device to be used by only one kernel. But I tried unloading the module and restarting sys-net, and didn't notice a difference.
I mean testing PCI passthrough of this WiFi controller to a VM on a regular Linux host (e.g. Fedora) using KVM instead of Qubes OS, to check whether it's a problem with passing this device through in general or an issue specific to Xen/Qubes OS.
If the WiFi controller doesn't work when you pass it through to a VM in KVM, then the problem is likely with passthrough of this controller in general rather than something Xen/Qubes OS specific. In that case it could be reported to the Linux kernel or kvm-devel mailing list, in the hope that the issue with this specific device gets addressed and fixed.
It should be hidden from dom0, and the kernel driver in use for this device should be pciback rather than ath11k_pci. You can check this in the lspci -k output in dom0.
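If you want to script that check, here is a minimal Python sketch that scans lspci -k output for the "Kernel driver in use:" line of a single device. It assumes lspci's default short address form (e.g. 02:00.0, i.e. run without -D) and indented continuation lines; the sample text, including the 00:14.0 USB entry, is illustrative and abbreviated, not copied from anyone's machine.

```python
import re
from typing import Optional

# Matches the indented "Kernel driver in use:" line that lspci -k prints.
DRIVER_RE = re.compile(r"\s+Kernel driver in use:\s*(\S+)")


def driver_in_use(lspci_k_output: str, bdf: str) -> Optional[str]:
    """Return the bound kernel driver for the device at `bdf`, or None."""
    current = None
    for line in lspci_k_output.splitlines():
        if line and not line[0].isspace():
            # Device header lines start at column 0 with the address.
            current = line.split(" ", 1)[0]
        elif current == bdf:
            m = DRIVER_RE.match(line)
            if m:
                return m.group(1)
    return None


# Illustrative, abbreviated sample of lspci -k output (hypothetical).
SAMPLE = """\
00:14.0 USB controller: <description>
\tKernel driver in use: xhci_hcd
02:00.0 Network controller: <description>
\tKernel driver in use: pciback
\tKernel modules: ath11k_pci
"""

if __name__ == "__main__":
    print(driver_in_use(SAMPLE, "02:00.0"))
```

On a real dom0 you could feed it subprocess.run(["lspci", "-k"], capture_output=True, text=True).stdout instead of SAMPLE. Note that "Kernel modules: ath11k_pci" only lists drivers that could bind; what matters is the "in use" line.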
Wouldn’t using the WiFi controller in dom0 directly, while compromising the security/purpose of Qubes, tell us whether it’s a passthrough problem or a Xen problem?
It does successfully connect in dom0:
[ 1003.862159] ath11k_pci 0000:02:00.0: BAR 0: assigned [mem 0x78600000-0x787fffff 64bit]
[ 1003.862522] ath11k_pci 0000:02:00.0: MSI vectors: 1
[ 1003.862531] ath11k_pci 0000:02:00.0: wcn6855 hw2.1
[ 1004.752447] ath11k_pci 0000:02:00.0: chip_id 0x12 chip_family 0xb board_id 0xff soc_id 0x400c1211
[ 1004.752457] ath11k_pci 0000:02:00.0: fw_version 0x110b196e fw_build_timestamp 2022-12-22 12:54 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23
[ 1004.845358] ath11k_pci 0000:02:00.0: leaving PCI ASPM disabled to avoid MHI M2 problems
[ 1005.103890] ath11k_pci 0000:02:00.0: Failed to set the requested Country regulatory setting
[ 1005.104020] ath11k_pci 0000:02:00.0: Failed to set the requested Country regulatory setting
[ 1005.119018] ath11k_pci 0000:02:00.0 wlp2s0: renamed from wlan0
[root@dom0]# nmcli
lo: unmanaged
"lo"
loopback (unknown), 00:00:00:00:00:00, sw, mtu 65536
wlp2s0: unmanaged
"Qualcomm QCNFA765"
wifi (ath11k_pci), XX:XX:XX:XX:XX:XX, plugin missing, hw, mtu 1500
Since it works when it's not passed through, that means it's not a Xen problem, right? It's a passthrough problem. And my dom0 runs the Qubes stable kernel and firmware: 6.6.31-1.qubes.fc37.x86_64 with atheros-firmware version 20231111.
I can confirm nearly identical symptoms. I have a Lenovo P14s and have been unable to get the WiFi card loaded within the sys-net VM. Here's what I've seen in the log:
[Jul10 23:59] ath11k_pci 0000:00:07.0: BAR 0 [mem 0xf2000000-0xf21fffff 64bit]: assigned
[ +0.001965] ath11k_pci 0000:00:07.0: MSI vectors: 1
[ +0.000288] ath11k_pci 0000:00:07.0: wcn6855 hw2.1
[ +0.160318] mhi mhi0: Requested to power ON
[ +0.000055] mhi mhi0: Power on setup success
[ +20.121676] mhi mhi0: Transfer start failed
[ +0.000058] mhi mhi0: Bad MHI PM state: 4096 (Linkdown or Error Fatal Detect)
[ +0.000061] mhi mhi0: EE State: DISABLE
[ +0.000095] mhi mhi0: MHI did not load image over BHI, ret: -5
[ +0.001348] ath11k_pci 0000:00:07.0: failed to power up mhi: -110
[ +0.000026] ath11k_pci 0000:00:07.0: failed to start mhi: -110
[ +0.000038] ath11k_pci 0000:00:07.0: failed to power up :-110
[ +0.012170] ath11k_pci 0000:00:07.0: failed to create soc core: -110
[ +0.000021] ath11k_pci 0000:00:07.0: failed to init core: -110
[ +0.081680] ath11k_pci: probe of 0000:00:07.0 failed with error -110
I added the three lines above the "MHI did not load image" line by modifying the mhi kernel module. They seem to indicate that the MHI (Modem Host Interface) driver is not properly loading firmware onto the card due to some kind of disconnection between the guest and the WiFi card. I'm on kernel 6.8.8 in this case.
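The negative numbers in those messages are negated Linux errno values: 110 is ETIMEDOUT and 5 is EIO. Together with the roughly 20 second gap before "Transfer start failed", that is consistent with the MHI driver timing out while waiting for the card, though it doesn't say why the card never answered. The minimal Python sketch below pulls both facts out of a pasted log excerpt; it assumes the relative [ +seconds] timestamp format shown above and only knows the two errno values that appear here, hardcoded from the Linux headers so the result doesn't depend on the host OS.

```python
import re
from typing import List, Tuple

# Linux errno numbers (asm-generic errno-base.h / errno.h), hardcoded so
# the answer is the same on any host OS.
LINUX_ERRNO = {5: "EIO", 110: "ETIMEDOUT"}

DELTA_RE = re.compile(r"^\[\s*\+(\d+\.\d+)\]\s*(.*)$")  # "[ +0.000055] msg"
ERR_RE = re.compile(r"(-\d+)\s*$")  # trailing negative errno, e.g. ": -110"


def summarize(log: str) -> Tuple[List[Tuple[str, str]], Tuple[float, str]]:
    """Return (errors, (largest_gap_seconds, message_after_gap))."""
    errors: List[Tuple[str, str]] = []
    biggest = (0.0, "")
    for raw in log.splitlines():
        line = raw.strip()
        m = DELTA_RE.match(line)
        msg = m.group(2) if m else line
        # Each "+secs" value is the time since the previous log line.
        if m and float(m.group(1)) > biggest[0]:
            biggest = (float(m.group(1)), msg)
        e = ERR_RE.search(msg)
        if e:
            code = -int(e.group(1))
            errors.append((LINUX_ERRNO.get(code, f"errno {code}"), msg))
    return errors, biggest


# Excerpt of the log above.
LOG = """\
[Jul10 23:59] ath11k_pci 0000:00:07.0: BAR 0 [mem 0xf2000000-0xf21fffff 64bit]: assigned
[ +0.160318] mhi mhi0: Requested to power ON
[ +0.000055] mhi mhi0: Power on setup success
[ +20.121676] mhi mhi0: Transfer start failed
[ +0.000058] mhi mhi0: Bad MHI PM state: 4096 (Linkdown or Error Fatal Detect)
[ +0.000095] mhi mhi0: MHI did not load image over BHI, ret: -5
[ +0.001348] ath11k_pci 0000:00:07.0: failed to power up mhi: -110
[ +0.081680] ath11k_pci: probe of 0000:00:07.0 failed with error -110
"""

if __name__ == "__main__":
    errs, (gap, after) = summarize(LOG)
    for name, msg in errs:
        print(f"{name:10} {msg}")
    print(f"largest gap: {gap:.3f}s before '{after}'")
```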
One other thing of note: the Xen hypervisor log (accessed via xl dmesg from dom0) currently shows something like the following line once each time I try to load the ath11k_pci driver:
(XEN) d[IDLE]v4: Unsupported MSI delivery mode 7 for Dom1
Is anyone else seeing that in their logs? This may be related to this GitHub issue with the amdgpu driver…
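For reference, on x86 the MSI data register (Intel SDM, section on Message Signalled Interrupts) carries the delivery mode in bits 10:8, and mode 7 (binary 111) is ExtINT, a legacy 8259-style mode rather than the usual Fixed or Lowest Priority. Assuming Xen's message reports that standard field, the card's MSI ended up programmed with a mode Xen won't deliver, which would fit an interrupt routing problem; that is a guess, not something the log proves. A small Python decoder for the field, where the sample data word 0x0741 is made up for illustration:

```python
# x86 MSI data register layout (Intel SDM): bits 7:0 vector,
# bits 10:8 delivery mode, bit 14 level (assert), bit 15 trigger mode.
DELIVERY_MODES = {
    0b000: "Fixed",
    0b001: "Lowest Priority",
    0b010: "SMI",
    0b100: "NMI",
    0b101: "INIT",
    0b111: "ExtINT",
}  # 0b011 and 0b110 are reserved


def delivery_mode_name(mode: int) -> str:
    """Name for a 3-bit delivery mode value, as printed by Xen."""
    return DELIVERY_MODES.get(mode & 0b111, "Reserved")


def decode_msi_data(data: int) -> dict:
    """Split a 16-bit MSI data word into its fields."""
    return {
        "vector": data & 0xFF,
        "delivery_mode": delivery_mode_name((data >> 8) & 0b111),
        "level_assert": bool((data >> 14) & 1),
        "trigger": "level" if (data >> 15) & 1 else "edge",
    }


if __name__ == "__main__":
    print(delivery_mode_name(7))
    print(decode_msi_data(0x0741))  # hypothetical data word
```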
At this point, here are a few things I could see causing the problem:
A mismatch in power management is leaving the card in the wrong state when the guest is starting up, or the card is not being properly reset, making the firmware load fail
Incorrect interrupt allocation/routing is causing the guest to miss notifications that the card is ready to receive a firmware download
The guest doesn't have proper permissions (maybe permissive-mode PCI passthrough would help?) to access an odd register that the ath11k_pci driver needs to initialize properly.
Unfortunately I don’t have much evidence to point to one of those in particular, but I figured I’d put my data here in case it was helpful…
Almost forgot: I was able to resolve the memory allocation failures on driver load by increasing sys-net's memory to 1 GB. Hopefully someone finds that helpful.
I'm curious whether anyone got this to work. I just purchased a P14s AMD Gen 5. The last post about it named this and docking as the issues. Docking is not a concern for me.