My original HCL for reference: [qubes-users] HCL - MSI Bravo 17
I can boot this laptop from a Debian 11 live (xfce+nonfree) USB, and I can see that Debian’s Linux 5.10.46:
- properly supports VFIO
- properly suspends and resumes
… while Qubes 4.1 with 5.10.47 (and 5.12) does not.
Some collected information follows. I’ll dig from there, but if anyone can suggest experiments, or tell me that some of these differences are normal and harmless, it would help me focus on the right stuff.
At first sight I suspect a link between the memory stuff and suspend issue, with differences in MTRR and e820 handling being one suspect - I’m not too familiar with how the hypervisor plays with those.
On the VFIO side I’m planning to enable or add some traces to understand why it does not see the IOMMU.
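Before adding traces, a quick way to compare what each kernel exposes is to list the IOMMU groups under sysfs. This is only a minimal sketch: it assumes the standard /sys/kernel/iommu_groups layout, and the function name is made up for illustration. Under Xen the hypervisor may keep the IOMMU to itself, in which case an empty result in dom0 would not by itself prove a problem; that is an assumption to verify.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [device names]} as exposed by the running kernel.

    An empty dict means the directory is missing or holds no groups.
    """
    base = Path(root)
    groups = {}
    if not base.is_dir():
        return groups
    for group in base.iterdir():
        devices_dir = group / "devices"
        # Group directories are named by number and contain a "devices" dir.
        if group.name.isdigit() and devices_dir.is_dir():
            groups[int(group.name)] = sorted(d.name for d in devices_dir.iterdir())
    return groups


if __name__ == "__main__":
    found = list_iommu_groups()
    if not found:
        print("no IOMMU groups visible to this kernel")
    for number in sorted(found):
        print(number, " ".join(found[number]))
```

Running it on both the Debian live system and Qubes dom0 gives two listings that can be compared directly.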
dmesg
The dmesg diff (lines prefixed with "-" are from Debian, lines prefixed with "+" are from Qubes) notably shows:
- a memory region that Debian splits into a reserved range and a "type 20" range, but that Qubes (through Xen) reports as a single reserved range:
    -[ 0.000000] BIOS-e820: [mem 0x00000000ab98e000-0x00000000ad579fff] reserved
    -[ 0.000000] BIOS-e820: [mem 0x00000000ad57a000-0x00000000ad5fefff] type 20
    +[ 0.000000] Xen: [mem 0x00000000ab98e000-0x00000000ad5fefff] reserved
- Debian shows more info from e820 about memory ranges, with an impact on hibernation:
    -[ 0.000438] e820: update [mem 0xb0000000-0xffffffff] usable ==> reserved
    -[ 0.003572] esrt: Reserving ESRT space from 0x00000000a8c45c18 to 0x00000000a8c45c50.
    -[ 0.003580] e820: update [mem 0xa8c45000-0xa8c45fff] usable ==> reserved
    -[ 0.003592] e820: update [mem 0xa5bd0000-0xa5bd2fff] usable ==> reserved
    -[ 0.003624] Using GB pages for direct mapping
    -[ 0.010519] e820: update [mem 0xa6334000-0xa6427fff] usable ==> reserved
    -[ 0.010526] smpboot: Allowing 16 CPUs, 0 hotplug CPUs
    -[ 0.010543] PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff]
    -[ 0.010544] PM: hibernation: Registered nosave memory: [mem 0x000a0000-0x000fffff]
    -[ 0.010545] PM: hibernation: Registered nosave memory: [mem 0x09bff000-0x09ffffff]
    -[ 0.010546] PM: hibernation: Registered nosave memory: [mem 0x0a200000-0x0a20cfff]
    -[ 0.010548] PM: hibernation: Registered nosave memory: [mem 0xa5bd0000-0xa5bd2fff]
    -[ 0.010549] PM: hibernation: Registered nosave memory: [mem 0xa6334000-0xa6427fff]
    -[ 0.010550] PM: hibernation: Registered nosave memory: [mem 0xa8c45000-0xa8c45fff]
    -[ 0.010551] PM: hibernation: Registered nosave memory: [mem 0xaa26b000-0xab788fff]
    -[ 0.010551] PM: hibernation: Registered nosave memory: [mem 0xab789000-0xab7d9fff]
    -[ 0.010551] PM: hibernation: Registered nosave memory: [mem 0xab7da000-0xab98dfff]
    -[ 0.010552] PM: hibernation: Registered nosave memory: [mem 0xab98e000-0xad579fff]
    -[ 0.010552] PM: hibernation: Registered nosave memory: [mem 0xad57a000-0xad5fefff]
    -[ 0.010553] PM: hibernation: Registered nosave memory: [mem 0xae000000-0xafffffff]
    -[ 0.010554] PM: hibernation: Registered nosave memory: [mem 0xb0000000-0xefffffff]
    -[ 0.010554] PM: hibernation: Registered nosave memory: [mem 0xf0000000-0xf7ffffff]
    -[ 0.010554] PM: hibernation: Registered nosave memory: [mem 0xf8000000-0xfcffffff]
    -[ 0.010555] PM: hibernation: Registered nosave memory: [mem 0xfd000000-0xffffffff]
- some different values in an EFI report:
    -[ 0.000000] efi: ACPI=0xab977000 ACPI 2.0=0xab977014 TPMFinalLog=0xab946000 SMBIOS=0xad429000 SMBIOS 3.0=0xad428000 MEMATTR=0xa67cc118 ESRT=0xa8c45c18 MOKvar=0xa5bd0000
    +[ 0.000000] efi: ACPI=0xab977000 ACPI 2.0=0xab977014 TPMFinalLog=0xab946000 SMBIOS=0xad429000 SMBIOS 3.0=0xad428000 MEMATTR=0xa6429698 ESRT=0xa8c2d018
- secure boot handling differs: Debian cannot determine the state but loads its Secure Boot certificates, while Qubes reports secure boot as disabled:
    -[ 0.000000] secureboot: Secure boot could not be determined (mode 0)
    +[ 0.251353] Secure boot disabled
    -[ 1.098378] Loaded X.509 cert 'Debian Secure Boot CA: 6ccece7e4c6c0d1f6149f3dd27dfcc5cbb419ea1'
    -[ 1.098397] Loaded X.509 cert 'Debian Secure Boot Signer 2021 - linux: 4b6ef5abca669825178e052c84667ccbc0531f8c'
- Qubes has MTRR disabled, impacting PAT configuration:
    -[ 0.000136] MTRR default type: uncachable
    -[ 0.000136] MTRR fixed ranges enabled:
    -[ 0.000137] 00000-9FFFF write-back
    -[ 0.000138] A0000-DFFFF uncachable
    -[ 0.000138] E0000-FFFFF write-protect
    -[ 0.000139] MTRR variable ranges enabled:
    -[ 0.000140] 0 base 000000000000 mask FFFF80000000 write-back
    -[ 0.000140] 1 base 000080000000 mask FFFFE0000000 write-back
    -[ 0.000141] 2 base 0000A0000000 mask FFFFF0000000 write-back
    -[ 0.000141] 3 disabled
    -[ 0.000142] 4 disabled
    -[ 0.000142] 5 disabled
    -[ 0.000142] 6 disabled
    -[ 0.000143] 7 disabled
    -[ 0.000143] TOM2: 0000000450000000 aka 17664M
    -[ 0.000337] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
    +[ 0.025910] x86/PAT: MTRRs disabled, skipping PAT initialization too.
    +[ 0.025913] x86/PAT: Configuration [0-7]: WB WT UC- UC WC WP UC UC
- PSP shows different initialization issues on both platforms, but still appears to be used for firmware loading in both cases:
    -[ 1.423015] [drm] add ip block number 3 <psp>
    -[ 1.440204] [drm] PSP loading VCN firmware
    -[ 2.372998] [drm] Loading DMUB firmware via PSP: version=0x00000000
    -[ 2.373090] [drm] PSP loading VCN firmware
    +[ 3.342535] ccp 0000:07:00.2: tee: ring init command failed (0x00000005)
    +[ 3.343355] ccp 0000:07:00.2: tee: failed to init ring buffer
    +[ 3.344155] ccp 0000:07:00.2: tee initialization failed
    +[ 3.345388] ccp 0000:07:00.2: psp initialization failed
    +[ 3.464296] [drm] add ip block number 3 <psp>
    +[ 3.500352] [drm] Loading DMUB firmware via PSP: version=0x00000000
    +[ 3.500456] [drm] PSP loading VCN firmware
    -[ 6.399534] ccp 0000:07:00.2: enabling device (0000 -> 0002)
    -[ 6.399659] ccp 0000:07:00.2: ccp: unable to access the device: you might be running a broken BIOS.
    -[ 6.409802] ccp 0000:07:00.2: tee enabled
    -[ 6.409805] ccp 0000:07:00.2: psp enabled
- Debian shows direct-loading of many firmware blobs, while Qubes shows virtually none:
    -[ 1.437206] amdgpu 0000:03:00.0: firmware: direct-loading firmware amdgpu/navi14_sos.bin
    -[ 1.437279] amdgpu 0000:03:00.0: firmware: direct-loading firmware amdgpu/navi14_asd.bin
    -[ 1.437305] amdgpu 0000:03:00.0: firmware: direct-loading firmware amdgpu/navi14_ta.bin
    -[ 1.437395] amdgpu 0000:03:00.0: firmware: direct-loading firmware amdgpu/navi14_smc.bin
    -[ 1.437623] amdgpu 0000:03:00.0: firmware: direct-loading firmware amdgpu/navi14_pfp.bin
    -[ 1.437742] amdgpu 0000:03:00.0: firmware: direct-loading firmware amdgpu/navi14_me.bin
    -[ 1.437839] amdgpu 0000:03:00.0: firmware: direct-loading firmware amdgpu/navi14_ce.bin
    -[ 1.437869] amdgpu 0000:03:00.0: firmware: direct-loading firmware amdgpu/navi14_rlc.bin
    -[ 1.437963] amdgpu 0000:03:00.0: firmware: direct-loading firmware amdgpu/navi14_mec.bin
    -[ 1.438065] amdgpu 0000:03:00.0: firmware: direct-loading firmware amdgpu/navi14_mec2.bin
    -[ 1.439979] amdgpu 0000:03:00.0: firmware: direct-loading firmware amdgpu/navi14_sdma.bin
    -[ 1.440007] amdgpu 0000:03:00.0: firmware: direct-loading firmware amdgpu/navi14_sdma1.bin
    -[ 1.440198] amdgpu 0000:03:00.0: firmware: direct-loading firmware amdgpu/navi14_vcn.bin
    ...
    -[ 2.370891] amdgpu 0000:07:00.0: firmware: direct-loading firmware amdgpu/renoir_sdma.bin
    ...
    -[ 2.371209] amdgpu 0000:07:00.0: firmware: direct-loading firmware amdgpu/renoir_asd.bin
    -[ 2.371223] amdgpu 0000:07:00.0: firmware: direct-loading firmware amdgpu/renoir_ta.bin
    -[ 2.371239] amdgpu 0000:07:00.0: firmware: direct-loading firmware amdgpu/renoir_pfp.bin
    -[ 2.371247] amdgpu 0000:07:00.0: firmware: direct-loading firmware amdgpu/renoir_me.bin
    -[ 2.371255] amdgpu 0000:07:00.0: firmware: direct-loading firmware amdgpu/renoir_ce.bin
    -[ 2.371267] amdgpu 0000:07:00.0: firmware: direct-loading firmware amdgpu/renoir_rlc.bin
    -[ 2.371320] amdgpu 0000:07:00.0: firmware: direct-loading firmware amdgpu/renoir_mec.bin
    -[ 2.371370] amdgpu 0000:07:00.0: firmware: direct-loading firmware amdgpu/renoir_mec2.bin
    -[ 2.372995] amdgpu 0000:07:00.0: firmware: direct-loading firmware amdgpu/renoir_dmcub.bin
    -[ 2.372998] [drm] Loading DMUB firmware via PSP: version=0x00000000
    -[ 2.373086] amdgpu 0000:07:00.0: firmware: direct-loading firmware amdgpu/renoir_vcn.bin
    ...
    -[ 6.538034] platform regulatory.0: firmware: direct-loading firmware regulatory.db
    -[ 6.565324] platform regulatory.0: firmware: direct-loading firmware regulatory.db.p7s
    +[ 16.699469] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
- IOMMU: the Qubes kernel simply believes there is no such feature available:
    -[ 1.046733] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
    -[ 1.046883] pci 0000:00:00.2: can't derive routing for PCI INT A
    -[ 1.046884] pci 0000:00:00.2: PCI INT A: not connected
    -[ 1.046919] pci 0000:00:01.0: Adding to iommu group 0
    ...
    -[ 1.047326] pci 0000:08:00.1: Adding to iommu group 6
    -[ 1.048767] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
    -[ 1.048769] pci 0000:00:00.2: AMD-Vi: Extended features (0x206d73ef22254ade):
    -[ 1.048770] PPR X2APIC NX GT IA GA PC GA_vAPIC
    -[ 1.048772] AMD-Vi: Interrupt remapping enabled
    -[ 1.048772] AMD-Vi: Virtual APIC enabled
    -[ 1.048772] AMD-Vi: X2APIC enabled
    -[ 1.049006] AMD-Vi: Lazy IO/TLB flushing enabled
    ...
    -[ 1.052192] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
    ...
    -[ 1.064867] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
    +[ 3.347333] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
    +[ 3.348102] AMD-Vi: AMD IOMMUv2 functionality not available on this system
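To make the MTRR variable ranges from the Debian log easier to read, the base/mask pairs can be decoded into address ranges. This is a rough sketch under one assumption: the masks are 48 bits wide, which I only infer from the 12 hex digits printed in the log. The function name is mine.

```python
PHYS_BITS = 48  # assumption: mask width inferred from the 12 hex digits in the log


def mtrr_range(base_hex, mask_hex, phys_bits=PHYS_BITS):
    """Decode one variable MTRR (base, mask) pair into (first, last) address."""
    base = int(base_hex, 16)
    mask = int(mask_hex, 16)
    # The cleared low bits of the mask give the size of the range.
    size = (~mask & ((1 << phys_bits) - 1)) + 1
    return base, base + size - 1


# The three write-back entries from the Debian dmesg above.
entries = [
    ("000000000000", "FFFF80000000"),
    ("000080000000", "FFFFE0000000"),
    ("0000A0000000", "FFFFF0000000"),
]

for base_hex, mask_hex in entries:
    first, last = mtrr_range(base_hex, mask_hex)
    size_mib = (last - first + 1) >> 20
    print(f"{first:#x}-{last:#x} write-back, {size_mib} MiB")
```

Decoded this way, the three entries are contiguous and end just below 0xb0000000, which appears to line up with the 0xb0000000 boundary in the e820 update line. Whether that is cause or coincidence is not established here.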
There are many more diffs, but those probably give quite some food for thought already.
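For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of a dmesg diff that ignores timestamps. It is not the exact procedure used for the excerpts above; the function names and the two input files passed on the command line are placeholders.

```python
import difflib
import re
import sys

# Matches a leading dmesg timestamp such as "[    0.000136] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")


def strip_timestamps(lines):
    """Drop the leading timestamp so identical messages compare equal."""
    return [TIMESTAMP.sub("", line.rstrip("\n")) for line in lines]


def dmesg_diff(debian_lines, qubes_lines):
    """Return unified diff lines, '-' for Debian only, '+' for Qubes only."""
    return list(
        difflib.unified_diff(
            strip_timestamps(debian_lines),
            strip_timestamps(qubes_lines),
            fromfile="debian",
            tofile="qubes",
            n=0,          # no context lines, only the differences
            lineterm="",
        )
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: script.py debian-dmesg.txt qubes-dmesg.txt (hypothetical names)
    with open(sys.argv[1]) as debian, open(sys.argv[2]) as qubes:
        print("\n".join(dmesg_diff(debian.readlines(), qubes.readlines())))
```

Messages that merely appear in a different order on the two systems still show up as differences, so the output needs manual filtering.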
cpuinfo
- the “power management” field is empty on Qubes, while on Debian it contains:
    ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
- in the cpu flags, Qubes gains the following, likely from Xen:
    hypervisor tsc_known_freq
  but loses:
    vme pse sep mtrr pge pse36 pdpe1gb aperfmperf monitor svm extapic cr8_legacy osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme mba sev ibrs stibp sev_es smep cqm rdt_a smap xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local irperf rdpru wbnoinvd npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip overflow_recov succor smca
There is probably a link between some behaviours observed in the log and some missing flags. E.g. is it normal that we don’t see the mtrr flag here?
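The flag comparison can be redone from two saved copies of /proc/cpuinfo with a simple set difference. This is a small sketch; the function names are made up for illustration, and it only looks at the first "flags" line, assuming all cores report the same flags.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of a cpuinfo dump."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def compare_flags(debian_text, qubes_text):
    """Return (flags only on Debian, flags only on Qubes), both sorted."""
    debian = cpu_flags(debian_text)
    qubes = cpu_flags(qubes_text)
    return sorted(debian - qubes), sorted(qubes - debian)
```

Feeding it the contents of the two cpuinfo dumps yields the "lost" and "gained" lists in one call, which makes it easy to check whether a specific flag such as mtrr is really absent under Xen.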