Hi neowutran! I would love to offer my hardware virtually and be a guinea pig to help find a solution for myself and others, as it seems that Qubes 4.2 has broken a lot of HVMs.
I will even pay $500 for a successful outcome, or $200 for trying and failing. To get things going, I can pay the $200 upfront and the difference if successful.
This offer stands for any active member of the qubes-os forum who has experience with GPU passthrough and the max RAM issues that came with the 4.2/Xen updates.
A few conditions: we must get Win10 or Win11 to work with a minimum of 16 GB RAM and pass my Nvidia GPU through successfully. While we attempt this, I would like to keep the qubes-os security and integrity as high as possible (avoiding downgrades, and cleaning up after attempts that fail). I am also not the most adept Unix user, so patience is key. I can follow most instructions if they are written in detail, and I can provide feedback.
What I have tried so far, without success:
- excluding the GPU and its audio subdevice in GRUB using rd.qubes.hide_pci and regenerating the GRUB configuration
- passing the GPU and its audio subdevice with permissive and no-strict-reset set to True
- attempting to install the video drivers provided by Dell and by Nvidia, each on a pristine clone of the fresh Windows install
- used this XML template in /etc/qubes/templates/libvirt/xen/by-name/ with max-ram-below-4g set to both 2G and 3.5G, and confirmed in virsh that the changes are reflected. In both cases the qube would not boot with more than 2 GB of RAM.
{% if vm.virt_mode == 'hvm' %}
<!-- server_ip is the address of stubdomain. It hosts its own DNS server. -->
<emulator
{% if vm.features.check_with_template('linux-stubdom', True) %}
type="stubdom-linux"
{% else %}
type="stubdom"
{% endif %}
{% if vm.netvm %}
{% if vm.features.check_with_template('linux-stubdom', True) %}
{% if (vm.devices['pci'].persistent() | list) %}
cmdline="-qubes-net:client_ip={{ vm.ip -}}
,dns_0={{ vm.dns[0] -}}
,dns_1={{ vm.dns[1] -}}
,gw={{ vm.netvm.gateway -}}
,netmask={{ vm.netmask }} -machine xenfv,max-ram-below-4g=3.5G"
{% else %}
cmdline="-qubes-net:client_ip={{ vm.ip -}}
,dns_0={{ vm.dns[0] -}}
,dns_1={{ vm.dns[1] -}}
,gw={{ vm.netvm.gateway -}}
,netmask={{ vm.netmask }}"
{% endif %}
{% else %}
{% if (vm.devices['pci'].persistent() | list) %}
cmdline="-net lwip,client_ip={{ vm.ip -}}
,server_ip={{ vm.dns[1] -}}
,dns={{ vm.dns[0] -}}
,gw={{ vm.netvm.gateway -}}
,netmask={{ vm.netmask }} -machine xenfv,max-ram-below-4g=3.5G"
{% else %}
cmdline="-net lwip,client_ip={{ vm.ip -}}
,server_ip={{ vm.dns[1] -}}
,dns={{ vm.dns[0] -}}
,gw={{ vm.netvm.gateway -}}
,netmask={{ vm.netmask }}"
{% endif %}
{% endif %}
{% endif %}
{% if vm.stubdom_mem %}
memory="{{ vm.stubdom_mem * 1024 -}}"
{% endif %}
{% if vm.features.check_with_template('audio-model', False)
or vm.features.check_with_template('stubdom-qrexec', False) %}
kernel="/usr/libexec/xen/boot/qemu-stubdom-linux-full-kernel"
ramdisk="/usr/libexec/xen/boot/qemu-stubdom-linux-full-rootfs"
{% endif %}
{% if not vm.netvm %}
{% if (vm.devices['pci'].persistent() | list) %}
cmdline="-machine xenfv,max-ram-below-4g=3.5G"
{% endif %}
{% endif %}
/>
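A minimal sketch of how the "confirmed in virsh" check could be scripted, in case it helps someone reproduce it. It assumes the rendered domain XML (for example the output of `virsh dumpxml` for the qube) contains an `<emulator>` element with a `cmdline` attribute, as the template above produces. The helper name `max_ram_below_4g`, the sample XML, and the IP addresses in it are made up for illustration and are not taken from my system.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def max_ram_below_4g(domain_xml: str) -> Optional[str]:
    """Return the max-ram-below-4g value from the <emulator> cmdline, or None."""
    root = ET.fromstring(domain_xml)
    emulator = root.find(".//emulator")
    if emulator is None:
        return None
    cmdline = emulator.get("cmdline", "")
    # Options are separated by spaces and commas,
    # e.g. "-machine xenfv,max-ram-below-4g=3.5G".
    for token in cmdline.replace(",", " ").split():
        if token.startswith("max-ram-below-4g="):
            return token.split("=", 1)[1]
    return None


# Hypothetical, trimmed-down example of a rendered domain definition.
SAMPLE = """
<domain type="xen">
  <devices>
    <emulator type="stubdom-linux"
              cmdline="-qubes-net:client_ip=10.137.0.50,dns_0=10.139.1.1,dns_1=10.139.1.2,gw=10.138.0.1,netmask=255.255.255.255 -machine xenfv,max-ram-below-4g=3.5G"/>
  </devices>
</domain>
"""

if __name__ == "__main__":
    print(max_ram_below_4g(SAMPLE))
```

If the function returns None for a qube that has a PCI device attached, the template override is probably not being picked up at all, which would be a different problem from the option being applied but not helping.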
I had someone help me with these tips and tricks, but they ran out of ideas. I am willing to give it one more go and see if there is any way I can avoid using two laptops and consolidate all my work onto one.
I tried to attach the logs you requested from another user but, as a new user, I am not allowed to, so I will paste them here. These logs were captured when the Windows qube was started with more than 2 GB of RAM and was stuck at the “could not read the boot disk” black screen. I am not sure where to add the GRUB parameters ‘loglvl=all’ and ‘guest_loglvl=all’, so these logs were captured without them.
dnflist
Errors during downloading metadata for repository 'qubes-dom0-cached':
- Curl error (37): Couldn't read a file:// file for file:///var/lib/qubes/updates/repodata/repomd.xml [Couldn't open file /var/lib/qubes/updates/repodata/repomd.xml]
Error: Failed to download metadata for repo 'qubes-dom0-cached': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
Ignoring repositories: qubes-dom0-cached
xen.x86_64 2001:4.17.2-8.fc37 @qubes-dom0-cached
xen-hvm-stubdom-linux.x86_64 4.2.8-1.fc37 @anaconda
xen-hvm-stubdom-linux-full.x86_64 4.2.8-1.fc37 @anaconda
xen-hypervisor.x86_64 2001:4.17.2-8.fc37 @qubes-dom0-cached
xen-libs.x86_64 2001:4.17.2-8.fc37 @qubes-dom0-cached
xen-licenses.x86_64 2001:4.17.2-8.fc37 @qubes-dom0-cached
xen-runtime.x86_64 2001:4.17.2-8.fc37 @qubes-dom0-cached
lspci
sudo lspci -vvv -s 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation GA104GLM [RTX A4500 Laptop GPU] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Dell Device 0b2b
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 16
Region 0: Memory at 93000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at 6000000000 (64-bit, prefetchable) [size=16G]
Region 3: Memory at 6400000000 (64-bit, prefetchable) [size=32M]
Region 5: I/O ports at 3000 [size=128]
Expansion ROM at 94080000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [78] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 16GT/s, Width x8 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range AB, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Via message, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [b4] Vendor Specific Information: Len=14 <?>
Capabilities: [100 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Capabilities: [250 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [258 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=255us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=10us
Capabilities: [128 v1] Power Budgeting <?>
Capabilities: [420 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [bb0 v1] Physical Resizable BAR
BAR 0: current size: 16MB, supported: 16MB
BAR 1: current size: 16GB, supported: 64MB 128MB 256MB 512MB 1GB 2GB 4GB 8GB 16GB
BAR 3: current size: 32MB, supported: 32MB
Capabilities: [c1c v1] Physical Layer 16.0 GT/s <?>
Capabilities: [d00 v1] Lane Margining at the Receiver <?>
Capabilities: [e00 v1] Data Link Feature <?>
Kernel driver in use: pciback
Kernel modules: nouveau
sudo lspci -vvv -s 01:00.1
01:00.1 Audio device: NVIDIA Corporation GA104 High Definition Audio Controller (rev a1)
Subsystem: NVIDIA Corporation Device 0000
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin B routed to IRQ 17
Region 0: Memory at 94000000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [60] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [78] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75W
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 16GT/s, Width x8 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range AB, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Via message, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [160 v1] Data Link Feature <?>
Kernel driver in use: pciback
Kernel modules: snd_hda_intel
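To make the BAR sizes in the lspci output above easier to compare, here is a small Python sketch that pulls the memory regions out of `lspci -vvv` text and converts their sizes to bytes. The function name `memory_bars` and the regular expression are my own assumptions about the output format, based only on the lines pasted above; other lspci versions may print regions differently.

```python
import re

# Size suffixes as they appear in the "[size=...]" field.
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

# Matches memory regions only; "I/O ports" regions are intentionally skipped.
REGION_RE = re.compile(
    r"Region (\d+): Memory at ([0-9a-f]+) \((32|64)-bit, (non-)?prefetchable\)"
    r" \[size=(\d+)([KMG])\]"
)


def memory_bars(lspci_text):
    """Return a list of dicts describing each memory BAR found in the text."""
    bars = []
    for match in REGION_RE.finditer(lspci_text):
        index, _address, width, non_prefix, size, unit = match.groups()
        bars.append({
            "bar": int(index),
            "bits": int(width),
            "prefetchable": non_prefix is None,
            "size": int(size) * UNITS[unit],
        })
    return bars


# Region lines copied from the 01:00.0 output above.
GPU_LINES = """\
Region 0: Memory at 93000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at 6000000000 (64-bit, prefetchable) [size=16G]
Region 3: Memory at 6400000000 (64-bit, prefetchable) [size=32M]
Region 5: I/O ports at 3000 [size=128]
"""

if __name__ == "__main__":
    for bar in memory_bars(GPU_LINES):
        mib = bar["size"] // (1 << 20)
        print(f"BAR {bar['bar']}: {bar['bits']}-bit, {mib} MiB")
```

For this GPU the sketch picks up three memory BARs, the largest being the 16 GB prefetchable BAR 1 that the Resizable BAR capability also reports.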
Thank you, and please contact me privately if you or anyone else is open to dedicating some time and patience to getting to the bottom of this.