Ophrys
April 13, 2023, 8:49pm
#1
Remarks
Disable SMT (multithreading) manually in the BIOS to get things working.
Attachments
New user, so cannot attach file. Here it is:
layout:
  'hcl'
type:
  'desktop'
hvm:
  'yes'
iommu:
  'yes'
slat:
  'yes'
tpm:
  'unknown'
remap:
  'yes'
brand: |
  Asus
model: |
  TUF GAMING X570-PLUS
bios: |
  4602
cpu: |
  AMD Ryzen 7 3700X 8-Core Processor
cpu-short: |
  Ryzen 7 3700X
chipset: |
  Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex [1022:1480]
chipset-short: |
  AMD X570
gpu: |
  Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c1) (prog-if 00 [VGA controller])
gpu-short: |
  Radeon RX 5700XT 8GB
network: |
  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 26)
memory: |
  49062
scsi: |
  Corsair MP600 PRO LPX 2TB
usb: |
  3
versions:
  - works:
      yes
    qubes: |
      R4.1
    xen: |
      4.14.5
    kernel: |
      5.15.94-1
    remark: |
      works good; UEFI ok; had to disable SMT (multithreading) manually in the BIOS to get things working
    credit: |
      Ophrys
    link: |
      Profile - Ophrys - Qubes OS Forum
Ophrys
April 18, 2023, 9:20pm
#2
The issue about SMT has been reported here:
opened 02:33PM - 11 Apr 23 UTC
Labels: T: bug, C: doc, C: Xen, P: default, hardware support
### Qubes OS release
Qubes release 4.1.1 (R4.1)
### Brief summary
Due to an upstream Xen issue [1], which currently has no documentation or even a proper upstream bug report, the IOMMU malfunctions under Xen on some AMD Ryzen CPUs and motherboards. One symptom of a broken IOMMU is a system hang during boot at the initramfs splash screen, with "nvme0: I/O 0 QID 0 timeout, completion polled" messages. Other users have also reported the boot hanging when using the Qubes installation disc.
One workaround is disabling SMT (hyperthreading) in the BIOS. This is harmless in Qubes since Qubes does not use SMT, but without documentation the workaround is extremely difficult to find. I spent half an hour searching for this error message before finding a forum post mentioning SMT. The question was also raised on a Xen mailing list without any response, indicating that the problem should be worked on upstream first.
This is likely a duplicate of #7620, #7570, or other previously reported issues that I'm not familiar with. However, the disable-SMT workaround has not appeared in any of the existing bug reports that I'm aware of. All the existing reports were also hardware-specific, but it is now clear that this is a systematic issue. Thus, I propose treating it as a separate lack-of-documentation bug report. Other workarounds, such as `dom0_max_vcpus=1 dom0_vcpus_pin`, should also be documented.
### Affected Hardware
Some examples include:
1. Ryzen 9 6900HS mobile CPU (2022 G14 GA402RK laptop). [2]
2. AMD 5700X desktop CPU, multiple cases on multiple motherboards. [2]
3. Ryzen 7 6800U (GPD Win Max 2 laptop). [3]
4. Unspecified Zen 3 CPU with Asus Pro WS 565-ACE motherboard (X570 chipset), official Xen mailing list report. [1]
### Steps to reproduce
1. Install QubesOS onto an NVMe SSD on an Intel motherboard.
2. Move QubesOS to an AMD AM4 motherboard with an X399 or X570 chipset and a Ryzen 5000 series CPU (Zen 3) installed.
3. Boot from the NVMe drive. To make the error messages visible, disable the plymouth splash screen by running these commands as root:
echo 'omit_dracutmodules+=" plymouth "' > /etc/dracut.conf.d/disable-plymouth.conf
cd /boot
dracut --force
4. Enable IOMMU in BIOS.
5. Reboot to NVMe.
OR
1. Boot the QubesOS installer on the same AMD hardware (I didn't test this, but it was mentioned in a forum post).
### Expected behavior
Boot should continue without hanging, the LUKS passphrase prompt should appear, and one should be able to enter QubesOS after typing the passphrase.
### Actual behavior
The initramfs hangs at the splash screen. If plymouth is disabled, then after 3 to 5 minutes NVMe timeout messages appear in dmesg and are printed on the screen, similar to:
nvme nvme0: I/O 0 QID 0 timeout, completion polled
nvme nvme1: I/O 8 QID 0 timeout, completion polled
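To confirm that a hang matches this symptom rather than something unrelated, the kernel log can be searched for the message format quoted above. The following is a minimal sketch; `count_nvme_timeouts` is a hypothetical helper name, and the pattern is derived only from the two sample lines shown here, so other kernel versions may word the message differently.

```shell
# Count kernel log lines that match the NVMe timeout symptom quoted above.
# Reads log text on stdin, so it can be fed from dmesg or from a saved log file.
# The pattern is an assumption based on the two sample lines in this report.
count_nvme_timeouts() {
    # grep -c prints 0 and returns non-zero when nothing matches;
    # "|| true" keeps the helper's exit status clean in that case.
    grep -c -E 'nvme[0-9]+: I/O [0-9]+ QID [0-9]+ timeout, completion polled' || true
}
```

For example, `dmesg | count_nvme_timeouts` run as root should print a non-zero count on an affected system.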
### Workaround
Disable Simultaneous Multi-Threading (SMT) in firmware, via the UEFI BIOS setup screen. Users more commonly know SMT by Intel's trademark "Hyperthreading", so the documentation should mention that name too.
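After changing the firmware setting, one possible sanity check is to look at the `threads_per_core` field that `xl info` prints in dom0. This is only a sketch: `threads_per_core` below is a hypothetical helper name, and I am assuming, without having verified it on the affected hardware, that the field drops to 1 once SMT is disabled in the BIOS.

```shell
# Print the threads_per_core value from `xl info` output supplied on stdin.
# Assumption: a value of 1 indicates that SMT is disabled in firmware.
threads_per_core() {
    awk -F: '$1 ~ /^threads_per_core[ \t]*$/ {
        gsub(/[ \t]/, "", $2)   # strip the padding around the value
        print $2
        exit
    }'
}
```

Usage in dom0 would be `xl info | threads_per_core`; no output means the field was not found.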
Other workarounds include `dom0_max_vcpus=1 dom0_vcpus_pin`, previously described in other bug reports.
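The report does not say where these Xen options go. One common place is the `GRUB_CMDLINE_XEN_DEFAULT` variable in `/etc/default/grub`, so here is a hedged sketch of a helper that appends them there. The helper name `add_dom0_vcpu_workaround` is hypothetical, and the file location is an assumption about a typical dom0 install rather than something stated in this report.

```shell
# Append the dom0 vCPU options to a GRUB defaults file (hypothetical helper).
# $1: path to the defaults file; /etc/default/grub is an assumed typical location.
add_dom0_vcpu_workaround() {
    grub_defaults=$1
    # Do nothing if the option is already present, so the helper can be re-run safely.
    if grep -q 'dom0_max_vcpus=' "$grub_defaults"; then
        return 0
    fi
    # Single quotes keep $GRUB_CMDLINE_XEN_DEFAULT literal, so it is expanded
    # when GRUB sources the file, not when this helper runs.
    printf '%s\n' 'GRUB_CMDLINE_XEN_DEFAULT="$GRUB_CMDLINE_XEN_DEFAULT dom0_max_vcpus=1 dom0_vcpus_pin"' >> "$grub_defaults"
}
```

Run as root, this would be called as `add_dom0_vcpu_workaround /etc/default/grub`, followed by regenerating the GRUB configuration with `grub2-mkconfig -o` and the path to the installed `grub.cfg`, which differs between legacy and UEFI installs.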
### References
[1] Hang booting Dom0: nvme timeout, completion polled
https://lists.xenproject.org/archives/html/xen-users/2023-03/msg00001.html
[2] Installer does not boot - nvme timeout completion polled
https://forum.qubes-os.org/t/installer-does-not-boot-nvme-timeout-completion-polled/13639/2
[3] GPD Win Max 2 - Unable to boot installer
https://forum.qubes-os.org/t/gpd-win-max-2-unable-to-boot-installer/14466
Sven
April 22, 2023, 5:33pm
#3
Thank you @Ophrys for your HCL report, which is online now!