Ryzen 7000 series

neowutran · December 18, 2022, 7:03pm

@balko this thread is about what needs to be done to use Qubes OS with a Ryzen 7000 series CPU.
I do not know the potential issues of previous generations. However, since the Xen hypervisor version used in the stable release does not currently support CPU family 25, Ryzen 7***, 6**** and 5**** should not work.

For the GPU passthrough:
On my old computer I have an RX580 that I can pass through to a Linux HVM for gaming.
I noticed what seems to be a bug in the Linux kernel's PCI handling: the passthrough works with LTS kernel 5.4, but fails if I upgrade the kernel to 5.6.?+ (I can start the HVM, but when I try to activate the GPU it fails with an unhelpful error message).

On my new computer, I restored the Linux HVM. However, when I start it, it crashes with kernel-related errors / memory violations:

[2022-12-18 19:34:38] [    0.841975] general protection fault: 0000 [#1] SMP NOPTI
[2022-12-18 19:34:38] [    0.842001] CPU: 3 PID: 105 Comm: xenwatch Not tainted 5.4.215-1-lts54 #1
[2022-12-18 19:34:38] [    0.842016] Hardware name: Xen HVM domU, BIOS 4.18-unstable 12/11/2022
[2022-12-18 19:34:38] [    0.842033] RIP: 0010:kmem_cache_alloc_trace+0x84/0x200
[2022-12-18 19:34:38] [    0.842046] Code: 3b 76 49 83 78 10 00 4d 8b 38 0f 84 61 01 00 00 4d 85 ff 0f 84 58 01 00 00 41 8b 46 20 49 8b 9e 70 01 00 00 49 8b 3e 4c 01 f8 <48> 33 18 48 89 c1 4c 89 f8 48 0f c9 48 31 cb 48 8d 4a 01 65 48 0f
[2022-12-18 19:34:38] [    0.842084] RSP: 0018:ffffaef7001f3d88 EFLAGS: 00010202
[2022-12-18 19:34:38] [    0.842096] RAX: 1b6dd99358346dae RBX: e46fe56f475c6dae RCX: ffff97e0043bea10
[2022-12-18 19:34:38] [    0.842111] RDX: 0000000000000c14 RSI: 0000000000000d00 RDI: 0000000000034080
[2022-12-18 19:34:38] [    0.842128] RBP: 0000000000000d00 R08: ffff97e00adb4080 R09: 0000000000000000
[2022-12-18 19:34:38] [    0.842144] R10: 0000000000000001 R11: ffff97e00adb5170 R12: 0000000000000020
[2022-12-18 19:34:38] [    0.842161] R13: ffff97e009c03880 R14: ffff97e009c03880 R15: 1b6dd99358346dae
[2022-12-18 19:34:38] [    0.842178] FS:  0000000000000000(0000) GS:ffff97e00ad80000(0000) knlGS:0000000000000000
[2022-12-18 19:34:38] [    0.842195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-12-18 19:34:38] [    0.842208] CR2: 00007f2a87e69010 CR3: 0000000205eaa000 CR4: 0000000000740ee0
[2022-12-18 19:34:38] [    0.842236] PKRU: 55555554
[2022-12-18 19:34:38] [    0.842242] Call Trace:
[2022-12-18 19:34:38] [    0.842252]  ? blkfront_setup_indirect+0x138/0xdc0 [xen_blkfront]
[2022-12-18 19:34:38] [    0.842267]  blkfront_setup_indirect+0x138/0xdc0 [xen_blkfront]
[2022-12-18 19:34:38] [    0.842282]  ? count_strings+0x40/0x40
[2022-12-18 19:34:38] [    0.842291]  blkback_changed+0x302/0xe00 [xen_blkfront]
[2022-12-18 19:34:38] [    0.842302]  ? count_strings+0x40/0x40
[2022-12-18 19:34:38] [    0.842311]  xenwatch_thread+0x9a/0x160
[2022-12-18 19:34:38] [    0.842321]  ? wait_woken+0x80/0x80
[2022-12-18 19:34:38] [    0.842332]  kthread+0x10c/0x130
[2022-12-18 19:34:38] [    0.842340]  ? kthread_associate_blkcg+0x90/0x90
[2022-12-18 19:34:38] [    0.842352]  ret_from_fork+0x35/0x40
[2022-12-18 19:34:38] [    0.842361] Modules linked in: libps2 xen_blkfront(+) crc32c_intel ata_piix libata ehci_pci ehci_hcd scsi_mod i8042 floppy serio
[2022-12-18 19:34:38] [    0.842393] fbcon: Taking over console
[2022-12-18 19:34:38] [    0.842402] ---[ end trace 7d80e06b7a440a2c ]---
[2022-12-18 19:34:38] [    0.842412] RIP: 0010:kmem_cache_alloc_trace+0x84/0x200
[2022-12-18 19:34:38] [    0.842424] Code: 3b 76 49 83 78 10 00 4d 8b 38 0f 84 61 01 00 00 4d 85 ff 0f 84 58 01 00 00 41 8b 46 20 49 8b 9e 70 01 00 00 49 8b 3e 4c 01 f8 <48> 33 18 48 89 c1 4c 89 f8 48 0f c9 48 31 cb 48 8d 4a 01 65 48 0f
[2022-12-18 19:34:38] [    0.842463] RSP: 0018:ffffaef7001f3d88 EFLAGS: 00010202
[2022-12-18 19:34:38] [    0.842475] RAX: 1b6dd99358346dae RBX: e46fe56f475c6dae RCX: ffff97e0043bea10
[2022-12-18 19:34:38] [    0.842491] RDX: 0000000000000c14 RSI: 0000000000000d00 RDI: 0000000000034080
[2022-12-18 19:34:38] [    0.842507] RBP: 0000000000000d00 R08: ffff97e00adb4080 R09: 0000000000000000
[2022-12-18 19:34:38] [    0.842523] R10: 0000000000000001 R11: ffff97e00adb5170 R12: 0000000000000020
[2022-12-18 19:34:38] [    0.842538] R13: ffff97e009c03880 R14: ffff97e009c03880 R15: 1b6dd99358346dae
[2022-12-18 19:34:38] [    0.842555] FS:  0000000000000000(0000) GS:ffff97e00ad80000(0000) knlGS:0000000000000000
[2022-12-18 19:34:38] [    0.842572] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-12-18 19:34:38] [    0.842586] CR2: 00007f2a87e69010 CR3: 0000000205eaa000 CR4: 0000000000740ee0
[2022-12-18 19:34:38] [    0.842602] PKRU: 55555554
[2022-12-18 19:34:38] [    0.842665] Console: switching to colour frame buffer device 100x37
[2022-12-18 19:34:38] [    0.886137] Module has invalid ELF structures
[2022-12-18 19:34:38] [    0.888804] Module has invalid ELF structures
[2022-12-18 19:34:39] [    0.892788] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input2
[2022-12-18 19:34:39] [    0.893321] general protection fault: 0000 [#2] SMP NOPTI
[2022-12-18 19:34:39] [    0.893421] CPU: 3 PID: 2 Comm: kthreadd Tainted: G      D           5.4.215-1-lts54 #1
[2022-12-18 19:34:39] [    0.893554] Hardware name: Xen HVM domU, BIOS 4.18-unstable 12/11/2022
[2022-12-18 19:34:39] [    0.893658] RIP: 0010:__kmalloc_node+0x185/0x2d0
[2022-12-18 19:34:39] [    0.893837] Code: e8 4c 8b 44 24 08 4c 89 e1 4c 89 f2 4c 89 fe e8 a1 e1 99 00 48 83 3b 00 58 75 d5 e9 6a ff ff ff 41 8b 41 20 49 8b 39 4c 01 f0 <48> 8b 18 48 89 c1 49 33 99 70 01 00 00 4c 89 f0 48 0f c9 48 31 cb
[2022-12-18 19:34:39] [    0.894315] RSP: 0018:ffffaef700027d18 EFLAGS: 00010202
[2022-12-18 19:34:39] [    0.894484] RAX: 1b6dd99358346dae RBX: 0000000000000dc0 RCX: ffff97e009fb3810
[2022-12-18 19:34:39] [    0.895669] RDX: 0000000000000c14 RSI: 0000000000000dc0 RDI: 0000000000034080
[2022-12-18 19:34:39] [    0.895917] RBP: 0000000000000dc0 R08: ffff97e00adb4080 R09: ffff97e009c03880
[2022-12-18 19:34:39] [    0.896161] R10: ffffaef700355000 R11: ffffaef700350000 R12: 0000000000000020
[2022-12-18 19:34:39] [    0.896410] R13: 0000000000000000 R14: 1b6dd99358346dae R15: ffff97e009c03880
[2022-12-18 19:34:39] [    0.896662] FS:  0000000000000000(0000) GS:ffff97e00ad80000(0000) knlGS:0000000000000000
[2022-12-18 19:34:39] [    0.896916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-12-18 19:34:39] [    0.897159] CR2: 00007f2a87e69010 CR3: 0000000205d16000 CR4: 0000000000740ee0
[2022-12-18 19:34:39] [    0.897420] PKRU: 55555554
[2022-12-18 19:34:39] [    0.897653] Call Trace:
[2022-12-18 19:34:39] [    0.897884]  ? __vmalloc_node_range+0xd9/0x2d0
[2022-12-18 19:34:39] [    0.898120]  __vmalloc_node_range+0xd9/0x2d0
[2022-12-18 19:34:39] [    0.898353]  copy_process+0x923/0x1a60
[2022-12-18 19:34:39] [    0.898590]  ? _do_fork+0x74/0x3a0
[2022-12-18 19:34:39] [    0.898812]  ? __switch_to_asm+0x40/0x70
[2022-12-18 19:34:39] [    0.899030]  ? __switch_to_asm+0x34/0x70
[2022-12-18 19:34:39] [    0.899245]  ? __switch_to_asm+0x34/0x70
[2022-12-18 19:34:39] [    0.899455]  ? __switch_to_asm+0x40/0x70
[2022-12-18 19:34:39] [    0.899667]  _do_fork+0x74/0x3a0
[2022-12-18 19:34:39] [    0.899874]  ? finish_task_switch+0x72/0x240
[2022-12-18 19:34:39] [    0.900084]  kernel_thread+0x55/0x70
[2022-12-18 19:34:39] [    0.900284]  ? kthread_associate_blkcg+0x90/0x90
[2022-12-18 19:34:39] [    0.900489]  kthreadd+0x14b/0x1a0
[2022-12-18 19:34:39] [    0.900686]  ? kthread_is_per_cpu+0x30/0x30
[2022-12-18 19:34:39] [    0.900882]  ret_from_fork+0x35/0x40
[2022-12-18 19:34:39] [    0.901076] Modules linked in: serio_raw atkbd pata_acpi libps2 xen_blkfront(+) crc32c_intel ata_piix libata ehci_pci ehci_hcd scsi_mod i8042 floppy serio
[2022-12-18 19:34:39] [    1.043804] ---[ end trace 7d80e06b7a440a2d ]---
[2022-12-18 19:34:39] [    1.044020] RIP: 0010:kmem_cache_alloc_trace+0x84/0x200
[2022-12-18 19:34:39] [    1.044243] Code: 3b 76 49 83 78 10 00 4d 8b 38 0f 84 61 01 00 00 4d 85 ff 0f 84 58 01 00 00 41 8b 46 20 49 8b 9e 70 01 00 00 49 8b 3e 4c 01 f8 <48> 33 18 48 89 c1 4c 89 f8 48 0f c9 48 31 cb 48 8d 4a 01 65 48 0f
[2022-12-18 19:34:39] [    1.044923] RSP: 0018:ffffaef7001f3d88 EFLAGS: 00010202
[2022-12-18 19:34:39] [    1.045143] RAX: 1b6dd99358346dae RBX: e46fe56f475c6dae RCX: ffff97e0043bea10
[2022-12-18 19:34:39] [    1.045358] RDX: 0000000000000c14 RSI: 0000000000000d00 RDI: 0000000000034080
[2022-12-18 19:34:39] [    1.045568] RBP: 0000000000000d00 R08: ffff97e00adb4080 R09: 0000000000000000
[2022-12-18 19:34:39] [    1.045784] R10: 0000000000000001 R11: ffff97e00adb5170 R12: 0000000000000020
[2022-12-18 19:34:39] [    1.045991] R13: ffff97e009c03880 R14: ffff97e009c03880 R15: 1b6dd99358346dae
[2022-12-18 19:34:39] [    1.046212] FS:  0000000000000000(0000) GS:ffff97e00ad80000(0000) knlGS:0000000000000000
[2022-12-18 19:34:39] [    1.046440] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-12-18 19:34:39] [    1.046646] CR2: 00007f2a87e69010 CR3: 0000000205d16000 CR4: 0000000000740ee0
[2022-12-18 19:34:39] [    1.046857] PKRU: 55555554
[2022-12-18 19:34:39] [    1.188388] usb 1-1: new high-speed USB device number 2 using ehci-pci
[2022-12-18 19:34:39] [    1.358411] tsc: Refined TSC clocksource calibration: 4491.532 MHz
[2022-12-18 19:34:39] [    1.360135] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x40be298b2d9, max_idle_ns: 440795414753 ns
[2022-12-18 19:34:39] [    1.369153] clocksource: Switched to clocksource tsc
[2022-12-18 19:34:39] [    1.413328] usb 1-1: New USB device found, idVendor=0627, idProduct=0001, bcdDevice= 0.00
[2022-12-18 19:34:39] [    1.414811] usb 1-1: New USB device strings: Mfr=1, Product=3, SerialNumber=10
[2022-12-18 19:34:39] [    1.415046] usb 1-1: Product: QEMU USB Tablet
[2022-12-18 19:34:39] [    1.415265] usb 1-1: Manufacturer: QEMU
[2022-12-18 19:34:39] [    1.415483] usb 1-1: SerialNumber: 42
[2022-12-18 19:34:39] [    1.422701] general protection fault: 0000 [#3] SMP NOPTI
[2022-12-18 19:34:39] [    1.424461] CPU: 3 PID: 144 Comm: systemd-udevd Tainted: G      D           5.4.215-1-lts54 #1
[2022-12-18 19:34:39] [    1.424741] Hardware name: Xen HVM domU, BIOS 4.18-unstable 12/11/2022
[2022-12-18 19:34:39] [    1.425007] RIP: 0010:__kmalloc_track_caller+0x8e/0x230
[2022-12-18 19:34:39] [    1.425268] Code: 08 65 4c 03 05 8b fd 3a 76 49 83 78 10 00 4d 8b 38 0f 84 94 01 00 00 4d 85 ff 0f 84 8b 01 00 00 41 8b 46 20 49 8b 3e 4c 01 f8 <48> 8b 18 48 89 c1 49 33 9e 70 01 00 00 4c 89 f8 48 0f c9 48 31 cb
[2022-12-18 19:34:39] [    1.426092] RSP: 0018:ffffaef70021be18 EFLAGS: 00010202
[2022-12-18 19:34:39] [    1.426373] RAX: 1b6dd99358346dae RBX: 0000000000000cc0 RCX: 0000000000000000
[2022-12-18 19:34:39] [    1.426677] RDX: 0000000000000c14 RSI: 0000000000000cc0 RDI: 0000000000034080
[2022-12-18 19:34:39] [    1.426973] RBP: 0000000000000cc0 R08: ffff97e00adb4080 R09: ffffffff8ac42348
[2022-12-18 19:34:39] [    1.427267] R10: ffff97e0069a3000 R11: 0000000000000010 R12: 0000000000000013
[2022-12-18 19:34:39] [    1.427556] R13: ffff97e009c03880 R14: ffff97e009c03880 R15: 1b6dd99358346dae
[2022-12-18 19:34:39] [    1.427850] FS:  00007f2a8727b200(0000) GS:ffff97e00ad80000(0000) knlGS:0000000000000000
[2022-12-18 19:34:39] [    1.428146] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-12-18 19:34:39] [    1.428426] CR2: 00007f2a87e69010 CR3: 0000000205dc2000 CR4: 0000000000740ee0
[2022-12-18 19:34:39] [    1.428674] PKRU: 55555554
[2022-12-18 19:34:39] [    1.428924] Call Trace:
[2022-12-18 19:34:39] [    1.429165]  ? shmem_symlink+0xbd/0x280
[2022-12-18 19:34:39] [    1.429414]  kmemdup+0x17/0x40
[2022-12-18 19:34:39] [    1.429661]  shmem_symlink+0xbd/0x280
[2022-12-18 19:34:39] [    1.429913]  vfs_symlink+0xe1/0x170
[2022-12-18 19:34:39] [    1.430159]  do_symlinkat+0x120/0x140
[2022-12-18 19:34:39] [    1.430407]  do_syscall_64+0x49/0x90
[2022-12-18 19:34:39] [    1.430650]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[2022-12-18 19:34:39] [    1.430905] RIP: 0033:0x7f2a87c0584b
[2022-12-18 19:34:39] [    1.431137] Code: f0 ff ff 73 01 c3 48 8b 0d 3a f5 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 58 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d f5 0d 00 f7 d8 64 89 01 48
[2022-12-18 19:34:39] [    1.431910] RSP: 002b:00007ffd42b8bf58 EFLAGS: 00000246 ORIG_RAX: 0000000000000058
[2022-12-18 19:34:39] [    1.432176] RAX: ffffffffffffffda RBX: 0000560e56ef2b90 RCX: 00007f2a87c0584b
[2022-12-18 19:34:39] [    1.432449] RDX: 000000000000a000 RSI: 00007ffd42b8bf60 RDI: 0000560e56ef2bd0
[2022-12-18 19:34:39] [    1.432713] RBP: 00007ffd42b8c0b0 R08: 0000000000000009 R09: 0000000000000000
[2022-12-18 19:34:39] [    1.432985] R10: 0000000000000000 R11: 0000000000000246 R12: 0000560e56eed710
[2022-12-18 19:34:39] [    1.433249] R13: 0000000000000000 R14: 00007ffd42b8bf60 R15: 0000560e56ef2bd0
[2022-12-18 19:34:39] [    1.433474] Modules linked in: serio_raw atkbd pata_acpi libps2 xen_blkfront(+) crc32c_intel ata_piix libata ehci_pci ehci_hcd scsi_mod i8042 floppy serio
[2022-12-18 19:34:39] [    1.443915] ---[ end trace 7d80e06b7a440a2e ]---
[2022-12-18 19:34:39] [    1.444143] RIP: 0010:kmem_cache_alloc_trace+0x84/0x200
[2022-12-18 19:34:39] [    1.444371] Code: 3b 76 49 83 78 10 00 4d 8b 38 0f 84 61 01 00 00 4d 85 ff 0f 84 58 01 00 00 41 8b 46 20 49 8b 9e 70 01 00 00 49 8b 3e 4c 01 f8 <48> 33 18 48 89 c1 4c 89 f8 48 0f c9 48 31 cb 48 8d 4a 01 65 48 0f
[2022-12-18 19:34:39] [    1.445057] RSP: 0018:ffffaef7001f3d88 EFLAGS: 00010202
[2022-12-18 19:34:39] [    1.445281] RAX: 1b6dd99358346dae RBX: e46fe56f475c6dae RCX: ffff97e0043bea10
[2022-12-18 19:34:39] [    1.445511] RDX: 0000000000000c14 RSI: 0000000000000d00 RDI: 0000000000034080
[2022-12-18 19:34:39] [    1.445736] RBP: 0000000000000d00 R08: ffff97e00adb4080 R09: 0000000000000000
[2022-12-18 19:34:39] [    1.445963] R10: 0000000000000001 R11: ffff97e00adb5170 R12: 0000000000000020
[2022-12-18 19:34:39] [    1.446186] R13: ffff97e009c03880 R14: ffff97e009c03880 R15: 1b6dd99358346dae
[2022-12-18 19:34:39] [    1.446409] FS:  00007f2a8727b200(0000) GS:ffff97e00ad80000(0000) knlGS:0000000000000000
[2022-12-18 19:34:39] [    1.446641] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-12-18 19:34:39] [    1.446865] CR2: 00007f2a87e69010 CR3: 0000000205dc2000 CR4: 0000000000740ee0
[2022-12-18 19:34:39] [    1.447093] PKRU: 55555554
[2022-12-18 19:34:39] [    1.447362] BUG: unable to handle page fault for address: ffff97e803c4a008
[2022-12-18 19:34:39] [    1.447590] #PF: supervisor write access in kernel mode
[2022-12-18 19:34:39] [    1.447816] #PF: error_code(0x0002) - not-present page
[2022-12-18 19:34:39] [    1.448042] PGD 145001067 P4D 145001067 PUD 0 
[2022-12-18 19:34:39] [    1.448268] Oops: 0002 [#4] SMP NOPTI
[2022-12-18 19:34:39] [    1.448495] CPU: 3 PID: 144 Comm: systemd-udevd Tainted: G      D           5.4.215-1-lts54 #1
[2022-12-18 19:34:39] [    1.448738] Hardware name: Xen HVM domU, BIOS 4.18-unstable 12/11/2022
[2022-12-18 19:34:39] [    1.448979] RIP: 0010:__tlb_remove_page_size+0x12/0x80
[2022-12-18 19:34:39] [    1.449221] Code: 48 89 ef 5b 31 f6 5d e9 0c 13 01 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 48 8b 47 28 8b 50 08 8d 4a 01 89 48 08 <48> 89 74 d0 10 3b 48 0c 74 03 31 c0 c3 53 48 8b 47 28 48 89 fb 48
[2022-12-18 19:34:39] [    1.449972] RSP: 0018:ffffaef70021bcc8 EFLAGS: 00010206
[2022-12-18 19:34:39] [    1.450212] RAX: ffff97e003c4a000 RBX: ffff97e006b75c40 RCX: 0000000000000000
[2022-12-18 19:34:39] [    1.450462] RDX: 00000000ffffffff RSI: fffffaedc81b0c40 RDI: ffffaef70021be38
[2022-12-18 19:34:39] [    1.450711] RBP: 0000000206c31025 R08: ffff97e009552708 R09: 0000000000000000
[2022-12-18 19:34:39] [    1.450961] R10: 0000000000000001 R11: ffff97e00adb5170 R12: fffffaedc81b0c40
[2022-12-18 19:34:39] [    1.451202] R13: ffffaef70021be38 R14: 0000560e55989000 R15: 0000560e55988000
[2022-12-18 19:34:39] [    1.451442] FS:  0000000000000000(0000) GS:ffff97e00ad80000(0000) knlGS:0000000000000000
[2022-12-18 19:34:39] [    1.451695] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-12-18 19:34:39] [    1.451944] CR2: ffff97e803c4a008 CR3: 0000000205dc2000 CR4: 0000000000740ee0
[2022-12-18 19:34:39] [    1.452196] PKRU: 55555554
[2022-12-18 19:34:39] [    1.452429] Call Trace:
[2022-12-18 19:34:39] [    1.452660]  unmap_page_range+0x7d6/0xf50
[2022-12-18 19:34:39] [    1.452894]  ? oops_end+0xbd/0xc0
[2022-12-18 19:34:39] [    1.453120]  unmap_vmas+0x6e/0xd0
[2022-12-18 19:34:39] [    1.454157]  exit_mmap+0xa9/0x190
[2022-12-18 19:34:39] [    1.454694]  mmput+0x49/0x110
[2022-12-18 19:34:39] [    1.454911]  do_exit+0x2fa/0xa30
[2022-12-18 19:34:39] [    1.455119]  ? do_symlinkat+0x120/0x140
[2022-12-18 19:34:39] [    1.455326]  rewind_stack_do_exit+0x17/0x20
[2022-12-18 19:34:39] [    1.455535] RIP: 0033:0x7f2a87c0584b
[2022-12-18 19:34:39] [    1.455741] Code: f0 ff ff 73 01 c3 48 8b 0d 3a f5 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 58 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d f5 0d 00 f7 d8 64 89 01 48
[2022-12-18 19:34:39] [    1.456399] RSP: 002b:00007ffd42b8bf58 EFLAGS: 00000246 ORIG_RAX: 0000000000000058
[2022-12-18 19:34:39] [    1.456626] RAX: ffffffffffffffda RBX: 0000560e56ef2b90 RCX: 00007f2a87c0584b
[2022-12-18 19:34:39] [    1.644958] RDX: 000000000000a000 RSI: 00007ffd42b8bf60 RDI: 0000560e56ef2bd0
[2022-12-18 19:34:39] [    1.645184] RBP: 00007ffd42b8c0b0 R08: 0000000000000009 R09: 0000000000000000
[2022-12-18 19:34:39] [    1.646548] R10: 0000000000000000 R11: 0000000000000246 R12: 0000560e56eed710
[2022-12-18 19:34:39] [    1.646769] R13: 0000000000000000 R14: 00007ffd42b8bf60 R15: 0000560e56ef2bd0
[2022-12-18 19:34:39] [    1.646990] Modules linked in: serio_raw atkbd pata_acpi libps2 xen_blkfront(+) crc32c_intel ata_piix libata ehci_pci ehci_hcd scsi_mod i8042 floppy serio
[2022-12-18 19:34:39] [    1.647447] CR2: ffff97e803c4a008
[2022-12-18 19:34:39] [    1.647660] ---[ end trace 7d80e06b7a440a2f ]---
[2022-12-18 19:34:39] [    1.647878] RIP: 0010:kmem_cache_alloc_trace+0x84/0x200
[2022-12-18 19:34:39] [    1.648089] Code: 3b 76 49 83 78 10 00 4d 8b 38 0f 84 61 01 00 00 4d 85 ff 0f 84 58 01 00 00 41 8b 46 20 49 8b 9e 70 01 00 00 49 8b 3e 4c 01 f8 <48> 33 18 48 89 c1 4c 89 f8 48 0f c9 48 31 cb 48 8d 4a 01 65 48 0f
[2022-12-18 19:34:39] [    1.648745] RSP: 0018:ffffaef7001f3d88 EFLAGS: 00010202
[2022-12-18 19:34:39] [    1.648962] RAX: 1b6dd99358346dae RBX: e46fe56f475c6dae RCX: ffff97e0043bea10
[2022-12-18 19:34:39] [    1.649186] RDX: 0000000000000c14 RSI: 0000000000000d00 RDI: 0000000000034080
[2022-12-18 19:34:39] [    1.649406] RBP: 0000000000000d00 R08: ffff97e00adb4080 R09: 0000000000000000
[2022-12-18 19:34:39] [    1.649626] R10: 0000000000000001 R11: ffff97e00adb5170 R12: 0000000000000020
[2022-12-18 19:34:39] [    1.649842] R13: ffff97e009c03880 R14: ffff97e009c03880 R15: 1b6dd99358346dae
[2022-12-18 19:34:39] [    1.650056] FS:  0000000000000000(0000) GS:ffff97e00ad80000(0000) knlGS:0000000000000000
[2022-12-18 19:34:39] [    1.650276] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-12-18 19:34:39] [    1.650490] CR2: ffff97e803c4a008 CR3: 0000000205dc2000 CR4: 0000000000740ee0
[2022-12-18 19:34:39] [    1.650711] PKRU: 55555554
[2022-12-18 19:34:39] [    1.650920] Fixing recursive fault but reboot is needed!
[2022-12-18 19:34:52] [    5.828331] xenbus_probe_frontend: Waiting for devices to initialise: 25s...20s...
[2022-12-18 19:34:52] [   14.168465] random: crng init done
[2022-12-18 19:35:08] [   15.728335] 15s...10s...5s...0s...
[2022-12-18 19:36:51] Logfile Opened
[2022-12-18 19:36:54] :: running early hook [udev]
[2022-12-18 19:36:54] Starting version 251.5-1-arch
[2022-12-18 19:36:54] :: running hook [udev]
[2022-12-18 19:36:54] :: Triggering uevents...
[2022-12-18 19:36:55] [    0.812811] general protection fault: 0000 [#1] SMP NOPTI
[2022-12-18 19:36:55] [    0.812837] CPU: 1 PID: 105 Comm: xenwatch Not tainted 5.4.215-1-lts54 #1
[2022-12-18 19:36:55] [    0.812852] Hardware name: Xen HVM domU, BIOS 4.18-unstable 12/11/2022
[2022-12-18 19:36:55] [    0.812871] RIP: 0010:kmem_cache_alloc_trace+0x84/0x200
[2022-12-18 19:36:55] [    0.812882] Code: fb 44 49 83 78 10 00 4d 8b 38 0f 84 61 01 00 00 4d 85 ff 0f 84 58 01 00 00 41 8b 46 20 49 8b 9e 70 01 00 00 49 8b 3e 4c 01 f8 <48> 33 18 48 89 c1 4c 89 f8 48 0f c9 48 31 cb 48 8d 4a 01 65 48 0f
[2022-12-18 19:36:55] [    0.812921] RSP: 0018:ffffb555c01f3d88 EFLAGS: 00010282
[2022-12-18 19:36:55] [    0.812934] RAX: dbd3225111d050f8 RBX: 44c51e6d37b650f8 RCX: ffff99d9c43bd650
[2022-12-18 19:36:55] [    0.812950] RDX: 0000000000001bb6 RSI: 0000000000000d00 RDI: 0000000000034080
[2022-12-18 19:36:55] [    0.812966] RBP: 0000000000000d00 R08: ffff99d9cacb4080 R09: 0000000000000000
[2022-12-18 19:36:55] [    0.812983] R10: 0000000000000001 R11: ffff99d9cacb5170 R12: 0000000000000020
[2022-12-18 19:36:55] [    0.813000] R13: ffff99d9c9c03880 R14: ffff99d9c9c03880 R15: dbd3225111d050f8
[2022-12-18 19:36:55] [    0.813018] FS:  0000000000000000(0000) GS:ffff99d9cac80000(0000) knlGS:0000000000000000
[2022-12-18 19:36:55] [    0.813035] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-12-18 19:36:55] [    0.813048] CR2: 0000560861b85018 CR3: 0000000206344000 CR4: 0000000000740ee0
[2022-12-18 19:36:55] [    0.813066] PKRU: 55555554
[2022-12-18 19:36:55] [    0.813072] Call Trace:
[2022-12-18 19:36:55] [    0.813083]  ? blkfront_setup_indirect+0x138/0xdc0 [xen_blkfront]
[2022-12-18 19:36:55] [    0.813098]  blkfront_setup_indirect+0x138/0xdc0 [xen_blkfront]
[2022-12-18 19:36:55] [    0.813116]  ? count_strings+0x40/0x40
[2022-12-18 19:36:55] [    0.813125]  blkback_changed+0x302/0xe00 [xen_blkfront]
[2022-12-18 19:36:55] [    0.813136]  ? count_strings+0x40/0x40
[2022-12-18 19:36:55] [    0.813145]  xenwatch_thread+0x9a/0x160
[2022-12-18 19:36:55] [    0.813159]  ? wait_woken+0x80/0x80
[2022-12-18 19:36:55] [    0.813170]  kthread+0x10c/0x130
[2022-12-18 19:36:55] [    0.813178]  ? kthread_associate_blkcg+0x90/0x90
[2022-12-18 19:36:55] [    0.813192]  ret_from_fork+0x35/0x40
[2022-12-18 19:36:55] [    0.813201] Modules linked in: libps2 xen_blkfront(+) crc32c_intel ata_piix libata ehci_pci ehci_hcd scsi_mod i8042 floppy serio
[2022-12-18 19:36:55] [    0.813233] fbcon: Taking over console
[2022-12-18 19:36:55] [    0.813242] ---[ end trace ac6bf55eff6c768f ]---
[2022-12-18 19:36:55] [    0.813254] RIP: 0010:kmem_cache_alloc_trace+0x84/0x200
[2022-12-18 19:36:55] [    0.813265] Code: fb 44 49 83 78 10 00 4d 8b 38 0f 84 61 01 00 00 4d 85 ff 0f 84 58 01 00 00 41 8b 46 20 49 8b 9e 70 01 00 00 49 8b 3e 4c 01 f8 <48> 33 18 48 89 c1 4c 89 f8 48 0f c9 48 31 cb 48 8d 4a 01 65 48 0f
[2022-12-18 19:36:55] [    0.813304] RSP: 0018:ffffb555c01f3d88 EFLAGS: 00010282
[2022-12-18 19:36:55] [    0.813316] RAX: dbd3225111d050f8 RBX: 44c51e6d37b650f8 RCX: ffff99d9c43bd650
[2022-12-18 19:36:55] [    0.813333] RDX: 0000000000001bb6 RSI: 0000000000000d00 RDI: 0000000000034080
[2022-12-18 19:36:55] [    0.813349] RBP: 0000000000000d00 R08: ffff99d9cacb4080 R09: 0000000000000000
[2022-12-18 19:36:55] [    0.813367] R10: 0000000000000001 R11: ffff99d9cacb5170 R12: 0000000000000020
[2022-12-18 19:36:55] [    0.813383] R13: ffff99d9c9c03880 R14: ffff99d9c9c03880 R15: dbd3225111d050f8
[2022-12-18 19:36:55] [    0.813401] FS:  0000000000000000(0000) GS:ffff99d9cac80000(0000) knlGS:0000000000000000
[2022-12-18 19:36:55] [    0.813418] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-12-18 19:36:55] [    0.813432] CR2: 0000560861b85018 CR3: 0000000206344000 CR4: 0000000000740ee0
[2022-12-18 19:36:55] [    0.813448] PKRU: 55555554

It is directly related to the GPU passthrough (if I do not do the PCI passthrough, the HVM starts correctly).

If I upgrade the kernel to a newer version, I can start the HVM but end up with the same kernel bug as on my old computer.

So there are at least 2 different issues.

One of the issues is a regression in the Linux kernel related to PCI handling; the regression was introduced around 5.6.X. This should be the easiest bug to find, since I can narrow the scope by upgrading to newer kernels until I find which specific version introduced the bug, and then look for it in the commits / source code. But I expect it to be very time consuming, again (at the beginning of the process I could use the distribution archives to speed things up by not having to compile everything).
For the second issue, I have no idea at the moment. Something related to the QEMU version? Related to the Linux kernel used to launch QEMU? A Xen dependency in the VM that is not the correct version? A lot of testing is required to narrow down the possibilities. (Try with GPU passthrough, without it, and with it but without strict reset. Try all of the above with a non-GPU PCI device. Try different kernel versions, since it is directly related to the Linux kernel version used.)
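
The test combinations listed above can be enumerated so that none gets skipped. A minimal Python sketch, where every label is only an illustrative string (not a Qubes OS or Xen option name) and the kernel entries are placeholders:

```python
from itertools import product

# Illustrative labels only; none of these are real option names.
passthrough_modes = [
    "no passthrough",
    "passthrough, strict reset",
    "passthrough, no strict reset",
]
devices = ["GPU", "non-GPU PCI device"]
guest_kernels = ["5.4", "newer kernel"]  # placeholder kernel choices


def build_test_matrix(modes, device_kinds, kernels):
    """List every combination to try; the device only matters when one is passed through."""
    matrix = []
    for mode, device, kernel in product(modes, device_kinds, kernels):
        if mode == "no passthrough":
            device = None  # nothing is attached in this case
        case = (mode, device, kernel)
        if case not in matrix:  # skip duplicate "no passthrough" rows
            matrix.append(case)
    return matrix


for case in build_test_matrix(passthrough_modes, devices, guest_kernels):
    print(case)
```

With the placeholder values above this gives 10 cases to run (2 without passthrough, 8 with).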

Update
For the second issue, it feels like it is related to the xen_blkfront and xen_blkback drivers in the Linux kernel. Maybe a given Xen hypervisor version requires the guest to run some specific version of the Linux kernel. Anyway, I won’t focus on this issue.
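
One rough way to check that impression is to count which kernel modules show up in the call traces of the guest log. A minimal Python sketch, assuming the log uses the usual `symbol+0xoffset/0xsize [module]` call-trace format seen in the log above; the function name is hypothetical and the sample lines are copied from that log:

```python
import re
from collections import Counter

# A call-trace entry followed by the name of the module it belongs to.
TRACE_WITH_MODULE = re.compile(r"\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)")


def modules_in_traces(log_text):
    """Count how often each module name is attached to a call-trace line."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TRACE_WITH_MODULE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample_log = """\
[    0.842252]  ? blkfront_setup_indirect+0x138/0xdc0 [xen_blkfront]
[    0.842267]  blkfront_setup_indirect+0x138/0xdc0 [xen_blkfront]
[    0.842282]  ? count_strings+0x40/0x40
[    0.842291]  blkback_changed+0x302/0xe00 [xen_blkfront]
[    0.842311]  xenwatch_thread+0x9a/0x160
"""

print(modules_in_traces(sample_log))
```

Lines without a `[module]` suffix are core kernel symbols and are not counted, so this only hints at which driver is involved in a trace; it does not prove where the memory corruption starts.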

For the first issue, the kernel log indicates (on my Zen 4 computer, the HVM kernel is 6.0.12):

[    1.755996] input: HDA ATI HDMI HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:07.0/sound/card0/input11
[    1.815044] input: QEMU QEMU USB Tablet as /devices/pci0000:00/0000:00:04.0/usb1/1-1/1-1:1.0/0003:0627:0001.0001/input/input12
[    1.815067] hid-generic 0003:0627:0001.0001: input,hidraw0: USB HID v0.01 Mouse [QEMU QEMU USB Tablet] on usb-0000:00:04.0-1/input0
[    1.815082] usbcore: registered new interface driver usbhid
[    1.815082] usbhid: USB HID core driver
[    2.018058] Console: switching to colour frame buffer device 128x48
[    2.041163] audit: type=1130 audit(1671393445.293:7): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubes-mount-dirs comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    2.221366] audit: type=1130 audit(1671393445.493:8): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journal-flush comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    2.275172] audit: type=1130 audit(1671393445.546:9): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dev-xvdc1-swap comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    2.275180] audit: type=1131 audit(1671393445.546:10): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dev-xvdc1-swap comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    2.275299] [drm] amdgpu kernel modesetting enabled.
[    2.276057] mousedev: PS/2 mouse device common for all mice
[    2.278349] amdgpu: CRAT table not found
[    2.278351] amdgpu: Virtual CRAT table created for CPU
[    2.278356] amdgpu: Topology: Add CPU node
[    2.278687] xen: --> pirq=24 -> irq=40 (gsi=40)
[    2.278790] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1043:0x0525 0xE7).
[    2.278793] [drm] register mmio base: 0xF2200000
[    2.278794] [drm] register mmio size: 262144
[    2.279710] [drm] add ip block number 0 <vi_common>
[    2.279712] [drm] add ip block number 1 <gmc_v8_0>
[    2.279712] [drm] add ip block number 2 <tonga_ih>
[    2.279713] [drm] add ip block number 3 <gfx_v8_0>
[    2.279714] [drm] add ip block number 4 <sdma_v3_0>
[    2.279715] [drm] add ip block number 5 <powerplay>
[    2.279716] [drm] add ip block number 6 <dm>
[    2.279716] [drm] add ip block number 7 <uvd_v6_0>
[    2.279717] [drm] add ip block number 8 <vce_v3_0>
[    2.452942] amdgpu 0000:00:06.0: amdgpu: Fetched VBIOS from ROM
[    2.452944] amdgpu: ATOM BIOS: 115-D009PI2-101
[    2.452958] [drm] UVD is enabled in VM mode
[    2.452961] [drm] UVD ENC is enabled in VM mode
[    2.452962] [drm] VCE enabled in VM mode
[    2.452963] amdgpu 0000:00:06.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    2.453357] [drm] GPU posting now...
[    2.559433] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[    2.564516] amdgpu 0000:00:06.0: amdgpu: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    2.564518] amdgpu 0000:00:06.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[    2.564525] [drm] Detected VRAM RAM=4096M, BAR=256M
[    2.564526] [drm] RAM width 256bits GDDR5
[    2.564534] [drm] amdgpu: 4096M of VRAM memory ready
[    2.564535] [drm] amdgpu: 3887M of GTT memory ready.
[    2.564547] [drm] GART: num cpu pages 65536, num gpu pages 65536
[    2.565709] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[    2.568500] [drm] Chained IB support enabled!
[    2.574044] amdgpu: hwmgr_sw_init smu backed is polaris10_smu
[    2.581882] [drm] Found UVD firmware Version: 1.130 Family ID: 16
[    2.613532] [drm] Found VCE firmware Version: 53.26 Binary ID: 3
[    3.154314] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input5
[    6.222318] amdgpu: SMU load firmware failed
[    6.223593] amdgpu: fw load failed
[    6.224559] amdgpu: smu firmware loading failed
[    6.224579] amdgpu 0000:00:06.0: amdgpu: amdgpu_device_ip_init failed
[    6.224597] amdgpu 0000:00:06.0: amdgpu: Fatal error during GPU init
[    6.224614] amdgpu 0000:00:06.0: amdgpu: amdgpu: finishing device.
[    6.226349] amdgpu: probe of 0000:00:06.0 failed with error -22
[    6.226358] BUG: kernel NULL pointer dereference, address: 0000000000000090
[    6.226372] #PF: supervisor write access in kernel mode
[    6.226382] #PF: error_code(0x0002) - not-present page
[    6.226391] PGD 0 P4D 0
[    6.226398] Oops: 0002 [#1] PREEMPT SMP NOPTI
[    6.226409] CPU: 2 PID: 315 Comm: systemd-udevd Not tainted 6.0.12-arch1-1 #1 c9932778529b16cae8b206cc5eba53043cd7ca6a
[    6.226425] Hardware name: Xen HVM domU, BIOS 4.18-unstable 12/11/2022
[    6.226436] RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[    6.226449] Code: e2 12 d4 cf c6 85 8c 01 00 00 00 5b 5d 41 5c 41 5d c3 cc cc cc cc 4c 8d 63 f0 4c 89 e7 e8 24 de 88 d0 48 8b 03 48 39 d8 74 0f <c6> 80 90 00 00 00 01 48 8b 00 48 39 d8 75 f1 4c 89 e7 e8 95 de 88
[    6.226478] RSP: 0018:ffffb57f00783ac8 EFLAGS: 00010207
[    6.226488] RAX: 0000000000000000 RBX: ffff9800536896d0 RCX: ffff9800502ca5c0
[    6.226500] RDX: 0000000000000001 RSI: ffff9800502ca5e8 RDI: ffff9800536896c0
[    6.226512] RBP: ffff980053689628 R08: ffffffff91aead8d R09: 0000000000000010
[    6.226525] R10: 000000000000003a R11: ffff98005041eda0 R12: ffff9800536896c0
[    6.226538] R13: ffff980053689630 R14: ffff980053686208 R15: ffffb57f00783db0
[    6.226551] FS:  00007fe799e4c080(0000) GS:ffff98014af00000(0000) knlGS:0000000000000000
[    6.226564] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.226575] CR2: 0000000000000090 CR3: 0000000102318000 CR4: 0000000000750ee0
[    6.226588] PKRU: 55555554
[    6.226595] Call Trace:
[    6.226601]  <TASK>
[    6.226608]  amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu 71a5a223485e453556b42d4d63875cf5a0137241]
[    6.226746]  amdgpu_device_fini_sw+0x33/0x390 [amdgpu 71a5a223485e453556b42d4d63875cf5a0137241]
[    6.226843]  amdgpu_driver_release_kms+0x16/0x30 [amdgpu 71a5a223485e453556b42d4d63875cf5a0137241]
[    6.226934]  devm_drm_dev_init_release+0x49/0x70
[    6.226946]  release_nodes+0x40/0xb0
[    6.226955]  devres_release_all+0x8c/0xc0
[    6.226963]  device_unbind_cleanup+0xe/0x70
[    6.226974]  really_probe+0x242/0x380
[    6.226982]  ? pm_runtime_barrier+0x54/0x90
[    6.226991]  __driver_probe_device+0x78/0x170
[    6.227000]  driver_probe_device+0x1f/0x90
[    6.227008]  __driver_attach+0xd5/0x1d0
[    6.227015]  ? __device_attach_driver+0x110/0x110
[    6.227025]  bus_for_each_dev+0x8b/0xd0
[    6.227034]  bus_add_driver+0x1b2/0x200
[    6.227042]  driver_register+0x8d/0xe0
[    6.227050]  ? 0xffffffffc1330000
[    6.227059]  do_one_initcall+0x5d/0x220
[    6.227069]  do_init_module+0x4a/0x1e0
[    6.227078]  __do_sys_init_module+0x17f/0x1b0
[    6.227215]  do_syscall_64+0x5f/0x90
[    6.227342]  ? syscall_exit_to_user_mode+0x1b/0x40
[    6.227469]  ? do_syscall_64+0x6b/0x90
--

balko · December 18, 2022, 7:46pm

Thanks a lot for the information. I’m just a bit overwhelmed by the amount of information about Ryzen on the forum (I have used Intel for Qubes OS for ages). But Ryzen looks promising and tempting because of its performance.
I will stay on Intel for some time more.
Thanks again, your work with Ryzen is much appreciated.

neowutran · December 18, 2022, 9:17pm

To test a lot of kernels quickly (my gaming HVM is an Arch Linux system): Index of /packages/l/linux/
The mkinitcpio config file needs to be modified to compress with gzip instead of zstd (old kernels don’t support zstd): mkinitcpio - ArchWiki

For bug n°2 some kernel information:
5.9.14 → Bug
5.10.1 → No bug

Something changed between those 2 Linux kernel versions, and Xen hypervisor 4.17+ doesn’t like it when the guest uses a kernel without this unknown modification.

For bug n°1, some kernel information:
5.4.X → Work
5.6.1 → Work
5.6.15 → Work
5.7.1 → Don’t Work
5.10.1 → Don’t work
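
The list above can be reduced to a regression window mechanically. This is only a small sketch in Python, not something I ran as part of the testing: the `results` dict is copied by hand from the list, and "5.4.215" stands in for "5.4.X" (it is the 5.4 version that appears in the logs, so treat it as an assumption).

```python
# Results copied by hand from the list above: kernel version -> passthrough works?
results = {
    "5.4.215": True,   # listed as "5.4.X"; 5.4.215 is an assumed stand-in
    "5.6.1": True,
    "5.6.15": True,
    "5.7.1": False,
    "5.10.1": False,
}

def version_key(version):
    """Turn '5.6.15' into (5, 6, 15) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))

def regression_window(results):
    """Return (newest working version, oldest failing version)."""
    ordered = sorted(results, key=version_key)
    good = [v for v in ordered if results[v]]
    bad = [v for v in ordered if not results[v]]
    if not good or not bad:
        raise ValueError("need at least one working and one failing version")
    last_good, first_bad = good[-1], bad[0]
    if version_key(last_good) > version_key(first_bad):
        # A working version newer than a failing one means this is not
        # a single clean regression, so the window would be misleading.
        raise ValueError("results are not monotonic")
    return last_good, first_bad

print(regression_window(results))
```

With the values above, the window is 5.6.15 to 5.7.1, which is the range that needs to be searched.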

Kernel log from a kernel that works:

[    0.000000] Linux version 5.4.215-1-lts54 (linux-lts54@archlinux) (gcc version 12.2.0 (GCC)) #1 SMP Sun, 02 Oct 2022 14:41:08 +0000
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-linux-lts54 root=/dev/xvda3 rw console=tty0 console=hvc0 swiotlb=8192 noresume clocksource=tsc xen_scrub_pages=0
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Hygon HygonGenuine
[    0.000000]   Centaur CentaurHauls
[    0.000000]   zhaoxin   Shanghai
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
--
[    0.015188]   4 disabled
[    0.015189]   5 disabled
[    0.015189]   6 disabled
[    0.015189]   7 disabled
[    0.015190] TOM2: 0000000840000000 aka 33792M
[    0.016127] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT
[    0.016346] last_pfn = 0xdffff max_arch_pfn = 0x400000000
[    0.018747] found SMP MP-table at [mem 0x000f5a40-0x000f5a4f]
[    0.018814] check: Scanning 1 areas for low memory corruption
[    0.018947] Using GB pages for direct mapping
[    0.019069] RAMDISK: [mem 0x36eb5000-0x37751fff]
[    0.019073] ACPI: Early table checksum verification disabled
[    0.019077] ACPI: RSDP 0x00000000000F5990 000024 (v02 Xen   )
[    0.019081] ACPI: XSDT 0x00000000FC00A660 000054 (v01 Xen    HVM      00000000 HVML 00000000)
[    0.019087] ACPI: FACP 0x00000000FC00A370 0000F4 (v04 Xen    HVM      00000000 HVML 00000000)
[    0.019092] ACPI: DSDT 0x00000000FC001040 0092A3 (v02 Xen    HVM      00000000 INTL 20190509)
[    0.019095] ACPI: FACS 0x00000000FC001000 000040
[    0.019097] ACPI: FACS 0x00000000FC001000 000040
[    0.019099] ACPI: APIC 0x00000000FC00A470 000080 (v02 Xen    HVM      00000000 HVML 00000000)
[    0.019101] ACPI: HPET 0x00000000FC00A570 000038 (v01 Xen    HVM      00000000 HVML 00000000)
[    0.019103] ACPI: WAET 0x00000000FC00A5B0 000028 (v01 Xen    HVM      00000000 HVML 00000000)
--
[    0.245774] Last level dTLB entries: 4KB 1536, 2MB 1536, 4MB 768, 1GB 0
[    0.245777] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[    0.245779] Spectre V2 : Mitigation: Retpolines
[    0.245779] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[    0.245781] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[    0.245782] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp
[    0.246069] Freeing SMP alternatives memory: 32K
[    0.247561] clocksource: xen: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.247564] Xen: using vcpuop timer interface
[    0.247571] installing Xen timer for CPU 0
[    0.247622] smpboot: CPU0: AMD Ryzen 7 1700 Eight-Core Processor (family: 0x17, model: 0x1, stepping: 0x1)
[    0.247648] cpu 0 spinlock event irq 53
[    0.247777] Performance Events: PMU not available due to virtualization, using software events only.
[    0.247816] rcu: Hierarchical SRCU implementation.
[    0.248226] NMI watchdog: Perf NMI watchdog permanently disabled
[    0.248311] smp: Bringing up secondary CPUs ...
[    0.248420] installing Xen timer for CPU 1
[    0.248475] x86: Booting SMP configuration:
[    0.248476] .... node  #0, CPUs:      #1
[    0.251352] cpu 1 spinlock event irq 59
[    0.251352] installing Xen timer for CPU 2
--
[    0.556711] fbcon: Deferring console take-over
[    0.556712] fb0: EFI VGA frame buffer device
[    0.556794] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[    0.556839] ACPI: Power Button [PWRF]
[    0.556884] input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input1
[    0.556900] ACPI: Sleep Button [SLPF]
[    0.569601] xen: --> pirq=22 -> irq=24 (gsi=24)
[    0.569976] xen:grant_table: Grant tables using version 1 layout
[    0.570025] Grant table initialized
[    0.571348] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    0.572361] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
[    0.572362] AMD-Vi: AMD IOMMUv2 functionality not available on this system
[    0.573609] usbcore: registered new interface driver usbserial_generic
[    0.573615] usbserial: USB Serial support registered for generic
[    0.574217] rtc_cmos 00:02: registered as rtc0
[    0.574237] rtc_cmos 00:02: alarms up to one day, 114 bytes nvram, hpet irqs
[    0.575898] ledtrig-cpu: registered to indicate activity on CPUs
[    0.575973] drop_monitor: Initializing network drop monitor service
[    0.576260] NET: Registered protocol family 10
[    0.584973] Segment Routing with IPv6
[    0.585010] NET: Registered protocol family 17
[    0.587102] RAS: Correctable Errors collector initialized.
--
[    2.902602] AES CTR mode by8 optimization enabled
[    2.930571] xen: --> pirq=51 -> irq=45 (gsi=45)
[    2.930858] snd_hda_intel 0000:00:07.0: Force to non-snoop mode
[    2.962935] input: HDA ATI HDMI HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:07.0/sound/card0/input7
[    2.966432] input: HDA ATI HDMI HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:07.0/sound/card0/input8
[    2.966477] input: HDA ATI HDMI HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:07.0/sound/card0/input9
[    2.966509] input: HDA ATI HDMI HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:07.0/sound/card0/input10
[    2.966539] input: HDA ATI HDMI HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:07.0/sound/card0/input11
[    2.966568] input: HDA ATI HDMI HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:07.0/sound/card0/input12
[    3.319131] Decoding supported only on Scalable MCA processors.
[    3.361707] [drm] amdgpu kernel modesetting enabled.
[    3.361868] CRAT table not found
[    3.361872] Virtual CRAT table created for CPU
[    3.361873] Parsing CRAT table with 1 nodes
[    3.361875] Creating topology SYSFS entries
[    3.361891] Topology: Add CPU node
[    3.361891] Finished initializing topology
[    3.361961] amdgpu 0000:00:06.0: remove_conflicting_pci_framebuffers: bar 0: 0xe0000000 -> 0xefffffff
[    3.361962] amdgpu 0000:00:06.0: remove_conflicting_pci_framebuffers: bar 2: 0xf2000000 -> 0xf21fffff
[    3.361963] amdgpu 0000:00:06.0: remove_conflicting_pci_framebuffers: bar 5: 0xf2200000 -> 0xf223ffff
[    3.363082] xen: --> pirq=50 -> irq=40 (gsi=40)
[    3.363947] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1043:0x0525 0xE7).
[    3.363957] [drm] register mmio base: 0xF2200000
[    3.363958] [drm] register mmio size: 262144
[    3.364313] [drm] add ip block number 0 <vi_common>
[    3.364315] [drm] add ip block number 1 <gmc_v8_0>
[    3.364315] [drm] add ip block number 2 <tonga_ih>
[    3.364316] [drm] add ip block number 3 <gfx_v8_0>
[    3.364317] [drm] add ip block number 4 <sdma_v3_0>
[    3.364318] [drm] add ip block number 5 <powerplay>
[    3.364319] [drm] add ip block number 6 <dm>
[    3.364320] [drm] add ip block number 7 <uvd_v6_0>
[    3.364321] [drm] add ip block number 8 <vce_v3_0>
[    3.366302] amdgpu 0000:00:06.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
[    3.496838] ATOM BIOS: 115-D009PI2-101
[    3.496882] [drm] UVD is enabled in VM mode
[    3.496883] [drm] UVD ENC is enabled in VM mode
[    3.496886] [drm] VCE enabled in VM mode
[    3.496909] [drm] GPU posting now...
[    3.633089] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[    3.640931] amdgpu 0000:00:06.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    3.640933] amdgpu 0000:00:06.0: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[    3.640941] [drm] Detected VRAM RAM=4096M, BAR=256M
[    3.640942] [drm] RAM width 256bits GDDR5
[    3.640976] [drm] amdgpu: 4096M of VRAM memory ready
[    3.640980] [drm] amdgpu: 4096M of GTT memory ready.
[    3.640998] [drm] GART: num cpu pages 65536, num gpu pages 65536
[    3.642682] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[    3.650499] [drm] Chained IB support enabled!
[    3.666488] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu
[    3.687069] [drm] Found UVD firmware Version: 1.130 Family ID: 16
[    3.697181] [drm] Found VCE firmware Version: 53.26 Binary ID: 3
[    3.764057] [drm] DM_PPLIB: values for Engine clock
[    3.764058] [drm] DM_PPLIB:	 300000
[    3.764059] [drm] DM_PPLIB:	 600000
[    3.764059] [drm] DM_PPLIB:	 900000
[    3.764060] [drm] DM_PPLIB:	 1162000
[    3.764060] [drm] DM_PPLIB:	 1233000
[    3.764060] [drm] DM_PPLIB:	 1275000
[    3.764061] [drm] DM_PPLIB:	 1319000
--
[    3.764062] [drm] DM_PPLIB:    level           : 8
[    3.764063] [drm] DM_PPLIB: values for Memory clock
[    3.764064] [drm] DM_PPLIB:	 300000
[    3.764064] [drm] DM_PPLIB:	 1000000
[    3.764064] [drm] DM_PPLIB:	 1750000
[    3.764065] [drm] DM_PPLIB: Validation clocks:
[    3.764065] [drm] DM_PPLIB:    engine_max_clock: 136000
[    3.764065] [drm] DM_PPLIB:    memory_max_clock: 175000
[    3.764066] [drm] DM_PPLIB:    level           : 8
[    3.764643] [drm] Display Core initialized with v3.2.48!
[    3.764760] snd_hda_intel 0000:00:07.0: bound 0000:00:06.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[    3.766124] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    3.766125] [drm] Driver supports precise vblank timestamp query.
[    3.792520] [drm] UVD and UVD ENC initialized successfully.
[    3.893472] [drm] VCE initialized successfully.
[    3.894823] kfd kfd: Allocated 3969056 bytes on gart
[    3.895542] Virtual CRAT table created for GPU
[    3.895543] Parsing CRAT table with 1 nodes
[    3.895550] Creating topology SYSFS entries
[    3.896148] Topology: Add dGPU node [0x67df:0x1002]
[    3.896154] kfd kfd: added device 1002:67df
[    3.896214] [drm] Cannot find any crtc or sizes
[    3.900312] [drm] Initialized amdgpu 3.35.0 20150101 for 0000:00:06.0 on minor 1
[    3.903678] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input6
[    3.935772] Decoding supported only on Scalable MCA processors.
[    4.075597] Decoding supported only on Scalable MCA processors.
[    4.165632] Decoding supported only on Scalable MCA processors.

Update
(A lot more testing required)
I found an interesting warning line while installing the different kernel versions.
On kernels that don’t work with GPU passthrough, this warning appears during the installation:
==> WARNING: Possibly missing firmware for module: xhci_pci
A lot more testing is required, but this looks like an interesting lead.

Update 2
On my old computer, I updated all the dependencies of my Linux gaming HVM.
I installed AUR (en) - mkinitcpio-firmware to get rid of all the warnings.
I also upgraded to the latest 5.4 kernel.
Now GPU passthrough doesn’t work on any kernel version.
The error messages are:

[user@archlinux ~]$ sudo dmesg | grep -i amd
[    0.000000]   AMD AuthenticAMD
[    0.018236] RAMDISK: [mem 0x36de9000-0x376ebfff]
[    0.247365] smpboot: CPU0: AMD Ryzen 7 1700 Eight-Core Processor (family: 0x17, model: 0x1, stepping: 0x1)
[    3.245585] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
[    3.245586] AMD-Vi: AMD IOMMUv2 functionality not available on this system
[    5.315212] [drm] amdgpu kernel modesetting enabled.
[    5.315415] amdgpu 0000:00:07.0: remove_conflicting_pci_framebuffers: bar 0: 0xe0000000 -> 0xefffffff
[    5.315416] amdgpu 0000:00:07.0: remove_conflicting_pci_framebuffers: bar 2: 0xf2000000 -> 0xf21fffff
[    5.315417] amdgpu 0000:00:07.0: remove_conflicting_pci_framebuffers: bar 5: 0xf2200000 -> 0xf223ffff
[    5.318989] amdgpu 0000:00:07.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
[    5.321290] amdgpu 0000:00:07.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
[    5.322533] [drm:amdgpu_get_bios [amdgpu]] *ERROR* Unable to locate a BIOS ROM
[    5.322608] amdgpu 0000:00:07.0: Fatal error during GPU init
[    5.322645] [drm] amdgpu: finishing device.
[    5.322678] Modules linked in: snd_hda_intel(+) snd_intel_nhlt edac_mce_amd(-) snd_hda_codec crct10dif_pclmul crc32_pclmul joydev ghash_clmulni_intel pcc_cpufreq(-) acpi_cpufreq(-) fjes(-) amdgpu(+) snd_hda_core mousedev snd_hwdep snd_pcm gpu_sched bochs_drm i2c_algo_bit drm_vram_helper snd_timer ttm xen_netfront snd drm_kms_helper aesni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops crypto_simd soundcore psmouse cryptd glue_helper intel_agp pcspkr intel_gtt i2c_piix4 input_leds evdev mac_hid xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback drm xen_evtchn fuse agpgart dmi_sysfs ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid serio_raw ata_generic atkbd pata_acpi libps2 xen_blkfront crc32c_intel ata_piix libata scsi_mod ehci_pci ehci_hcd i8042 floppy serio
[    5.322810]  amdgpu_device_fini+0x3fe/0x432 [amdgpu]
[    5.322876]  amdgpu_driver_unload_kms+0x4a/0x90 [amdgpu]
[    5.322953]  amdgpu_driver_load_kms.cold+0x38/0x5a [amdgpu]
[    5.323029]  amdgpu_pci_probe+0xee/0x150 [amdgpu]
[    5.324552] amdgpu: probe of 0000:00:07.0 failed with error -22

T_T now it is time to downgrade things randomly until it breaks in a different way. At least I now know that this particular issue is related neither to Xen nor to the kernel.

neowutran · December 19, 2022, 3:36pm

For bug n°1, the regression was introduced in the Linux kernel between 5.6.15 and 5.7.1.

The relevant error message is:

[   94.121436] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[   94.246318] [drm] UVD and UVD ENC initialized successfully.
[   94.372278] [drm] VCE initialized successfully.
[   94.887716] [drm] Fence fallback timer expired on ring gfx
[   95.394399] [drm] Fence fallback timer expired on ring comp_1.0.0
[   95.901045] [drm] Fence fallback timer expired on ring comp_1.1.0
[   96.407728] [drm] Fence fallback timer expired on ring comp_1.2.0
[   96.914402] [drm] Fence fallback timer expired on ring comp_1.3.0
[   97.421058] [drm] Fence fallback timer expired on ring comp_1.0.1
[   97.927715] [drm] Fence fallback timer expired on ring comp_1.1.1
[   98.434388] [drm] Fence fallback timer expired on ring comp_1.2.1
[   98.941049] [drm] Fence fallback timer expired on ring comp_1.3.1
[   99.447716] [drm] Fence fallback timer expired on ring sdma0
[   99.954403] [drm] Fence fallback timer expired on ring sdma1
[  100.487716] [drm] Fence fallback timer expired on ring uvd
[  100.994409] [drm] Fence fallback timer expired on ring uvd_enc0
[  101.501053] [drm] Fence fallback timer expired on ring uvd_enc1
[  102.114417] [drm] Fence fallback timer expired on ring vce0
[  102.647730] [drm] Fence fallback timer expired on ring sdma0
[  112.781070] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
[  113.367734] [drm] Fence fallback timer expired on ring sdma0
[  114.461066] [drm] Fence fallback timer expired on ring sdma0

Found an issue that could maybe be related: System unusable 1 out of 3 boots: Fence fallback timer expired on ring sdma0 or gfx (#1381) · Issues · drm / amd · GitLab
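
To compare kernels, it can help to count which rings time out instead of reading the log by eye. A rough sketch in Python, assuming the message format stays exactly as in the log above; the `sample` text is a shortened copy of that log, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches the "[drm] Fence fallback timer expired on ring <name>" lines.
FENCE_RE = re.compile(r"Fence fallback timer expired on ring (\S+)")

def count_fence_timeouts(log_text):
    """Return a Counter mapping ring name -> number of fence timeouts."""
    return Counter(FENCE_RE.findall(log_text))

# Shortened copy of the dmesg output above.
sample = """\
[   94.887716] [drm] Fence fallback timer expired on ring gfx
[   99.447716] [drm] Fence fallback timer expired on ring sdma0
[  102.647730] [drm] Fence fallback timer expired on ring sdma0
[  112.781070] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
"""

counts = count_fence_timeouts(sample)
for ring, hits in counts.most_common():
    print(ring, hits)
```

On a real system the text would come from `dmesg` inside the HVM rather than from a hard-coded string.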

neowutran · December 20, 2022, 6:22pm

Compiling old kernels is quite annoying. It doesn’t work on Arch Linux or on any system with recent packages.
So I created a debian-11 qube, then compiled and installed pacman from source.
Then I cloned this AUR package, AUR (en) - linux-lts54, and adjusted the PKGBUILD file to compile the kernel version I want.
I am starting by compiling kernel 5.6.19.
The goal is to find the exact kernel version that introduced the bug, then switch to git bisect to find the exact problematic commit.

This is going to take many, many hours. I will update this post when I start to get interesting results from my compilations.

5.6.19: Work
5.7: Don’t work
5.7-rc1: Not bootable
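
For readers who have not used it: git bisect is essentially a binary search over the commit history between a known-good and a known-bad point. Below is a minimal sketch of that idea in Python, assuming a purely linear history (real kernel history has merges, which git bisect handles itself). The commit names `c0`..`c15` and the `is_bad` test are hypothetical; in practice each test means compiling and booting a kernel.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    commits: oldest-first list; commits[0] is known good, commits[-1] known bad.
    is_bad:  callable taking a commit and returning True if the bug is present.
    Returns (first bad commit, number of commits tested).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tests

# Hypothetical history of 16 commits where the regression landed in c11.
history = [f"c{i}" for i in range(16)]
culprit, tests = first_bad(history, lambda c: int(c[1:]) >= 11)
print(culprit, tests)
```

Each test halves the remaining range, which is why a few thousand commits between two releases need only a dozen or so builds. A commit that cannot be tested at all (for example a kernel that does not boot) breaks the simple good/bad answer this sketch relies on; git has `git bisect skip` for that case.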

neowutran · December 21, 2022, 7:05pm

Compiled from tarball:
5.6.19: Work
5.7: Don’t work

Compiled from git:
5.7-rc1: Not bootable
5.7-rc7: Don’t work
5.7-rc2: Not bootable
5.7-rc4: Not bootable
5.7-rc5: Not bootable

Not bootable means: a stack overflow directly in the kernel. Probably related to Xen.
With my luck, the commit I am looking for is inside the range of release candidates I can’t use because of the stack overflow. So I would first have to find which commit fixes the kernel stack overflow, then backport it to continue searching for my GPU passthrough issue.
And if Murphy decides to be extra mean, both commits are related, just for extra suffering while debugging.
I hate computers.

fjdh · December 22, 2022, 10:17am

there’s a whole bunch of kernels that can still be downloaded from Index of /r4.1/current-testing/dom0/fc32/rpm/ if that helps, just search for latest-

enmus · December 22, 2022, 4:25pm

Completely, shamelessly naked offtopic

Hahaha. Twenty years ago, in a similar situation, I stopped actively working with computers for a living, “realizing” that computers are the greatest hoax of the 20th century.

neowutran · December 22, 2022, 5:31pm

Thanks @fjdh, but in this case I am looking for the commit that changed the behavior of the GPU passthrough. I know it is between 5.6.19 and 5.7-rc5, so I need to search within that range.

@enmus ahah, I am not at this point yet, but I do understand. This xkcd is great: xkcd: Shouldn't Be Hard
For the kernel stack overflow: after reading some commits, it seems to be a bug related to multi-CPU support (vCPUs in my case). Configuring my HVM to use only 1 vCPU seems to be a valid workaround.

5.7-rc5: Don’t work
5.7-rc1: Don’t work

So the regression was introduced with rc1. Now the bisect can start.

1cd377baa91844b9f87a2b72eabf7ff783946b5e: Different error, related to graphics (Xorg refuses to start, but that is not the root issue. I can execute commands inside the VM and the error message doesn’t show, but I can’t launch anything related to X, and the Qubes daemon doesn’t work)

2bcb4fd6ba9152c699d873ffa4593d5a4fe1f8d4: Work
0e1b4271078787d3408d3dd314d80b290578cc00: Work
9b06860d7c1f1f4cb7d70f92e47dfa4a91bd5007: Don’t work
So the bug was introduced between 8 April 2020 and 9 April 2020.

aa317d3351dee7cb0b27db808af0cd2340dcbaef: Work
9bb50ed7470944238ec8e30a94ef096caf9056ee: Don’t work
8 commits left

Judging from the remaining commits, the bug is not related to the GPU driver but is probably related to virtualization.
Many hours of compilation are still required. But this line looks interesting: Merge tag 'iommu-updates-v5.7' of git://git.kernel.org/pub/scm/linux/… · torvalds/linux@0906d8b · GitHub

neowutran · December 23, 2022, 8:40pm

0339eb95403fb4664219be344a9399a3fdf1fae1: don’t work

So only 2 possible commits are left. The most probable one seems to be this rewrite of vdpa: Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/gi… · torvalds/linux@9bb7152 · GitHub
I will confirm tomorrow. But if it is this commit, it is going to be hard to find the fix without understanding how vdpa works.
This line could be interesting, noting it for later: Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/gi… · torvalds/linux@9bb7152 · GitHub

balko · December 24, 2022, 6:20am

You probably can write to the commit author, maybe they will be willing to help.

neowutran · December 24, 2022, 8:14am

Good idea, I will do that once I have found the problematic commit.

(I messed up the way I tried to bisect, so it is going to take more time to identify the faulty commit.)

9bb715260ed4cef6948cb2e05cf670462367da71: don’t work
34183ddd13dbfa859c4b68d16a30aad2cce72b11: don’t work
5c8db3eb381745c010ba746373a279e92502bdc8: don’t work
4646de87d32526ee87b46c2e0130413367fb5362: work
f14a9532ee30c68a56ff502c382860f674cc180c: don’t work

neowutran · December 27, 2022, 11:18pm

I am having a lot of difficulty with the bisect process.
The issue is that there is a big range of commits that result in a stack overflow or another unrecoverable kernel error.

However, the range of potential commits is greatly reduced.
With the latest tests:

6afe6929964bca6847986d0507a555a041f07753: don’t work
ff36e78fdb251b9fa65028554689806961e011eb: work
05f3a6f5e478f622f548314471382df5b0f9dbf8: work
83794ee6c13b41c7db86ccfcaa20dc360b08fdb6: don’t work

From the git history, there is a high probability that it is this commit: https://github.com/torvalds/linux/commit/83794ee6c13b41c7db86ccfcaa20dc360b08fdb6

I will try to ask for help in the drm issue.
I will also try to compile kernel 6.1 with some parts of this commit removed.

neowutran · December 29, 2022, 8:02am

git bisect visualize is a nice tool.
I got a bit lost with merges that add a lot of commits in the past.
It helped me to understand what is going on.

4825b61a3d39eceef7db723808103aa60fc24520: work
a2ae604da74dcf9ae674d3c03efad80574952800: don’t work

neowutran · December 30, 2022, 10:58pm

I confirm that the issue is this commit

It is a merge of many commits, and due to some git magic that I do not understand, I am unable to bisect inside this merge.
So instead I am cherry-picking the commits of this merge one by one, recompiling and testing after every few cherry-picks.

There are definitely a lot of things I do not understand about how git works. And I really don’t understand why “git bisect” doesn’t give an acceptable result.

cherry picking until db70e2c13983926d8d657db3e740264b75ad20a4 : work

cherry picking until c16904b0f305c5f6bc31de118d4b1e60a5da5408: don’t work

The specific problematic commit of the merge seems to be 4fdda2e66de0b7d37aa27af3c1bbe25ecb2d5408

The problematic patch is

@@ -170,10 +170,16 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
 	}
 
 	if (amdgpu_device_supports_boco(dev) &&
-	    (amdgpu_runtime_pm != 0)) /* enable runpm by default */
+	    (amdgpu_runtime_pm != 0)) /* enable runpm by default for boco */
 		adev->runpm = true;
 	else if (amdgpu_device_supports_baco(dev) &&
-		 (amdgpu_runtime_pm > 0)) /* enable runpm if runpm=1 */
+		 (amdgpu_runtime_pm != 0) &&
+		 (adev->asic_type >= CHIP_TOPAZ) &&
+		 (adev->asic_type != CHIP_VEGA20) &&
+		 (adev->asic_type != CHIP_ARCTURUS)) /* enable runpm on VI+ */
+		adev->runpm = true;
+	else if (amdgpu_device_supports_baco(dev) &&
+		 (amdgpu_runtime_pm > 0))  /* enable runpm if runpm=1 on CI */
 		adev->runpm = true;
 
 	/* Call ACPI methods: require modeset init

To fix my issue, I modified the integer comparison from
amdgpu_runtime_pm != 0 back to amdgpu_runtime_pm > 0
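
To make the effect of the patch easier to see, here is a small Python model of the decision logic before and after it. This is a sketch under assumptions, not driver code: the `Asic` enum is a hypothetical, heavily shortened stand-in for the kernel's asic_type enum (only the relative order matters), `runpm` is assumed to stay false when no branch matches, and the RX 580 is assumed to report BACO but not BOCO support with the `runpm` module parameter left at -1.

```python
from enum import IntEnum

class Asic(IntEnum):
    # Hypothetical shortened enum; only the ordering is meant to mirror the kernel.
    BONAIRE = 0     # a CI-generation part, ordered before TOPAZ
    TOPAZ = 1       # first VI-generation entry
    POLARIS10 = 2   # RX 580 (see "POLARIS10" in the working kernel log)
    VEGA20 = 3
    ARCTURUS = 4

def runpm_before(supports_boco, supports_baco, runtime_pm, asic):
    """Logic of the removed ('-') lines of the patch."""
    if supports_boco and runtime_pm != 0:
        return True
    if supports_baco and runtime_pm > 0:      # only when runpm=1 is set explicitly
        return True
    return False

def runpm_after(supports_boco, supports_baco, runtime_pm, asic):
    """Logic of the added ('+') lines of the patch."""
    if supports_boco and runtime_pm != 0:
        return True
    if (supports_baco and runtime_pm != 0     # now also true for -1
            and asic >= Asic.TOPAZ
            and asic not in (Asic.VEGA20, Asic.ARCTURUS)):
        return True
    if supports_baco and runtime_pm > 0:
        return True
    return False

# Assumed situation of the RX 580 in the HVM: BACO only, runpm left at -1.
print(runpm_before(False, True, -1, Asic.POLARIS10))
print(runpm_after(False, True, -1, Asic.POLARIS10))
```

Under those assumptions the card goes from runtime power management off to on without any configuration change, which would explain why the same VM works on 5.6.19 and fails on 5.7. Changing `!= 0` back to `> 0` in the new branch restores the old result for this case.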

(reported to the amd driver issue tracker)

Now trying to apply this modification to the 6.1 kernel

neowutran · December 31, 2022, 5:57pm

This fix is enough for some kernel versions.
Tested with kernel 5.13, all good.
With 6.1, too many things have changed for it to work.

Let’s try to bisect when/where this fix stops being enough.

First tries:
ccd1950c2f7e38ae45aeefb99a08b39407cd6c63: bad
5745d647d5563d3e9d32013ad4e5c629acff04d7: bad
2ba047855096fff551402a87272b520fe97323f5: bad

neowutran · January 3, 2023, 5:44pm

It seems that setting amdgpu.runpm=0 and also setting pci=nomsi fixes the issue.
related link:

It seems to be a Xen bug related to missing MSI.

probably related? (definitely needs to be tested):

Anyway, with this config it seems I can use my RX 580 with kernel 6.1 on my old computer.
Now let’s try to transfer the VM to my new computer.
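
A quick way to confirm that the guest really booted with both parameters is to check its kernel command line. A minimal sketch in Python; the helper names are made up, and the `example` string is adapted from the command line in the working kernel log with the two parameters appended, so the image path in it is an assumption.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params

def has_workaround(cmdline):
    """True if both amdgpu.runpm=0 and pci=nomsi are present."""
    params = parse_cmdline(cmdline)
    # pci= accepts a comma-separated list of options, so look inside it.
    pci_options = (params.get("pci") or "").split(",")
    return params.get("amdgpu.runpm") == "0" and "nomsi" in pci_options

# Assumed example, adapted from the command line in the working kernel log.
example = ("BOOT_IMAGE=/boot/vmlinuz-linux root=/dev/xvda3 rw console=tty0 "
           "console=hvc0 amdgpu.runpm=0 pci=nomsi")
print(has_workaround(example))
```

Inside the HVM the string would come from `open("/proc/cmdline").read()` instead of the hard-coded example.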

neowutran · January 4, 2023, 5:43pm

Transferred the VM: it does NOT work.

The error message is this one: Ryzen 7000 serie - #41 by neowutran

The VM is identical to the one on my old computer (I used Qubes backup/restore to transfer it).
The GPU is identical to the one on my old computer (I unplugged it from the old one to plug it into the new one).

Sooo, the only differences I see are:

Not the same motherboard / CPU. Can’t do anything about that
Not the same Xen version (on the old computer I am using the standard 4.14 Xen version, on the new one I am using the master branch (4.18 unstable))
???

So I guess I am back to compiling Qubes ISOs with different Xen versions to check whether it makes a difference.
Or if anyone has a better idea, I am listening.

neowutran · January 6, 2023, 1:30pm

Created a Qubes R4.2 ISO with all the needed patches.
Installed it on both my old computer and new computer.
Configured everything the same way.
Used the exact same GPU.

Old computer: it works
New computer: it doesn’t work (same error as before).

So I ruled out all the driver and software issues. What is left is a hardware issue on my new computer (everything except the GPU; the issue is most likely related to the motherboard).
I guess I am only left with:

Randomly modifying the BIOS configuration (highly unlikely to change anything)
Be a bit more annoying with the ASUS support so that they fix their shitty hardware
RMA the motherboard and use another brand

Or if anyone has any idea, I will take anything.

Update: I got an answer from ASUS. Basically, they escalated to AMD, they know there is a bug in the Linux kernel, and apparently they got the middle finger when asking about a fix. So it won’t be fixed.
Maybe they will make some effort when they release their server CPUs, but either I find a workaround myself, or I resell the computer, or I turn it into a compilation machine (since it has proven to be quite impressive at compilation).

remedy · January 6, 2023, 3:37pm

Sorry if you already mentioned trying this somewhere:
Have you tried disabling Resizable BAR in the BIOS?

And have you also patched stubdom-linux-rootfs.gz per Contents/windows-gaming-hvm.md at master · Qubes-Community/Contents · GitHub