Ryzen 7000 serie

Now back to the original issue of PCI passthrough: upgrading the stubdom dependencies wasn’t the solution.

I can change the crash error message by disabling or enabling this patch: qubes-vmm-xen-stubdom-linux/0008-xen-fix-stubdom-PCI-addr.patch (master branch of QubesOS/qubes-vmm-xen-stubdom-linux on GitHub).
Without this patch, the error when trying to do PCI passthrough is “could not open '/sys/bus/pci/devices/0000:01:00.0/config': No such file or directory”. The patch seems to be designed to avoid this specific error.

With this patch enabled, it crashes with the errors I previously posted: “Domain 4:Offset 0x000e:0x49090000 expands past register size (1)” and “xen_pt_config_reg_init: Offset 0x000e mismatch! Emulated=0x0080, host=0x49090000, syncing to 0x49090000”.

I still have no idea how to fix this, but what the issue actually is seems a bit clearer to me.

From the logs, there seems to be a difference between standard Qubes and my build with the new Xen.
The flag “PCI_BASE_ADDRESS_MEM_TYPE_64” (0x04) seems to be used: I see type 0x04 in my custom build, while standard Qubes OS seems to use “PCI_BASE_ADDRESS_MEM_TYPE_32”. To be confirmed. I still have no idea what this means for the fix I need to make.
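To make the 0x04 value concrete: in a PCI memory BAR, bits 2:1 encode the address width, and the Linux header pci_regs.h names the two common values PCI_BASE_ADDRESS_MEM_TYPE_32 (0x00) and PCI_BASE_ADDRESS_MEM_TYPE_64 (0x04). Below is a minimal Python sketch of that decoding. The function name and the sample BAR values are hypothetical, chosen for illustration only; they are not taken from the logs discussed here.

```python
# Constants mirror the names used in Linux's include/uapi/linux/pci_regs.h.
PCI_BASE_ADDRESS_SPACE_IO = 0x01
PCI_BASE_ADDRESS_MEM_TYPE_MASK = 0x06
PCI_BASE_ADDRESS_MEM_TYPE_32 = 0x00
PCI_BASE_ADDRESS_MEM_TYPE_64 = 0x04
PCI_BASE_ADDRESS_MEM_PREFETCH = 0x08


def describe_bar(raw):
    """Return a short description of the low flag bits of a raw BAR value.

    Hypothetical helper for illustration; not part of Xen, QEMU or Qubes.
    """
    if raw & PCI_BASE_ADDRESS_SPACE_IO:
        return "io"
    mem_type = raw & PCI_BASE_ADDRESS_MEM_TYPE_MASK
    if mem_type == PCI_BASE_ADDRESS_MEM_TYPE_64:
        kind = "mem64"
    elif mem_type == PCI_BASE_ADDRESS_MEM_TYPE_32:
        kind = "mem32"
    else:
        kind = "mem-other"
    if raw & PCI_BASE_ADDRESS_MEM_PREFETCH:
        kind += " prefetchable"
    return kind


# Made-up BAR values, only to exercise the three branches.
for sample in (0xF000000C, 0xFE000000, 0x0000E001):
    print(hex(sample), describe_bar(sample))
```

A 64-bit BAR occupies two consecutive BAR slots, which is one reason code that handles the two types differently can diverge between builds.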

Update: This specific issue is fixed; I made some mistakes when upgrading the RPM spec for qubes-vmm-xen. PCI passthrough still doesn’t work, but it now crashes a bit later in the initialization steps. The error mentions an “rdm check flag”; I will try to learn what it is.


Major update:
The libvirt error message was a bit misleading.
However, the Xen error message was quite explicit and directly suggested that I try setting the “permissive” attribute.

I posted this message from my custom Qubes build, with Xen 4.16.2, libvirt 8.9.0 and QEMU 7.1.

(A lot more work is still required: testing, a lot of testing; cleaning the code; trying to reduce the size of the diff between my fork and the official Qubes OS; rewriting the git commit history (don’t look at it, it was my try&die workflow); and many other things. But now I am certain that I will make it work the way I want.)


builder.conf:

# vim: ft=make ts=4 sw=4

# Ready to use config for full build of the latest development version Qubes OS (aka "master").

GIT_BASEURL ?= https://github.com
GIT_PREFIX ?= QubesOS/qubes-
NO_SIGN ?= 1
#BRANCH ?= release4.1

BACKEND_VMM=xen

DIST_DOM0 ?= fc37
DISTS_VM ?= fc36

VERBOSE ?= 1
DEBUG ?= 1
#DISTS_VM ?= bullseye fc36

MGMT_COMPONENTS = \
	mgmt-salt \
	mgmt-salt-base \
	mgmt-salt-base-topd \
	mgmt-salt-base-config \
	mgmt-salt-dom0-qvm \
	mgmt-salt-dom0-virtual-machines \
	mgmt-salt-dom0-update

COMPONENTS ?= \
    vmm-xen \
    core-libvirt \
    core-vchan-xen \
    core-qubesdb \
    core-qrexec \
    linux-utils \
    python-cffi \
    python-xcffib \
    python-hid \
    python-u2flib-host \
    python-qasync \
    python-panflute \
    rpm-oxide \
    core-admin \
    core-admin-client \
    core-admin-addon-whonix \
    core-admin-linux \
    core-agent-linux \
    intel-microcode \
    linux-firmware \
    linux-kernel \
    artwork \
    grub2-theme \
    gui-common \
    gui-daemon \
    gui-agent-linux \
    gui-agent-xen-hvm-stubdom \
    app-linux-split-gpg \
    app-thunderbird \
    app-linux-pdf-converter \
    app-linux-img-converter \
    app-linux-input-proxy \
    app-linux-usb-proxy \
    app-linux-snapd-helper \
    app-shutdown-idle \
    app-yubikey \
    app-u2f \
    screenshot-helper \
    $(MGMT_COMPONENTS) \
    infrastructure \
    repo-templates \
    meta-packages \
	pykickstart \
	vmm-xen-stubdom-linux \
    manager \
    desktop-linux-common \
    desktop-linux-kde \
    desktop-linux-xfce4 \
    desktop-linux-xfce4-xfwm4 \
    desktop-linux-i3 \
    desktop-linux-i3-settings-qubes \
    desktop-linux-awesome \
    desktop-linux-manager \
    grubby-dummy \
    dummy-psu \
    dummy-backlight \
    linux-gbulb \
    linux-scrypt \
    xdotool \
    linux-template-builder \
    installer-qubes-os \
    qubes-release \
    blivet \
    lorax \
    lorax-templates \
    anaconda \
    anaconda-addon \
    linux-yum \
    linux-deb \
    tpm-extra \
    trousers-changer \
    antievilmaid \
    xscreensaver \
    remote-support \
    builder \
    builder-debian \
    builder-rpm

#python-objgraph
#grub2
# vmm-xen-stubdom-legacy
# seabios
# linux-pvgrub2
# lvm2 
# efitools 
# tpm2-tss 
# tpm2-tools 
# sbsigntool
# windows-tools-cross 
#
#
# alsa-lib 
# alsa-utils 
# alsa-sof-firmware 
# xorg-x11-drv-intel 
# xorg-x11-drv-amdgpu 

BUILDER_PLUGINS = builder-rpm
#BUILDER_PLUGINS = builder-rpm builder-debian
BUILDER_PLUGINS += mgmt-salt

WINDOWS_COMPONENTS = \
                     vmm-xen-windows-pvdrivers \
                     windows-utils \
                     core-agent-windows \
                     gui-agent-windows \
                     installer-qubes-os-windows-tools \
                     builder-windows

# Uncomment this to enable windows tools build
#DISTS_VM += win7x64
#COMPONENTS += $(WINDOWS_COMPONENTS)
#BUILDER_PLUGINS += builder-windows


INSECURE_SKIP_CHECKING = linux-kernel vmm-xen core-libvirt core-qrexec vmm-xen-stubdom-linux anaconda installer-qubes-os qubes-release meta-packages core-admin lorax lorax-templates blivet linux-firmware pykickstart core-admin-linux core-vchan-xen anaconda-addon mgmt-salt-dom0-qvm mgmt-salt-base-topd mgmt-salt-base manager gui-agent-xen-hvm-stubdom


GIT_URL_gui_agent_xen_hvm_stubdom = https://github.com/neowutran/qubes-gui-agent-xen-hvm-stubdom.git
BRANCH_gui_agent_xen_hvm_stubdom = master

GIT_URL_manager = https://github.com/neowutran/qubes-manager.git
BRANCH_manager = master

GIT_URL_mgmt_salt_dom0_qvm = https://github.com/neowutran/qubes-mgmt-salt-dom0-qvm.git
BRANCH_mgmt_salt_dom0_qvm = master
GIT_URL_mgmt_salt_base_topd = https://github.com/neowutran/qubes-mgmt-salt-base-topd.git
BRANCH_mgmt_salt_base_topd = master
GIT_URL_mgmt_salt_base = https://github.com/neowutran/qubes-mgmt-salt-base.git
BRANCH_mgmt_salt_base = master














GIT_URL_core_vchan_xen = https://github.com/neowutran/qubes-core-vchan-xen.git
BRANCH_core_vchan_xen = master

GIT_URL_core_admin_linux = https://github.com/neowutran/qubes-core-admin-linux.git
BRANCH_core_admin_linux = master

GIT_URL_blivet = https://github.com/neowutran/qubes-blivet.git
BRANCH_blivet = master

GIT_URL_pykickstart = https://github.com/neowutran/qubes-pykickstart.git
BRANCH_pykickstart = master

GIT_URL_lorax = https://github.com/neowutran/qubes-lorax.git
BRANCH_lorax = master

GIT_URL_lorax_templates = https://github.com/neowutran/qubes-lorax-templates.git
BRANCH_lorax_templates = master

GIT_URL_installer_qubes_os = https://github.com/neowutran/qubes-installer-qubes-os.git
BRANCH_installer_qubes_os = master

GIT_URL_core_admin = https://github.com/neowutran/qubes-core-admin.git
BRANCH_core_admin = master

GIT_URL_qubes_release = https://github.com/neowutran/qubes-qubes-release.git
BRANCH_qubes_release = master

GIT_URL_meta_packages = https://github.com/neowutran/qubes-meta-packages.git
BRANCH_meta_packages = master

GIT_URL_vmm_xen_stubdom_linux = https://github.com/neowutran/qubes-vmm-xen-stubdom-linux.git
BRANCH_vmm_xen_stubdom_linux = master
#BRANCH_vmm_xen_stubdom_linux = alternative_try

GIT_URL_anaconda = https://github.com/neowutran/qubes-anaconda.git
BRANCH_anaconda = master

GIT_URL_anaconda_addon = https://github.com/neowutran/qubes-anaconda-addon.git
BRANCH_anaconda_addon = master

GIT_URL_core_qrexec = https://github.com/neowutran/qubes-core-qrexec.git
BRANCH_core_qrexec = master

GIT_URL_core_libvirt = https://github.com/neowutran/qubes-core-libvirt.git
BRANCH_core_libvirt = master

GIT_URL_vmm_xen = https://github.com/neowutran/qubes-vmm-xen.git
BRANCH_vmm_xen = xen-4.14

#INSECURE_SKIP_CHECKING = linux-kernel
GIT_URL_linux_kernel = https://github.com/neowutran/qubes-linux-kernel.git
BRANCH_linux_kernel = master

#GIT_URL_linux_firmware = https://github.com/neowutran/qubes-linux-firmware.git
#BRANCH_linux_firmware = master

BRANCH_linux_template_builder = master
BRANCH_linux_yum = master
BRANCH_linux_deb = master
BRANCH_app_linux_split_gpg = master
BRANCH_app_linux_tor = master
BRANCH_app_thunderbird = master
BRANCH_app_linux_pdf_converter = master
BRANCH_app_linux_img_converter = master
BRANCH_app_linux_input_proxy = master
BRANCH_app_linux_usb_proxy = master
BRANCH_app_linux_snapd_helper = master
BRANCH_app_shutdown_idle = master
BRANCH_app_yubikey = master
BRANCH_app_u2f = master
BRANCH_builder = master
BRANCH_builder_rpm = master
BRANCH_builder_debian = master
BRANCH_builder_archlinux = master
BRANCH_builder_github = master
BRANCH_builder_windows = master
BRANCH_infrastructure = master
BRANCH_template_whonix = master
BRANCH_template_kali = master
BRANCH_grubby_dummy = master
BRANCH_xorg_x11_drv_intel = master
BRANCH_linux_pvgrub2 = master
BRANCH_linux_scrypt = master
BRANCH_linux_gbulb = master
BRANCH_python_cffi = master
BRANCH_python_xcffib = master
BRANCH_python_quamash = master
BRANCH_python_objgraph = master
BRANCH_python_hid = master
BRANCH_python_u2flib_host = master
BRANCH_python_qasync = master
BRANCH_python_panflute = master
BRANCH_intel_microcode = master
BRANCH_xdotool = master

BRANCH_rpm_oxide = main

BRANCH_alsa_lib = main
BRANCH_alsa_utils = main
BRANCH_alsa_sof_firmware = main

BRANCH_efitools = main
BRANCH_sbsigntools = main
BRANCH_tpm2_tss = main
BRANCH_tpm2_tools = main

TEMPLATE_ROOT_WITH_PARTITIONS = 1

TEMPLATE_LABEL ?=
# Fedora
TEMPLATE_LABEL += fc34:fedora-34
TEMPLATE_LABEL += fc35:fedora-35
TEMPLATE_LABEL += fc36:fedora-36
TEMPLATE_LABEL += fc34+minimal:fedora-34-minimal
TEMPLATE_LABEL += fc35+minimal:fedora-35-minimal
TEMPLATE_LABEL += fc36+minimal:fedora-36-minimal
TEMPLATE_LABEL += fc34+xfce:fedora-34-xfce
TEMPLATE_LABEL += fc35+xfce:fedora-35-xfce
TEMPLATE_LABEL += fc36+xfce:fedora-36-xfce

# Debian
TEMPLATE_LABEL += stretch:debian-9
TEMPLATE_LABEL += stretch+standard:debian-9
TEMPLATE_LABEL += stretch+xfce:debian-9-xfce
TEMPLATE_LABEL += buster:debian-10
TEMPLATE_LABEL += buster+standard:debian-10
TEMPLATE_LABEL += buster+xfce:debian-10-xfce
TEMPLATE_LABEL += bullseye:debian-11
TEMPLATE_LABEL += bullseye+standard+firmware:debian-11
TEMPLATE_LABEL += bullseye+xfce:debian-11-xfce
TEMPLATE_LABEL += bookworm:debian-12
TEMPLATE_LABEL += bookworm+standard:debian-12
TEMPLATE_LABEL += bookworm+xfce:debian-12-xfce

# Ubuntu
TEMPLATE_LABEL += bionic+standard:bionic
TEMPLATE_LABEL += focal+standard:focal

# Whonix
TEMPLATE_LABEL += buster+whonix-gateway+minimal+no-recommends:whonix-gw-15
TEMPLATE_LABEL += buster+whonix-workstation+minimal+no-recommends:whonix-ws-15
TEMPLATE_LABEL += bullseye+whonix-gateway+minimal+no-recommends:whonix-gw-16
TEMPLATE_LABEL += bullseye+whonix-workstation+minimal+no-recommends:whonix-ws-16

# CentOS
TEMPLATE_LABEL += centos7:centos-7
TEMPLATE_LABEL += centos7+minimal:centos-7-minimal
TEMPLATE_LABEL += centos7+xfce:centos-7-xfce
TEMPLATE_LABEL += centos-stream8:centos-stream-8
TEMPLATE_LABEL += centos-stream8+minimal:centos-stream-8-minimal
TEMPLATE_LABEL += centos-stream8+xfce:centos-stream-8-xfce

TEMPLATE_ALIAS ?=
# Debian
TEMPLATE_ALIAS += stretch:stretch+standard
TEMPLATE_ALIAS += stretch+gnome:stretch+gnome+standard
TEMPLATE_ALIAS += stretch+minimal:stretch+minimal+no-recommends
TEMPLATE_ALIAS += buster:buster+standard
TEMPLATE_ALIAS += buster+gnome:buster+gnome+standard
TEMPLATE_ALIAS += buster+minimal:buster+minimal+no-recommends
TEMPLATE_ALIAS += bullseye:bullseye+standard+firmware
TEMPLATE_ALIAS += bullseye+gnome:bullseye+gnome+standard+firmware
TEMPLATE_ALIAS += bullseye+minimal:bullseye+minimal+no-recommends
TEMPLATE_ALIAS += bookworm:bookworm+standard
TEMPLATE_ALIAS += bookworm+gnome:bookworm+gnome+standard
TEMPLATE_ALIAS += bookworm+minimal:bookworm+minimal+no-recommends

# Ubuntu
TEMPLATE_ALIAS += bionic:bionic+standard
TEMPLATE_ALIAS += focal:focal+standard

# Whonix
TEMPLATE_ALIAS += whonix-gateway-15:buster+whonix-gateway+minimal+no-recommends
TEMPLATE_ALIAS += whonix-workstation-15:buster+whonix-workstation+minimal+no-recommends
TEMPLATE_ALIAS += whonix-gateway-16:bullseye+whonix-gateway+minimal+no-recommends
TEMPLATE_ALIAS += whonix-workstation-16:bullseye+whonix-workstation+minimal+no-recommends


# Uncomment this lines to enable CentOS template build
#DISTS_VM += centos-stream8

# Uncomment this lines to enable Whonix template build
#DISTS_VM += whonix-gateway whonix-workstation
#COMPONENTS += template-whonix
#BUILDER_PLUGINS += template-whonix

# Uncomment this lines to enable Debian 9 template build
#DISTS_VM += stretch
#COMPONENTS += builder-debian
#BUILDER_PLUGINS += builder-debian

# Uncomment this line to enable Archlinux template build
#DISTS_VM += archlinux
#COMPONENTS += builder-archlinux
#BUILDER_PLUGINS += builder-archlinux

about::
	@echo "qubes-os-r4.1.conf"

Build instructions: just the standard get-sources, qubes and iso targets.

When installing, the anaconda addon will crash at the first boot.
You need to run the required commands manually; see qubes-anaconda-addon/qubes.py (master branch of QubesOS/qubes-anaconda-addon on GitHub).

https://neowutran.ovh/qubes_xen4.16_v2.iso
md5sum 39b23367269631044c8439c94bd4bdae

(Only for dev and testing, of course.)


Wow, I hope a maintainer sees this and we can get these changes pushed into an officially supported ISO.

Marmarek recently submitted a PR to QubesOS/qubes-vmm-xen on GitHub. The PR upgrades Xen to 4.17-rc3, which I think is what the next release of Qubes OS will rely on.


Interesting, is there a test ISO with the new version of Xen yet? I understand that Qubes often has ISOs under testing that can be downloaded.

An update on my progress.

I was also able to build another ISO using builderv2, using only the official Qubes repositories plus the marmarek repositories mentioned in the issue.

However, I still have the same issue regarding the TSC clocksource (see Ryzen 7000 serie - #19 by neowutran).

On my ASUS X670 Strix F + 7950X, I first need to add “x2apic=false” to the kernel options to boot Qubes. As for the TSC issue, the frequency found by the system is wrong.
In dom0, the TSC is calibrated to 4491.520 MHz, which is roughly correct (approximately the frequency of the CPU; I need to read a bit more about TSC and why it uses a static frequency on a CPU with dynamic frequency).
In domU, the TSC is calibrated to 196 MHz, and printing “/proc/cpuinfo” shows that the domU system believes the 7950X is running at 196 MHz. This is wrong, and the domU runs unusably slowly.

A workaround I found is to manually override the configuration file used by libvirt/Xen to start a domU.
Copy the libvirt configuration file to the Qubes directory to override the configuration used:
cp /etc/libvirt/libxl/DOMU_NAME.xml /etc/qubes/templates/libvirt/xen/by-name/
(create the directories if they don’t exist yet)

Then, in the XML, search for the “clock” element and force the TSC timer mode to “emulate” instead of “native”.

<clock offset='utc' adjustment='reset'>
<timer name='tsc' mode='emulate' />
</clock>
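The same edit can be scripted. The following is a minimal Python sketch, assuming the copied file is plain, well-formed libvirt domain XML (a file containing template directives would not parse this way). The function name and the sample document are hypothetical and only mirror the snippet above.

```python
import xml.etree.ElementTree as ET


def force_tsc_emulate(domain_xml):
    """Return domain_xml with the <timer name='tsc'> mode set to 'emulate'.

    Hypothetical helper: creates the <clock> or <timer> element if missing.
    """
    root = ET.fromstring(domain_xml)
    clock = root.find("clock")
    if clock is None:
        clock = ET.SubElement(root, "clock", {"offset": "utc"})
    timer = None
    for candidate in clock.findall("timer"):
        if candidate.get("name") == "tsc":
            timer = candidate
    if timer is None:
        timer = ET.SubElement(clock, "timer", {"name": "tsc"})
    timer.set("mode", "emulate")
    return ET.tostring(root, encoding="unicode")


# Illustrative input only; a real domain definition has many more elements.
SAMPLE = """<domain type='xen'>
  <name>DOMU_NAME</name>
  <clock offset='utc' adjustment='reset'>
    <timer name='tsc' mode='native'/>
  </clock>
</domain>"""

patched = force_tsc_emulate(SAMPLE)
print(patched)
```

Emulated TSC trades performance for correctness, so this remains a workaround and not a fix.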

As for a real fix for this issue, I have no idea yet.
I am not sure where the issue is; my first guess would be a bug in Xen or libvirt.
I think it could also be a bug in the BIOS; a lot of things are broken in the BIOS.

I will continue to dig deeper.


In an already installed Qubes you can set this via kernelopts:

qvm-prefs -s VMName kernelopts 'clocksource=tsc'

Hello, by default every VM is already using the TSC clocksource (clocksource=tsc has recently been added to the default kernel options).

After spending a bit more time on the issue, the root cause seems to be that the CPU information provided to the domU (the CPU frequency) is wrong.
From a chat on the Xen IRC channel with a maintainer:

so this is a massive rats nest with virt. By default, VMs are created to be migrateable, and that means no Invariant TSC feature. Guests work fine, but report wonky values
if you don’t plan to migrate the VM, you can set itsc=1 in your vm config file, and then the TSC clocksource ought to be happier

From my understanding, Qubes OS is already using invariant TSC, with this option in libvirt: <feature policy='require' name='invtsc'/>.

For the moment, no real progress on finding what exactly is broken.
Given what “invariant TSC” is supposed to be and what my issue looks like, I am wondering whether the invariant TSC itself is broken.

I am now doing a bit of reading:

  • Processor Programming Reference for AMD CPUs, family 25 (0x19): https://www.amd.com/en/support/tech-docs?keyword=PPR
    Invariant TSC is a feature of the CPU itself.
    Qubes has never worked with an AMD CPU of family 25 before; could a bug specific to Xen + family 25 + invariant TSC exist?
    I am reading a bit of the Xen source code, like this part: xen/xen/arch/x86/cpu/amd.c (master branch of xen-project/xen on GitHub).
    c->x86 is the CPU family. Ryzen 1 is family 0x17 (one of my computers is a Ryzen 1 and it works perfectly with Qubes). So I am searching for suspicious things related to the CPU family for AMD CPUs.
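For reference, the family number is not stored as a single field: x86 CPUs report a 4-bit base family and an 8-bit extended family in CPUID leaf 1 (EAX), and the two are added when the base family is 0xF. A minimal Python sketch of that decoding follows. The function name is hypothetical, and the two sample EAX values are what I would expect for a Zen 4 desktop part and a first-generation Ryzen; treat them as illustrative assumptions, not as values read from these machines.

```python
def cpu_family(eax):
    """Decode the display family from CPUID leaf 1 EAX.

    Hypothetical helper for illustration.
    """
    base_family = (eax >> 8) & 0xF
    extended_family = (eax >> 20) & 0xFF
    # The extended family only contributes when the base family is saturated.
    if base_family == 0xF:
        return base_family + extended_family
    return base_family


# Assumed sample signatures (illustrative only).
ZEN4_EAX = 0x00A60F12
ZEN1_EAX = 0x00800F11

print(hex(cpu_family(ZEN4_EAX)), hex(cpu_family(ZEN1_EAX)))
```

Code that switches on the family, like the checks on c->x86, therefore sees 0x19 for the newer part and 0x17 for the older one.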

Still no new answer from ASUS support about the BIOS, except that the problem is a bit more complex than expected and that it will take more time to understand.

A lot of new things to learn :slight_smile:

Some more tests:
On my Ryzen 1 computer, the policy <feature policy='require' name='invtsc'/> seems to have no influence: TSC is happy and /proc/cpuinfo is always correct (I tried policy='require' and policy='disable').
On my Zen 4 computer (the 7950X), it also seems to have no influence: TSC is not happy and /proc/cpuinfo is always wrong.

I modified the BIOS parameters a bit to see what they do. After the modification, the frequency reported in /proc/cpuinfo changed from 196 MHz to 205.166 MHz. I don’t know which specific parameter is responsible for that.


I found the error. There is an integer overflow, most probably in the Xen hypervisor.
Hunting it down.


Nice. I’ll probably switch to a zen 3 cpu in the near future (once zen4 starts pushing down zen3 (second hand) prices), but it’s nice to know zen4 support will be there, so thanks. :slight_smile: Sadly even though AMD iirc is a partner of Xen, there seem to be a few issues with the speed at which they add actual support to the HV, plus xen isn’t exactly good about communicating about this sort of stuff.


For the moment, no progress on my side.
From my IRC comment:

For my issue, it seems to be an integer overflow. Somewhere there is an unsigned 32-bit integer storing the CPU frequency in Hz; this variable is responsible for passing the CPU frequency information to domU. When I downclock my CPU to below 4,294,967,295 Hz, the correct CPU frequency is passed to domU. Above that, it starts back at 0 Hz. This explains why my domU is showing ~205 MHz when my real CPU is running at ~4500 MHz. I am hunting for this integer so it can be switched to a 64-bit integer. I am starting with the Xen codebase; if someone has a hint on where to look specifically :slight_smile: If not, I will probably be able to find it, but it is going to take me a few days, I think.
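The numbers seen so far are consistent with that wraparound. Here is a minimal Python sketch of the arithmetic, assuming the frequency in Hz is simply truncated to 32 bits (reported = real mod 2^32). The function names are hypothetical; the input values are the ones quoted earlier in this thread.

```python
U32_MODULUS = 1 << 32  # 4,294,967,296


def wrapped_hz(real_hz):
    """Value that survives when real_hz is stored in an unsigned 32-bit integer."""
    return real_hz % U32_MODULUS


def unwrapped_hz(reported_hz, wraps=1):
    """Candidate real frequency, assuming the value wrapped `wraps` times."""
    return reported_hz + wraps * U32_MODULUS


# dom0 calibrated the TSC to 4491.520 MHz.
dom0_hz = 4_491_520_000
print(wrapped_hz(dom0_hz) / 1e6)  # about 196.55 MHz, matching the domU

# After the BIOS change the domU reported 205.166 MHz.
print(unwrapped_hz(205_166_000) / 1e6)  # about 4500.13 MHz
```

Any frequency below 2^32 Hz passes through unchanged, which matches the observation that downclocking the CPU makes the problem disappear.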

I am trying to understand “what does what” in the Xen source, but it is going to take a while. I am trying to find the part of the code that gives the vCPU information to a domU.

Another issue for later: the tool “xenoprof” doesn’t support AMD family 25 (explicit statement in the logs).


I don’t remember whether it is because of the things I tried to patch or because I never tested it, but “PV” works as expected, with the correct CPU frequency.

Only PVH and HVM are problematic.
I am no longer so sure that the issue is in the Xen hypervisor codebase. Maybe it is directly in the Linux kernel, in the Xen-specific part: linux/arch/x86/xen (master branch of torvalds/linux on GitHub).

Going to take some more time TT

Update: Another funny thing to note, and to understand or fix later: when starting a PVH Linux domU, the Linux kernel identifies it as an HVM and not a PVH. This is already the case in a standard Qubes on supported hardware.

This line prints “Hypervisor detected: Xen HVM” in the case of Xen PVH.

Related code:

We see this global variable being reassigned:

just before calling “xen_pvh_domain()”, which is defined as simply reading the global variable “xen_pvh”:

“CONFIG_XEN_PVH” is defined in the Qubes Linux kernel configuration, from what I see.
I don’t know whether it is an issue or not, but it feels weird that, when using PVH, the Linux kernel explicitly states that it thinks it is an HVM.

Update2: The kernel later understands that it is a PVH. So nothing to see here.

Anyway, that was not what I was trying to debug. The rabbit hole is deep.


My patches have nothing to do with the Linux guest working correctly in PV mode.

After some more tests:

This issue is specific to Linux guests in HVM or PVH mode.

  • Windows guests work correctly in HVM mode; the frequency is correct.
  • Linux guests work correctly in PV mode; the frequency is correct.
  • Linux guests in HVM and PVH mode do not display the correct frequency; there is an integer overflow, as mentioned previously.

I certainly have no idea what I’m doing. But searching “frequency” in the xen codebase pulled up this:

Also here

Both are using unsigned int to store cpu frequency AFAIK (I don’t program in C).


I did some more testing.
I traced the CPU frequency back to here:

In case of PV mode (dom0 or guest):
tsc_shift = -2 ; tsc_to_system_mul: 3_824_888_891

In case of PVH or HVM mode:
tsc_shift = 3; tsc_to_system_mul: 2_730_337_484

The calculation done by pvclock_tsc_khz to determine the CPU frequency seems to be correct and free of overflow. The input data (tsc_to_system_mul and tsc_shift) seems to be the source of the issue.
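To check that claim against the two sets of values above, here is a minimal Python sketch modelled on my reading of pvclock_tsc_khz in the Linux kernel (divide 10^6 << 32 by the multiplier, then undo the shift). Treat it as an approximation of the kernel logic and not a copy of it; only the two input pairs come from this thread.

```python
def pvclock_tsc_khz(tsc_to_system_mul, tsc_shift):
    """Recover the TSC frequency in kHz from the pvclock scaling parameters.

    Approximate re-implementation for illustration.
    """
    khz = (1_000_000 << 32) // tsc_to_system_mul
    if tsc_shift < 0:
        khz <<= -tsc_shift
    else:
        khz >>= tsc_shift
    return khz


pv_khz = pvclock_tsc_khz(3_824_888_891, -2)   # PV mode (dom0 or guest)
hvm_khz = pvclock_tsc_khz(2_730_337_484, 3)   # PVH or HVM mode

print(pv_khz / 1000, "MHz in PV")        # roughly 4491 MHz
print(hvm_khz / 1000, "MHz in PVH/HVM")  # roughly 196 MHz
```

Both results agree with what the guests report, which supports the idea that the function itself is fine and that the parameters handed to the PVH and HVM guests already encode the wrong frequency.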

More debugging is needed to reach the root cause.

Difference between PVH and HVM mode:
In the case of HVM, the CPU is correctly calibrated using the PIT method (the correct frequency is found using this method):

So the calculated CPU frequency and the TSC frequency are different.

Later in the code, the Linux kernel prefers to use the TSC frequency instead of the CPU frequency.
That may explain why a Windows HVM guest works correctly and a Linux HVM guest does not.

UPDATE, more debug information:
Getting closer.

By applying these 3 lines (to reproduce the same behavior as PV in this function), PVH and HVM now start with the correct frequency. So I am getting much closer to the root cause.

For the PVH and HVM modes, the function
void set_time_scale(struct time_scale *ts, u64 ticks_per_sec)

receives an incorrect value for “ticks_per_sec”.

UPDATE 2
I think I found it:

“d->arch.tsc_khz” is an unsigned 32-bit integer. The value expected by set_time_scale is a u64.
Since there is no cast from u32 to u64 before the multiplication, the multiplication by 1000 (from kHz to Hz) is done in 32 bits and overflows.
With an explicit cast to u64, it should work.
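The difference between the two C expressions can be modelled in Python with ctypes, which truncates integers to the declared width in the same way. This is only a sketch of the arithmetic under the assumption that the field is a 32-bit unsigned integer; the function names are hypothetical and nothing here calls into Xen.

```python
import ctypes


def hz_without_cast(tsc_khz):
    """Model of `tsc_khz * 1000` with a 32-bit tsc_khz: multiply in 32 bits, widen after."""
    khz32 = ctypes.c_uint32(tsc_khz).value
    product32 = ctypes.c_uint32(khz32 * 1000).value  # result truncated to 32 bits
    return ctypes.c_uint64(product32).value


def hz_with_cast(tsc_khz):
    """Model of `(u64)tsc_khz * 1000`: widen first, multiply in 64 bits."""
    khz32 = ctypes.c_uint32(tsc_khz).value
    return ctypes.c_uint64(khz32 * 1000).value


for khz in (3_000_000, 4_491_520):
    print(khz, hz_without_cast(khz), hz_with_cast(khz))
```

Below 2^32 Hz both forms agree; above it only the widened form keeps the real frequency.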

Testing it. Going to take some hours.

UPDATE 3

I confirm that this is the root cause. I fixed it on my side, and everything seems to work as expected.
Now I need to make a clean patch and talk to the Xen developers to get it integrated.

UPDATE 4
The patch should now have been sent to the xen-devel mailing list.
Copy here:

From c1535eba0bba6fc1b91f975f434af0929d9d7c96 Mon Sep 17 00:00:00 2001
Message-Id: <c1535eba0bba6fc1b91f975f434af0929d9d7c96.1671298409.git.xen@neowutran.ovh>
From: Neowutran <xen@neowutran.ovh>
Date: Sat, 17 Dec 2022 17:17:03 +0100
Subject: [Patch v1] Bug fix - Integer overflow when cpu frequency > u32 max value.

xen/arch/x86/time.c: Bug fix - Integer overflow when cpu frequency > u32 max value.

What is was trying to do: I was trying to install QubesOS on my new computer
(AMD zen4 processor). Guest VM were unusably slow / unusable.

What is the issue: The cpu frequency reported is wrong for linux guest in HVM
and PVH mode, and it cause issue with the TSC clocksource (for example).

Why this patch solved my issue:
The root cause it that "d->arch.tsc_khz" is a unsigned integer storing
the cpu frequency in khz. It get multiplied by 1000, so if the cpu frequency
is over ~4,294 Mhz (u32 max value), then it overflow.
I am solving the issue by adding an explicit cast to u64 to avoid the overflow.

---
 xen/arch/x86/time.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index b01acd390d..7c77ec8902 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2585,7 +2585,7 @@ int tsc_set_info(struct domain *d,
     case TSC_MODE_ALWAYS_EMULATE:
         d->arch.vtsc_offset = get_s_time() - elapsed_nsec;
         d->arch.tsc_khz = gtsc_khz ?: cpu_khz;
-        set_time_scale(&d->arch.vtsc_to_ns, d->arch.tsc_khz * 1000);
+        set_time_scale(&d->arch.vtsc_to_ns, (u64)d->arch.tsc_khz * 1000);

         /*
          * In default mode use native TSC if the host has safe TSC and
--
2.38.1

Now, next issue, GPU passthrough :smiley:


Thanks for your continued work; judging by the number of “hearts” on this thread, there are several other people interested in this as well. It would not be an exaggeration to say I look at this a couple of times a day to gauge the progress you and others have been making! Thanks again.


Long story short: which Ryzen series is the highest that works perfectly (including its iGPU) with the current, up-to-date Qubes OS 4.1.1 (let’s imagine the user can install and update Qubes OS on a different PC)? 5***, 4*** or something else, and how should one select a Ryzen for this?

Does it make any sense to buy a 6*** or 7*** series at this point if the user wants it to work almost out of the box on Qubes OS?

@balko this thread is about what needs to be done to be able to use Qubes OS with a Ryzen 7000 series.
I do not know the potential issues of previous generations. However, since the Xen hypervisor version used in the stable release does not currently support CPU family 25, Ryzen 7***, 6*** and 5*** should not work.

For the GPU passthrough:
On my old computer I have an RX580 that I can pass through to a Linux HVM for gaming.
I noticed that there seems to be a bug in the Linux kernel’s PCI handling: the passthrough works with the LTS kernel 5.4, but fails if I upgrade the kernel to 5.6.?+ (I can start the HVM, but when I try to activate the GPU it fails with an unhelpful error message).


On my new computer, I restored the Linux HVM. However, if I start it, it crashes with a kernel error / memory violation:

[2022-12-18 19:34:38] [    0.841975] general protection fault: 0000 [#1] SMP NOPTI
[2022-12-18 19:34:38] [    0.842001] CPU: 3 PID: 105 Comm: xenwatch Not tainted 5.4.215-1-lts54 #1
[2022-12-18 19:34:38] [    0.842016] Hardware name: Xen HVM domU, BIOS 4.18-unstable 12/11/2022
[2022-12-18 19:34:38] [    0.842033] RIP: 0010:kmem_cache_alloc_trace+0x84/0x200
[2022-12-18 19:34:38] [    0.842046] Code: 3b 76 49 83 78 10 00 4d 8b 38 0f 84 61 01 00 00 4d 85 ff 0f 84 58 01 00 00 41 8b 46 20 49 8b 9e 70 01 00 00 49 8b 3e 4c 01 f8 <48> 33 18 48 89 c1 4c 89 f8 48 0f c9 48 31 cb 48 8d 4a 01 65 48 0f
[2022-12-18 19:34:38] [    0.842084] RSP: 0018:ffffaef7001f3d88 EFLAGS: 00010202
[2022-12-18 19:34:38] [    0.842096] RAX: 1b6dd99358346dae RBX: e46fe56f475c6dae RCX: ffff97e0043bea10
[2022-12-18 19:34:38] [    0.842111] RDX: 0000000000000c14 RSI: 0000000000000d00 RDI: 0000000000034080
[2022-12-18 19:34:38] [    0.842128] RBP: 0000000000000d00 R08: ffff97e00adb4080 R09: 0000000000000000
[2022-12-18 19:34:38] [    0.842144] R10: 0000000000000001 R11: ffff97e00adb5170 R12: 0000000000000020
[2022-12-18 19:34:38] [    0.842161] R13: ffff97e009c03880 R14: ffff97e009c03880 R15: 1b6dd99358346dae
[2022-12-18 19:34:38] [    0.842178] FS:  0000000000000000(0000) GS:ffff97e00ad80000(0000) knlGS:0000000000000000
[2022-12-18 19:34:38] [    0.842195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-12-18 19:34:38] [    0.842208] CR2: 00007f2a87e69010 CR3: 0000000205eaa000 CR4: 0000000000740ee0
[2022-12-18 19:34:38] [    0.842236] PKRU: 55555554
[2022-12-18 19:34:38] [    0.842242] Call Trace:
[2022-12-18 19:34:38] [    0.842252]  ? blkfront_setup_indirect+0x138/0xdc0 [xen_blkfront]
[2022-12-18 19:34:38] [    0.842267]  blkfront_setup_indirect+0x138/0xdc0 [xen_blkfront]
[2022-12-18 19:34:38] [    0.842282]  ? count_strings+0x40/0x40
[2022-12-18 19:34:38] [    0.842291]  blkback_changed+0x302/0xe00 [xen_blkfront]
[2022-12-18 19:34:38] [    0.842302]  ? count_strings+0x40/0x40
[2022-12-18 19:34:38] [    0.842311]  xenwatch_thread+0x9a/0x160
[2022-12-18 19:34:38] [    0.842321]  ? wait_woken+0x80/0x80
[2022-12-18 19:34:38] [    0.842332]  kthread+0x10c/0x130
[2022-12-18 19:34:38] [    0.842340]  ? kthread_associate_blkcg+0x90/0x90
[2022-12-18 19:34:38] [    0.842352]  ret_from_fork+0x35/0x40
[2022-12-18 19:34:38] [    0.842361] Modules linked in: libps2 xen_blkfront(+) crc32c_intel ata_piix libata ehci_pci ehci_hcd scsi_mod i8042 floppy serio
[2022-12-18 19:34:38] [    0.842393] fbcon: Taking over console
[2022-12-18 19:34:38] [    0.842402] ---[ end trace 7d80e06b7a440a2c ]---
[2022-12-18 19:34:38] [    0.842412] RIP: 0010:kmem_cache_alloc_trace+0x84/0x200
[2022-12-18 19:34:38] [    0.842424] Code: 3b 76 49 83 78 10 00 4d 8b 38 0f 84 61 01 00 00 4d 85 ff 0f 84 58 01 00 00 41 8b 46 20 49 8b 9e 70 01 00 00 49 8b 3e 4c 01 f8 <48> 33 18 48 89 c1 4c 89 f8 48 0f c9 48 31 cb 48 8d 4a 01 65 48 0f
[2022-12-18 19:34:38] [    0.842463] RSP: 0018:ffffaef7001f3d88 EFLAGS: 00010202
[2022-12-18 19:34:38] [    0.842475] RAX: 1b6dd99358346dae RBX: e46fe56f475c6dae RCX: ffff97e0043bea10
[2022-12-18 19:34:38] [    0.842491] RDX: 0000000000000c14 RSI: 0000000000000d00 RDI: 0000000000034080
[2022-12-18 19:34:38] [    0.842507] RBP: 0000000000000d00 R08: ffff97e00adb4080 R09: 0000000000000000
[2022-12-18 19:34:38] [    0.842523] R10: 0000000000000001 R11: ffff97e00adb5170 R12: 0000000000000020
[2022-12-18 19:34:38] [    0.842538] R13: ffff97e009c03880 R14: ffff97e009c03880 R15: 1b6dd99358346dae
[2022-12-18 19:34:38] [    0.842555] FS:  0000000000000000(0000) GS:ffff97e00ad80000(0000) knlGS:0000000000000000
[2022-12-18 19:34:38] [    0.842572] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-12-18 19:34:38] [    0.842586] CR2: 00007f2a87e69010 CR3: 0000000205eaa000 CR4: 0000000000740ee0
[2022-12-18 19:34:38] [    0.842602] PKRU: 55555554
[2022-12-18 19:34:38] [    0.842665] Console: switching to colour frame buffer device 100x37
[2022-12-18 19:34:38] [    0.886137] Module has invalid ELF structures
[2022-12-18 19:34:38] [    0.888804] Module has invalid ELF structures
[2022-12-18 19:34:39] [    0.892788] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input2
[2022-12-18 19:34:39] [    0.893321] general protection fault: 0000 [#2] SMP NOPTI
[2022-12-18 19:34:39] [    0.893421] CPU: 3 PID: 2 Comm: kthreadd Tainted: G      D           5.4.215-1-lts54 #1
[2022-12-18 19:34:39] [    0.893554] Hardware name: Xen HVM domU, BIOS 4.18-unstable 12/11/2022
[2022-12-18 19:34:39] [    0.893658] RIP: 0010:__kmalloc_node+0x185/0x2d0
[2022-12-18 19:34:39] [    0.893837] Code: e8 4c 8b 44 24 08 4c 89 e1 4c 89 f2 4c 89 fe e8 a1 e1 99 00 48 83 3b 00 58 75 d5 e9 6a ff ff ff 41 8b 41 20 49 8b 39 4c 01 f0 <48> 8b 18 48 89 c1 49 33 99 70 01 00 00 4c 89 f0 48 0f c9 48 31 cb
[2022-12-18 19:34:39] [    0.894315] RSP: 0018:ffffaef700027d18 EFLAGS: 00010202
[2022-12-18 19:34:39] [    0.894484] RAX: 1b6dd99358346dae RBX: 0000000000000dc0 RCX: ffff97e009fb3810
[2022-12-18 19:34:39] [    0.895669] RDX: 0000000000000c14 RSI: 0000000000000dc0 RDI: 0000000000034080
[2022-12-18 19:34:39] [    0.895917] RBP: 0000000000000dc0 R08: ffff97e00adb4080 R09: ffff97e009c03880
[2022-12-18 19:34:39] [    0.896161] R10: ffffaef700355000 R11: ffffaef700350000 R12: 0000000000000020
[2022-12-18 19:34:39] [    0.896410] R13: 0000000000000000 R14: 1b6dd99358346dae R15: ffff97e009c03880
[2022-12-18 19:34:39] [    0.896662] FS:  0000000000000000(0000) GS:ffff97e00ad80000(0000) knlGS:0000000000000000
[2022-12-18 19:34:39] [    0.896916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-12-18 19:34:39] [    0.897159] CR2: 00007f2a87e69010 CR3: 0000000205d16000 CR4: 0000000000740ee0
[2022-12-18 19:34:39] [    0.897420] PKRU: 55555554
[2022-12-18 19:34:39] [    0.897653] Call Trace:
[2022-12-18 19:34:39] [    0.897884]  ? __vmalloc_node_range+0xd9/0x2d0
[2022-12-18 19:34:39] [    0.898120]  __vmalloc_node_range+0xd9/0x2d0
[2022-12-18 19:34:39] [    0.898353]  copy_process+0x923/0x1a60
[2022-12-18 19:34:39] [    0.898590]  ? _do_fork+0x74/0x3a0
[2022-12-18 19:34:39] [    0.898812]  ? __switch_to_asm+0x40/0x70
[2022-12-18 19:34:39] [    0.899030]  ? __switch_to_asm+0x34/0x70
[2022-12-18 19:34:39] [    0.899245]  ? __switch_to_asm+0x34/0x70
[2022-12-18 19:34:39] [    0.899455]  ? __switch_to_asm+0x40/0x70
[2022-12-18 19:34:39] [    0.899667]  _do_fork+0x74/0x3a0
[2022-12-18 19:34:39] [    0.899874]  ? finish_task_switch+0x72/0x240
[2022-12-18 19:34:39] [    0.900084]  kernel_thread+0x55/0x70
[2022-12-18 19:34:39] [    0.900284]  ? kthread_associate_blkcg+0x90/0x90
[2022-12-18 19:34:39] [    0.900489]  kthreadd+0x14b/0x1a0
[2022-12-18 19:34:39] [    0.900686]  ? kthread_is_per_cpu+0x30/0x30
[2022-12-18 19:34:39] [    0.900882]  ret_from_fork+0x35/0x40
[2022-12-18 19:34:39] [    0.901076] Modules linked in: serio_raw atkbd pata_acpi libps2 xen_blkfront(+) crc32c_intel ata_piix libata ehci_pci ehci_hcd scsi_mod i8042 floppy serio
[2022-12-18 19:34:39] [    1.043804] ---[ end trace 7d80e06b7a440a2d ]---
[2022-12-18 19:34:39] [    1.044020] RIP: 0010:kmem_cache_alloc_trace+0x84/0x200
[2022-12-18 19:34:39] [    1.044243] Code: 3b 76 49 83 78 10 00 4d 8b 38 0f 84 61 01 00 00 4d 85 ff 0f 84 58 01 00 00 41 8b 46 20 49 8b 9e 70 01 00 00 49 8b 3e 4c 01 f8 <48> 33 18 48 89 c1 4c 89 f8 48 0f c9 48 31 cb 48 8d 4a 01 65 48 0f
[2022-12-18 19:34:39] [    1.044923] RSP: 0018:ffffaef7001f3d88 EFLAGS: 00010202
[2022-12-18 19:34:39] [    1.045143] RAX: 1b6dd99358346dae RBX: e46fe56f475c6dae RCX: ffff97e0043bea10
[2022-12-18 19:34:39] [    1.045358] RDX: 0000000000000c14 RSI: 0000000000000d00 RDI: 0000000000034080
[2022-12-18 19:34:39] [    1.045568] RBP: 0000000000000d00 R08: ffff97e00adb4080 R09: 0000000000000000
[2022-12-18 19:34:39] [    1.045784] R10: 0000000000000001 R11: ffff97e00adb5170 R12: 0000000000000020
[2022-12-18 19:34:39] [    1.045991] R13: ffff97e009c03880 R14: ffff97e009c03880 R15: 1b6dd99358346dae
[2022-12-18 19:34:39] [    1.046212] FS:  0000000000000000(0000) GS:ffff97e00ad80000(0000) knlGS:0000000000000000
[2022-12-18 19:34:39] [    1.046440] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-12-18 19:34:39] [    1.046646] CR2: 00007f2a87e69010 CR3: 0000000205d16000 CR4: 0000000000740ee0
[2022-12-18 19:34:39] [    1.046857] PKRU: 55555554
[2022-12-18 19:34:39] [    1.188388] usb 1-1: new high-speed USB device number 2 using ehci-pci
[2022-12-18 19:34:39] [    1.358411] tsc: Refined TSC clocksource calibration: 4491.532 MHz
[2022-12-18 19:34:39] [    1.360135] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x40be298b2d9, max_idle_ns: 440795414753 ns
[2022-12-18 19:34:39] [    1.369153] clocksource: Switched to clocksource tsc
[2022-12-18 19:34:39] [    1.413328] usb 1-1: New USB device found, idVendor=0627, idProduct=0001, bcdDevice= 0.00
[2022-12-18 19:34:39] [    1.414811] usb 1-1: New USB device strings: Mfr=1, Product=3, SerialNumber=10
[2022-12-18 19:34:39] [    1.415046] usb 1-1: Product: QEMU USB Tablet
[2022-12-18 19:34:39] [    1.415265] usb 1-1: Manufacturer: QEMU
[2022-12-18 19:34:39] [    1.415483] usb 1-1: SerialNumber: 42
[2022-12-18 19:34:39] [    1.422701] general protection fault: 0000 [#3] SMP NOPTI
[2022-12-18 19:34:39] [    1.424461] CPU: 3 PID: 144 Comm: systemd-udevd Tainted: G      D           5.4.215-1-lts54 #1
[2022-12-18 19:34:39] [    1.424741] Hardware name: Xen HVM domU, BIOS 4.18-unstable 12/11/2022
[2022-12-18 19:34:39] [    1.425007] RIP: 0010:__kmalloc_track_caller+0x8e/0x230
[2022-12-18 19:34:39] [    1.425268] Code: 08 65 4c 03 05 8b fd 3a 76 49 83 78 10 00 4d 8b 38 0f 84 94 01 00 00 4d 85 ff 0f 84 8b 01 00 00 41 8b 46 20 49 8b 3e 4c 01 f8 <48> 8b 18 48 89 c1 49 33 9e 70 01 00 00 4c 89 f8 48 0f c9 48 31 cb
[2022-12-18 19:34:39] [    1.426092] RSP: 0018:ffffaef70021be18 EFLAGS: 00010202
[2022-12-18 19:34:39] [    1.426373] RAX: 1b6dd99358346dae RBX: 0000000000000cc0 RCX: 0000000000000000
[2022-12-18 19:34:39] [    1.426677] RDX: 0000000000000c14 RSI: 0000000000000cc0 RDI: 0000000000034080
[2022-12-18 19:34:39] [    1.426973] RBP: 0000000000000cc0 R08: ffff97e00adb4080 R09: ffffffff8ac42348
[2022-12-18 19:34:39] [    1.427267] R10: ffff97e0069a3000 R11: 0000000000000010 R12: 0000000000000013
[2022-12-18 19:34:39] [    1.427556] R13: ffff97e009c03880 R14: ffff97e009c03880 R15: 1b6dd99358346dae
[2022-12-18 19:34:39] [    1.427850] FS:  00007f2a8727b200(0000) GS:ffff97e00ad80000(0000) knlGS:0000000000000000
[2022-12-18 19:34:39] [    1.428146] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-12-18 19:34:39] [    1.428426] CR2: 00007f2a87e69010 CR3: 0000000205dc2000 CR4: 0000000000740ee0
[2022-12-18 19:34:39] [    1.428674] PKRU: 55555554
[2022-12-18 19:34:39] [    1.428924] Call Trace:
[2022-12-18 19:34:39] [    1.429165]  ? shmem_symlink+0xbd/0x280
[2022-12-18 19:34:39] [    1.429414]  kmemdup+0x17/0x40
[2022-12-18 19:34:39] [    1.429661]  shmem_symlink+0xbd/0x280
[2022-12-18 19:34:39] [    1.429913]  vfs_symlink+0xe1/0x170
[2022-12-18 19:34:39] [    1.430159]  do_symlinkat+0x120/0x140
[2022-12-18 19:34:39] [    1.430407]  do_syscall_64+0x49/0x90
[2022-12-18 19:34:39] [    1.430650]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[2022-12-18 19:34:39] [    1.430905] RIP: 0033:0x7f2a87c0584b
[2022-12-18 19:34:39] [    1.431137] Code: f0 ff ff 73 01 c3 48 8b 0d 3a f5 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 58 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d f5 0d 00 f7 d8 64 89 01 48
[2022-12-18 19:34:39] [    1.431910] RSP: 002b:00007ffd42b8bf58 EFLAGS: 00000246 ORIG_RAX: 0000000000000058
[2022-12-18 19:34:39] [    1.432176] RAX: ffffffffffffffda RBX: 0000560e56ef2b90 RCX: 00007f2a87c0584b
[2022-12-18 19:34:39] [    1.432449] RDX: 000000000000a000 RSI: 00007ffd42b8bf60 RDI: 0000560e56ef2bd0
[2022-12-18 19:34:39] [    1.432713] RBP: 00007ffd42b8c0b0 R08: 0000000000000009 R09: 0000000000000000
[2022-12-18 19:34:39] [    1.432985] R10: 0000000000000000 R11: 0000000000000246 R12: 0000560e56eed710
[2022-12-18 19:34:39] [    1.433249] R13: 0000000000000000 R14: 00007ffd42b8bf60 R15: 0000560e56ef2bd0
[2022-12-18 19:34:39] [    1.433474] Modules linked in: serio_raw atkbd pata_acpi libps2 xen_blkfront(+) crc32c_intel ata_piix libata ehci_pci ehci_hcd scsi_mod i8042 floppy serio
[2022-12-18 19:34:39] [    1.443915] ---[ end trace 7d80e06b7a440a2e ]---
[2022-12-18 19:34:39] [    1.444143] RIP: 0010:kmem_cache_alloc_trace+0x84/0x200
[2022-12-18 19:34:39] [    1.444371] Code: 3b 76 49 83 78 10 00 4d 8b 38 0f 84 61 01 00 00 4d 85 ff 0f 84 58 01 00 00 41 8b 46 20 49 8b 9e 70 01 00 00 49 8b 3e 4c 01 f8 <48> 33 18 48 89 c1 4c 89 f8 48 0f c9 48 31 cb 48 8d 4a 01 65 48 0f
[2022-12-18 19:34:39] [    1.445057] RSP: 0018:ffffaef7001f3d88 EFLAGS: 00010202
[2022-12-18 19:34:39] [    1.445281] RAX: 1b6dd99358346dae RBX: e46fe56f475c6dae RCX: ffff97e0043bea10
[2022-12-18 19:34:39] [    1.445511] RDX: 0000000000000c14 RSI: 0000000000000d00 RDI: 0000000000034080
[2022-12-18 19:34:39] [    1.445736] RBP: 0000000000000d00 R08: ffff97e00adb4080 R09: 0000000000000000
[2022-12-18 19:34:39] [    1.445963] R10: 0000000000000001 R11: ffff97e00adb5170 R12: 0000000000000020
[2022-12-18 19:34:39] [    1.446186] R13: ffff97e009c03880 R14: ffff97e009c03880 R15: 1b6dd99358346dae
[2022-12-18 19:34:39] [    1.446409] FS:  00007f2a8727b200(0000) GS:ffff97e00ad80000(0000) knlGS:0000000000000000
[2022-12-18 19:34:39] [    1.446641] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-12-18 19:34:39] [    1.446865] CR2: 00007f2a87e69010 CR3: 0000000205dc2000 CR4: 0000000000740ee0
[2022-12-18 19:34:39] [    1.447093] PKRU: 55555554
[2022-12-18 19:34:39] [    1.447362] BUG: unable to handle page fault for address: ffff97e803c4a008
[2022-12-18 19:34:39] [    1.447590] #PF: supervisor write access in kernel mode
[2022-12-18 19:34:39] [    1.447816] #PF: error_code(0x0002) - not-present page
[2022-12-18 19:34:39] [    1.448042] PGD 145001067 P4D 145001067 PUD 0 
[2022-12-18 19:34:39] [    1.448268] Oops: 0002 [#4] SMP NOPTI
[2022-12-18 19:34:39] [    1.448495] CPU: 3 PID: 144 Comm: systemd-udevd Tainted: G      D           5.4.215-1-lts54 #1
[2022-12-18 19:34:39] [    1.448738] Hardware name: Xen HVM domU, BIOS 4.18-unstable 12/11/2022
[2022-12-18 19:34:39] [    1.448979] RIP: 0010:__tlb_remove_page_size+0x12/0x80
[2022-12-18 19:34:39] [    1.449221] Code: 48 89 ef 5b 31 f6 5d e9 0c 13 01 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 48 8b 47 28 8b 50 08 8d 4a 01 89 48 08 <48> 89 74 d0 10 3b 48 0c 74 03 31 c0 c3 53 48 8b 47 28 48 89 fb 48
[2022-12-18 19:34:39] [    1.449972] RSP: 0018:ffffaef70021bcc8 EFLAGS: 00010206
[2022-12-18 19:34:39] [    1.450212] RAX: ffff97e003c4a000 RBX: ffff97e006b75c40 RCX: 0000000000000000
[2022-12-18 19:34:39] [    1.450462] RDX: 00000000ffffffff RSI: fffffaedc81b0c40 RDI: ffffaef70021be38
[2022-12-18 19:34:39] [    1.450711] RBP: 0000000206c31025 R08: ffff97e009552708 R09: 0000000000000000
[2022-12-18 19:34:39] [    1.450961] R10: 0000000000000001 R11: ffff97e00adb5170 R12: fffffaedc81b0c40
[2022-12-18 19:34:39] [    1.451202] R13: ffffaef70021be38 R14: 0000560e55989000 R15: 0000560e55988000
[2022-12-18 19:34:39] [    1.451442] FS:  0000000000000000(0000) GS:ffff97e00ad80000(0000) knlGS:0000000000000000
[2022-12-18 19:34:39] [    1.451695] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-12-18 19:34:39] [    1.451944] CR2: ffff97e803c4a008 CR3: 0000000205dc2000 CR4: 0000000000740ee0
[2022-12-18 19:34:39] [    1.452196] PKRU: 55555554
[2022-12-18 19:34:39] [    1.452429] Call Trace:
[2022-12-18 19:34:39] [    1.452660]  unmap_page_range+0x7d6/0xf50
[2022-12-18 19:34:39] [    1.452894]  ? oops_end+0xbd/0xc0
[2022-12-18 19:34:39] [    1.453120]  unmap_vmas+0x6e/0xd0
[2022-12-18 19:34:39] [    1.454157]  exit_mmap+0xa9/0x190
[2022-12-18 19:34:39] [    1.454694]  mmput+0x49/0x110
[2022-12-18 19:34:39] [    1.454911]  do_exit+0x2fa/0xa30
[2022-12-18 19:34:39] [    1.455119]  ? do_symlinkat+0x120/0x140
[2022-12-18 19:34:39] [    1.455326]  rewind_stack_do_exit+0x17/0x20
[2022-12-18 19:34:39] [    1.455535] RIP: 0033:0x7f2a87c0584b
[2022-12-18 19:34:39] [    1.455741] Code: f0 ff ff 73 01 c3 48 8b 0d 3a f5 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 58 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d f5 0d 00 f7 d8 64 89 01 48
[2022-12-18 19:34:39] [    1.456399] RSP: 002b:00007ffd42b8bf58 EFLAGS: 00000246 ORIG_RAX: 0000000000000058
[2022-12-18 19:34:39] [    1.456626] RAX: ffffffffffffffda RBX: 0000560e56ef2b90 RCX: 00007f2a87c0584b
[2022-12-18 19:34:39] [    1.644958] RDX: 000000000000a000 RSI: 00007ffd42b8bf60 RDI: 0000560e56ef2bd0
[2022-12-18 19:34:39] [    1.645184] RBP: 00007ffd42b8c0b0 R08: 0000000000000009 R09: 0000000000000000
[2022-12-18 19:34:39] [    1.646548] R10: 0000000000000000 R11: 0000000000000246 R12: 0000560e56eed710
[2022-12-18 19:34:39] [    1.646769] R13: 0000000000000000 R14: 00007ffd42b8bf60 R15: 0000560e56ef2bd0
[2022-12-18 19:34:39] [    1.646990] Modules linked in: serio_raw atkbd pata_acpi libps2 xen_blkfront(+) crc32c_intel ata_piix libata ehci_pci ehci_hcd scsi_mod i8042 floppy serio
[2022-12-18 19:34:39] [    1.647447] CR2: ffff97e803c4a008
[2022-12-18 19:34:39] [    1.647660] ---[ end trace 7d80e06b7a440a2f ]---
[2022-12-18 19:34:39] [    1.647878] RIP: 0010:kmem_cache_alloc_trace+0x84/0x200
[2022-12-18 19:34:39] [    1.648089] Code: 3b 76 49 83 78 10 00 4d 8b 38 0f 84 61 01 00 00 4d 85 ff 0f 84 58 01 00 00 41 8b 46 20 49 8b 9e 70 01 00 00 49 8b 3e 4c 01 f8 <48> 33 18 48 89 c1 4c 89 f8 48 0f c9 48 31 cb 48 8d 4a 01 65 48 0f
[2022-12-18 19:34:39] [    1.648745] RSP: 0018:ffffaef7001f3d88 EFLAGS: 00010202
[2022-12-18 19:34:39] [    1.648962] RAX: 1b6dd99358346dae RBX: e46fe56f475c6dae RCX: ffff97e0043bea10
[2022-12-18 19:34:39] [    1.649186] RDX: 0000000000000c14 RSI: 0000000000000d00 RDI: 0000000000034080
[2022-12-18 19:34:39] [    1.649406] RBP: 0000000000000d00 R08: ffff97e00adb4080 R09: 0000000000000000
[2022-12-18 19:34:39] [    1.649626] R10: 0000000000000001 R11: ffff97e00adb5170 R12: 0000000000000020
[2022-12-18 19:34:39] [    1.649842] R13: ffff97e009c03880 R14: ffff97e009c03880 R15: 1b6dd99358346dae
[2022-12-18 19:34:39] [    1.650056] FS:  0000000000000000(0000) GS:ffff97e00ad80000(0000) knlGS:0000000000000000
[2022-12-18 19:34:39] [    1.650276] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-12-18 19:34:39] [    1.650490] CR2: ffff97e803c4a008 CR3: 0000000205dc2000 CR4: 0000000000740ee0
[2022-12-18 19:34:39] [    1.650711] PKRU: 55555554
[2022-12-18 19:34:39] [    1.650920] Fixing recursive fault but reboot is needed!
[2022-12-18 19:34:52] [    5.828331] xenbus_probe_frontend: Waiting for devices to initialise: 25s...20s...
[2022-12-18 19:34:52] [   14.168465] random: crng init done
[2022-12-18 19:35:08] [   15.728335] 15s...10s...5s...0s...
[2022-12-18 19:36:51] Logfile Opened
[2022-12-18 19:36:54] :: running early hook [udev]
[2022-12-18 19:36:54] Starting version 251.5-1-arch
[2022-12-18 19:36:54] :: running hook [udev]
[2022-12-18 19:36:54] :: Triggering uevents...
[2022-12-18 19:36:55] [    0.812811] general protection fault: 0000 [#1] SMP NOPTI
[2022-12-18 19:36:55] [    0.812837] CPU: 1 PID: 105 Comm: xenwatch Not tainted 5.4.215-1-lts54 #1
[2022-12-18 19:36:55] [    0.812852] Hardware name: Xen HVM domU, BIOS 4.18-unstable 12/11/2022
[2022-12-18 19:36:55] [    0.812871] RIP: 0010:kmem_cache_alloc_trace+0x84/0x200
[2022-12-18 19:36:55] [    0.812882] Code: fb 44 49 83 78 10 00 4d 8b 38 0f 84 61 01 00 00 4d 85 ff 0f 84 58 01 00 00 41 8b 46 20 49 8b 9e 70 01 00 00 49 8b 3e 4c 01 f8 <48> 33 18 48 89 c1 4c 89 f8 48 0f c9 48 31 cb 48 8d 4a 01 65 48 0f
[2022-12-18 19:36:55] [    0.812921] RSP: 0018:ffffb555c01f3d88 EFLAGS: 00010282
[2022-12-18 19:36:55] [    0.812934] RAX: dbd3225111d050f8 RBX: 44c51e6d37b650f8 RCX: ffff99d9c43bd650
[2022-12-18 19:36:55] [    0.812950] RDX: 0000000000001bb6 RSI: 0000000000000d00 RDI: 0000000000034080
[2022-12-18 19:36:55] [    0.812966] RBP: 0000000000000d00 R08: ffff99d9cacb4080 R09: 0000000000000000
[2022-12-18 19:36:55] [    0.812983] R10: 0000000000000001 R11: ffff99d9cacb5170 R12: 0000000000000020
[2022-12-18 19:36:55] [    0.813000] R13: ffff99d9c9c03880 R14: ffff99d9c9c03880 R15: dbd3225111d050f8
[2022-12-18 19:36:55] [    0.813018] FS:  0000000000000000(0000) GS:ffff99d9cac80000(0000) knlGS:0000000000000000
[2022-12-18 19:36:55] [    0.813035] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-12-18 19:36:55] [    0.813048] CR2: 0000560861b85018 CR3: 0000000206344000 CR4: 0000000000740ee0
[2022-12-18 19:36:55] [    0.813066] PKRU: 55555554
[2022-12-18 19:36:55] [    0.813072] Call Trace:
[2022-12-18 19:36:55] [    0.813083]  ? blkfront_setup_indirect+0x138/0xdc0 [xen_blkfront]
[2022-12-18 19:36:55] [    0.813098]  blkfront_setup_indirect+0x138/0xdc0 [xen_blkfront]
[2022-12-18 19:36:55] [    0.813116]  ? count_strings+0x40/0x40
[2022-12-18 19:36:55] [    0.813125]  blkback_changed+0x302/0xe00 [xen_blkfront]
[2022-12-18 19:36:55] [    0.813136]  ? count_strings+0x40/0x40
[2022-12-18 19:36:55] [    0.813145]  xenwatch_thread+0x9a/0x160
[2022-12-18 19:36:55] [    0.813159]  ? wait_woken+0x80/0x80
[2022-12-18 19:36:55] [    0.813170]  kthread+0x10c/0x130
[2022-12-18 19:36:55] [    0.813178]  ? kthread_associate_blkcg+0x90/0x90
[2022-12-18 19:36:55] [    0.813192]  ret_from_fork+0x35/0x40
[2022-12-18 19:36:55] [    0.813201] Modules linked in: libps2 xen_blkfront(+) crc32c_intel ata_piix libata ehci_pci ehci_hcd scsi_mod i8042 floppy serio
[2022-12-18 19:36:55] [    0.813233] fbcon: Taking over console
[2022-12-18 19:36:55] [    0.813242] ---[ end trace ac6bf55eff6c768f ]---
[2022-12-18 19:36:55] [    0.813254] RIP: 0010:kmem_cache_alloc_trace+0x84/0x200
[2022-12-18 19:36:55] [    0.813265] Code: fb 44 49 83 78 10 00 4d 8b 38 0f 84 61 01 00 00 4d 85 ff 0f 84 58 01 00 00 41 8b 46 20 49 8b 9e 70 01 00 00 49 8b 3e 4c 01 f8 <48> 33 18 48 89 c1 4c 89 f8 48 0f c9 48 31 cb 48 8d 4a 01 65 48 0f
[2022-12-18 19:36:55] [    0.813304] RSP: 0018:ffffb555c01f3d88 EFLAGS: 00010282
[2022-12-18 19:36:55] [    0.813316] RAX: dbd3225111d050f8 RBX: 44c51e6d37b650f8 RCX: ffff99d9c43bd650
[2022-12-18 19:36:55] [    0.813333] RDX: 0000000000001bb6 RSI: 0000000000000d00 RDI: 0000000000034080
[2022-12-18 19:36:55] [    0.813349] RBP: 0000000000000d00 R08: ffff99d9cacb4080 R09: 0000000000000000
[2022-12-18 19:36:55] [    0.813367] R10: 0000000000000001 R11: ffff99d9cacb5170 R12: 0000000000000020
[2022-12-18 19:36:55] [    0.813383] R13: ffff99d9c9c03880 R14: ffff99d9c9c03880 R15: dbd3225111d050f8
[2022-12-18 19:36:55] [    0.813401] FS:  0000000000000000(0000) GS:ffff99d9cac80000(0000) knlGS:0000000000000000
[2022-12-18 19:36:55] [    0.813418] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-12-18 19:36:55] [    0.813432] CR2: 0000560861b85018 CR3: 0000000206344000 CR4: 0000000000740ee0
[2022-12-18 19:36:55] [    0.813448] PKRU: 55555554

It is directly related to the GPU passthrough: if I do not do the PCI passthrough, the HVM starts correctly.


If I upgrade the kernel to a newer version, I can start the HVM, but I end up with the same kernel bug as on my old computer.


So there are at least two different issues.

  • One issue is a regression in the Linux kernel related to PCI handling, introduced around 5.6.x. This should be the easiest bug to track down, since I can narrow the scope by stepping through newer kernels until I find the specific version that introduced the bug, and then look for it in the commits / source code. But I expect it to be very time consuming, again (at the beginning of the process I could use the distribution archives to speed things up by not having to compile everything).
  • For the second issue, I have no idea at the moment. Something related to the QEMU version? To the Linux kernel used to launch QEMU? A Xen dependency in the VM that is not the correct version? A lot of testing is required to reduce the possibilities: try with GPU passthrough, without it, and with it but without strict reset; try all of the above with a non-GPU PCI device; try different kernel versions (since it is directly related to the Linux kernel version used).
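For the first issue, the "step through kernel versions until the bug appears" search can be done as a bisection instead of a linear walk. Below is a minimal sketch, not a tested tool: the version list is a placeholder, and `is_bad` is a hypothetical callback standing in for "boot the HVM with this kernel and check whether the bug shows up". It assumes the oldest version in the list is good, the newest is bad, and the bug stays present once introduced.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest version for which is_bad() is true.

    Assumptions: versions are sorted oldest to newest, versions[0] is
    known good, versions[-1] is known bad, and the bug does not
    disappear again once it has been introduced.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # bug already present: look at older versions
        else:
            lo = mid  # still good: look at newer versions
    return versions[hi]


# Placeholder list; in practice these would be the kernel packages
# available from the distribution archives.
kernels = ["5.4", "5.5", "5.6", "5.7", "5.8", "5.9", "5.10"]

# Hypothetical check: pretend the regression landed in 5.6.
culprit = first_bad(kernels, lambda v: kernels.index(v) >= 2)
print(culprit)
```

With N candidate versions this needs about log2(N) boots instead of N. Once the first bad release is known, the same idea applies at the commit level with `git bisect`.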

Update
For the second issue, it feels like it is related to the xen_blkfront and xen_blkback drivers in the Linux kernel. Maybe a given Xen hypervisor version requires the guest to run a specific version of the Linux kernel. Anyway, I won't focus on this issue.
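If I come back to the second issue later, the test combinations listed above can be enumerated up front so that none is forgotten. This is only a rough sketch: the labels are placeholders I chose for illustration, and the two kernel versions are simply the ones that appear in the logs of this post.

```python
from itertools import product

# Placeholder labels for the kinds of device handed to the HVM.
devices = ["gpu", "non-gpu pci device", "no passthrough"]
# Whether strict reset is kept enabled for the passed-through device.
strict_reset = [True, False]
# Guest kernel versions seen in the logs above.
guest_kernels = ["5.4.215", "6.0.12"]


def build_matrix(devices, strict_reset, guest_kernels):
    """Return every combination worth testing as a list of dicts."""
    rows = []
    for dev, strict, kernel in product(devices, strict_reset, guest_kernels):
        # Strict reset only matters when a device is actually passed
        # through, so keep a single variant for the no-passthrough case.
        if dev == "no passthrough" and not strict:
            continue
        rows.append({"device": dev, "strict_reset": strict, "kernel": kernel})
    return rows


matrix = build_matrix(devices, strict_reset, guest_kernels)
for row in matrix:
    print(row)
```

Each row is one boot attempt to record as "starts" or "crashes"; comparing the rows that differ in a single field shows which factor actually matters.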

For the first issue, the kernel log indicates (on my Zen 4 computer, HVM kernel is 6.0.12):

[    1.755996] input: HDA ATI HDMI HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:07.0/sound/card0/input11
[    1.815044] input: QEMU QEMU USB Tablet as /devices/pci0000:00/0000:00:04.0/usb1/1-1/1-1:1.0/0003:0627:0001.0001/input/input12
[    1.815067] hid-generic 0003:0627:0001.0001: input,hidraw0: USB HID v0.01 Mouse [QEMU QEMU USB Tablet] on usb-0000:00:04.0-1/input0
[    1.815082] usbcore: registered new interface driver usbhid
[    1.815082] usbhid: USB HID core driver
[    2.018058] Console: switching to colour frame buffer device 128x48
[    2.041163] audit: type=1130 audit(1671393445.293:7): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubes-mount-dirs comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    2.221366] audit: type=1130 audit(1671393445.493:8): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journal-flush comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    2.275172] audit: type=1130 audit(1671393445.546:9): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dev-xvdc1-swap comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    2.275180] audit: type=1131 audit(1671393445.546:10): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dev-xvdc1-swap comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    2.275299] [drm] amdgpu kernel modesetting enabled.
[    2.276057] mousedev: PS/2 mouse device common for all mice
[    2.278349] amdgpu: CRAT table not found
[    2.278351] amdgpu: Virtual CRAT table created for CPU
[    2.278356] amdgpu: Topology: Add CPU node
[    2.278687] xen: --> pirq=24 -> irq=40 (gsi=40)
[    2.278790] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1043:0x0525 0xE7).
[    2.278793] [drm] register mmio base: 0xF2200000
[    2.278794] [drm] register mmio size: 262144
[    2.279710] [drm] add ip block number 0 <vi_common>
[    2.279712] [drm] add ip block number 1 <gmc_v8_0>
[    2.279712] [drm] add ip block number 2 <tonga_ih>
[    2.279713] [drm] add ip block number 3 <gfx_v8_0>
[    2.279714] [drm] add ip block number 4 <sdma_v3_0>
[    2.279715] [drm] add ip block number 5 <powerplay>
[    2.279716] [drm] add ip block number 6 <dm>
[    2.279716] [drm] add ip block number 7 <uvd_v6_0>
[    2.279717] [drm] add ip block number 8 <vce_v3_0>
[    2.452942] amdgpu 0000:00:06.0: amdgpu: Fetched VBIOS from ROM
[    2.452944] amdgpu: ATOM BIOS: 115-D009PI2-101
[    2.452958] [drm] UVD is enabled in VM mode
[    2.452961] [drm] UVD ENC is enabled in VM mode
[    2.452962] [drm] VCE enabled in VM mode
[    2.452963] amdgpu 0000:00:06.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    2.453357] [drm] GPU posting now...
[    2.559433] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[    2.564516] amdgpu 0000:00:06.0: amdgpu: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    2.564518] amdgpu 0000:00:06.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[    2.564525] [drm] Detected VRAM RAM=4096M, BAR=256M
[    2.564526] [drm] RAM width 256bits GDDR5
[    2.564534] [drm] amdgpu: 4096M of VRAM memory ready
[    2.564535] [drm] amdgpu: 3887M of GTT memory ready.
[    2.564547] [drm] GART: num cpu pages 65536, num gpu pages 65536
[    2.565709] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[    2.568500] [drm] Chained IB support enabled!
[    2.574044] amdgpu: hwmgr_sw_init smu backed is polaris10_smu
[    2.581882] [drm] Found UVD firmware Version: 1.130 Family ID: 16
[    2.613532] [drm] Found VCE firmware Version: 53.26 Binary ID: 3
[    3.154314] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input5
[    6.222318] amdgpu: SMU load firmware failed
[    6.223593] amdgpu: fw load failed
[    6.224559] amdgpu: smu firmware loading failed
[    6.224579] amdgpu 0000:00:06.0: amdgpu: amdgpu_device_ip_init failed
[    6.224597] amdgpu 0000:00:06.0: amdgpu: Fatal error during GPU init
[    6.224614] amdgpu 0000:00:06.0: amdgpu: amdgpu: finishing device.
[    6.226349] amdgpu: probe of 0000:00:06.0 failed with error -22
[    6.226358] BUG: kernel NULL pointer dereference, address: 0000000000000090
[    6.226372] #PF: supervisor write access in kernel mode
[    6.226382] #PF: error_code(0x0002) - not-present page
[    6.226391] PGD 0 P4D 0
[    6.226398] Oops: 0002 [#1] PREEMPT SMP NOPTI
[    6.226409] CPU: 2 PID: 315 Comm: systemd-udevd Not tainted 6.0.12-arch1-1 #1 c9932778529b16cae8b206cc5eba53043cd7ca6a
[    6.226425] Hardware name: Xen HVM domU, BIOS 4.18-unstable 12/11/2022
[    6.226436] RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[    6.226449] Code: e2 12 d4 cf c6 85 8c 01 00 00 00 5b 5d 41 5c 41 5d c3 cc cc cc cc 4c 8d 63 f0 4c 89 e7 e8 24 de 88 d0 48 8b 03 48 39 d8 74 0f <c6> 80 90 00 00 00 01 48 8b 00 48 39 d8 75 f1 4c 89 e7 e8 95 de 88
[    6.226478] RSP: 0018:ffffb57f00783ac8 EFLAGS: 00010207
[    6.226488] RAX: 0000000000000000 RBX: ffff9800536896d0 RCX: ffff9800502ca5c0
[    6.226500] RDX: 0000000000000001 RSI: ffff9800502ca5e8 RDI: ffff9800536896c0
[    6.226512] RBP: ffff980053689628 R08: ffffffff91aead8d R09: 0000000000000010
[    6.226525] R10: 000000000000003a R11: ffff98005041eda0 R12: ffff9800536896c0
[    6.226538] R13: ffff980053689630 R14: ffff980053686208 R15: ffffb57f00783db0
[    6.226551] FS:  00007fe799e4c080(0000) GS:ffff98014af00000(0000) knlGS:0000000000000000
[    6.226564] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.226575] CR2: 0000000000000090 CR3: 0000000102318000 CR4: 0000000000750ee0
[    6.226588] PKRU: 55555554
[    6.226595] Call Trace:
[    6.226601]  <TASK>
[    6.226608]  amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu 71a5a223485e453556b42d4d63875cf5a0137241]
[    6.226746]  amdgpu_device_fini_sw+0x33/0x390 [amdgpu 71a5a223485e453556b42d4d63875cf5a0137241]
[    6.226843]  amdgpu_driver_release_kms+0x16/0x30 [amdgpu 71a5a223485e453556b42d4d63875cf5a0137241]
[    6.226934]  devm_drm_dev_init_release+0x49/0x70
[    6.226946]  release_nodes+0x40/0xb0
[    6.226955]  devres_release_all+0x8c/0xc0
[    6.226963]  device_unbind_cleanup+0xe/0x70
[    6.226974]  really_probe+0x242/0x380
[    6.226982]  ? pm_runtime_barrier+0x54/0x90
[    6.226991]  __driver_probe_device+0x78/0x170
[    6.227000]  driver_probe_device+0x1f/0x90
[    6.227008]  __driver_attach+0xd5/0x1d0
[    6.227015]  ? __device_attach_driver+0x110/0x110
[    6.227025]  bus_for_each_dev+0x8b/0xd0
[    6.227034]  bus_add_driver+0x1b2/0x200
[    6.227042]  driver_register+0x8d/0xe0
[    6.227050]  ? 0xffffffffc1330000
[    6.227059]  do_one_initcall+0x5d/0x220
[    6.227069]  do_init_module+0x4a/0x1e0
[    6.227078]  __do_sys_init_module+0x17f/0x1b0
[    6.227215]  do_syscall_64+0x5f/0x90
[    6.227342]  ? syscall_exit_to_user_mode+0x1b/0x40
[    6.227469]  ? do_syscall_64+0x6b/0x90
--

Thanks a lot for the information. I'm just a bit overwhelmed by the amount of information about Ryzen on the forum (I have used Intel for Qubes OS for ages). But Ryzen looks promising and tempting because of its performance.
I will stick with Intel for some time more.
Thanks again, your work with Ryzen is much appreciated.