I chose to patch the Python code to use explicit tasks.
I was able to create the ISO, run the installer (non-EFI), and boot into Qubes.
dom0 is Fedora 37 with Xen 4.16.2, and it seems to work.
However, a lot of work is still needed:
The anaconda patch needs to be updated.
Some configuration is needed to make the installer work in EFI mode.
The vmm-xen patches for the Debian VM need to be updated.
Once that is done and everything is confirmed to work, the commits will need a lot of cleanup, and I will try to make this Qubes OS fork as easy as possible to keep updated with upstream,
and then work with upstream to upgrade everything for a future release.
In its current state, it only shows the progress I am making and is meant for devs willing to help finish this project. Some work is still needed before it reaches a “usable” state, and a lot of work to clean everything up once that “usable” state has been reached.
Great work and progress neowutran, thanks for updating along the way.
I tried for another 2 days, also attempting to integrate 4.16.
You’ve come pretty far, well done. How stable is the build you’ve managed to boot into?
IIRC Python 3.11 is a major change under the hood. I saw some benchmarks showing 50% performance gains in certain tasks (I think from more C implementations), so I'm not too surprised there are some issues there.
What kernel did you use for the dom0 build?
I think a lot of these AM5 boards have issues just now. Have you tried adding x2apic=false for the iommu issue? This seemed to help in some of my tests.
Currently my main difficulty is patching anaconda for the partitioning setup. A lot has changed between the Fedora 32 anaconda and the Fedora 37 anaconda.
The patches need to be rewritten from scratch. (I am talking about ~3 or 4 blocking patches of less than 100 lines combined, but that is still some work to do. I won’t say no to help on that subject.)
Once anaconda has been patched, I expect everything to work correctly (well, just the Fedora VM; I haven’t ported the patch for the Debian VM).
As for Python 3.11, the issue has been fixed. Explicitly using tasks instead of coroutines seems to be quite simple: basically, you just need to hunt for “wait(XXXX…)” and check that no coroutine is passed to a “wait”.
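A minimal sketch of that kind of change, not the actual Qubes patch: since Python 3.11, asyncio.wait() refuses bare coroutine objects, so each coroutine is wrapped in a Task before being passed in. The coroutine `probe` and its values are made up for illustration.

```python
import asyncio


async def probe(n):
    # Hypothetical stand-in for whatever coroutine the real code awaits.
    await asyncio.sleep(0)
    return n * 2


async def main():
    # Wrap each coroutine in a Task first; passing the coroutine objects
    # themselves to asyncio.wait() is rejected on Python 3.11 and later.
    tasks = [asyncio.create_task(probe(n)) for n in range(3)]
    done, pending = await asyncio.wait(tasks)
    return sorted(t.result() for t in done), len(pending)


results, still_pending = asyncio.run(main())
print(results, still_pending)
```

The same wrapping works with asyncio.ensure_future() where the code has no running loop helper at hand; the point is only that what reaches wait() is a Task or Future.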
“How stable is the build you’ve managed to boot into?”
Well, dom0 doesn’t crash and I see no errors in dom0. Other than that, it cannot be used because of the partitioning issue and the thin pool name (these are normally set up by anaconda, but I haven’t ported the required patch).
Update: qmemman is crashing (core dump). https://github.com/QubesOS/qubes-linux-utils/blob/master/qmemman/meminfo-writer.c The Xen API has probably changed; I need to read the 4.16 docs and update the calls.
For the AM5 motherboard, I haven’t tried any additional options. I have just contacted ASUS support; now I need to fill in a detailed technical IOMMU bug report for their engineering team.
When you have time, can you try the ISO (or build it yourself) to check that you can finish the installation and reach dom0?
You may be able to temporarily work around that by setting the dom0 min/max memory values equal on your startup command line and (once domU VMs can launch) temporarily disabling memory sharing for all of them.
I am struggling with the anaconda addons. I need to find a way to modify and test them quickly (recompiling the ISO after each modification is …), but documentation is lacking on the testing part.
For the qmemman crash, setting dom0 min=max doesn’t seem to have a noticeable impact. The core dump seems (a lot more debugging is needed to confirm) to happen here: xen/tools/python/xen/lowlevel/xs/xs.c at master · xen-project/xen · GitHub. To support Python 3.11 I need to add a patch that defines “PY_SSIZE_T_CLEAN”.
From the documentation:
For all # variants of formats (s#, y#, etc.), the macro PY_SSIZE_T_CLEAN must be defined before including Python.h. On Python 3.9 and older, the type of the length argument is Py_ssize_t if the PY_SSIZE_T_CLEAN macro is defined, or int otherwise.
qmemman should be fixed now; it was indeed my patch that was incorrect.
So some progress.
Now when I start a new VM, the VM is unresponsive, with no way to send commands to it.
The logs indicate that multiple things went wrong:
Something about the clock / timer (xen_timer / hrtimer). The TSC clocksource doesn’t seem to work: “Marking clocksource ‘tsc’ as unstable because the skew is too large …” “Override clocksource tsc is unstable and not HRT compatible - cannot switch while in HRT/NOHZ mode” “Switched to clocksource xen”. Then a lot more logs/traces about clocksource issues.
Then it crashes with disk-related messages: “Qubes initramfs script here:” … “/dev/xvdd: Can’t open blockdev” “EXT4-fs (xvdd): mounting ext3 file system using the ext4 subsystem”. Some more logs and nothing else.
Now I need to understand what it means.
The file system errors also happen on standard Qubes OS, so I will focus on the clocksource issue.
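To see which clocksource a Linux guest actually ended up on, the kernel exposes it under sysfs. The sketch below reads those two files; the helper name `read_clocksources` is an assumption for illustration, and the demo runs against a throwaway directory with made-up contents rather than a real VM.

```python
import tempfile
from pathlib import Path

# Standard sysfs location of the clocksource controls on Linux.
SYSFS_CLOCKSOURCE = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base: Path = SYSFS_CLOCKSOURCE) -> dict:
    """Return the active clocksource and the list the kernel offers."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return {
        "current": current,
        "available": available,
        "tsc_in_use": current == "tsc",
    }


# Demo with fake files (illustrative values only, not taken from the logs).
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp)
    (fake / "current_clocksource").write_text("xen\n")
    (fake / "available_clocksource").write_text("xen tsc hpet acpi_pm\n")
    state = read_clocksources(fake)

print(state)
```

Inside the VM, calling `read_clocksources()` with no argument would read the real sysfs files, which is a quick way to confirm what “Switched to clocksource xen” left in place.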
The VM reacts when passing commands through sudo xl -v console fc37, but everything is spectacularly slow.
After a very long time, qrexec is working too.
On the clock issue, the TSC is detecting my clock speed as ~200 MHz, while the correct value is 4500+ MHz.
I was able to launch an xterm window, but the speed is not good.
Continuing to make some progress.
The TSC issue was hardware related; with another computer I don’t have the issue. Still waiting for a reply from the ASUS engineering department.
Next issue is PCI passthrough.
When I try to pass the network card to sys-net, libxl crashes with the following message:
“libxl_qmp.c:1838:qmp_ev_parse_error_messages: Domain 4:Offset 0x000e:0x49090000 expands past register size (1)
libxl_pci.c:1830:device_pci_add_done: Domain 4: libxl__device_pci_add failed for PCI device 0:1:0 (rc -28)
libxl_create.c:1973:domcreate_attach_devices: Domain 4:unable to add pci devices”
Same error but when using PV instead of HVM:
“xen_pt_config_reg_init: Offset 0x000e mismatch! Emulated=0x0080, host=0x49090000, syncing to 0x49090000”
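For context on the numbers in those messages: offset 0x0e of the standard PCI configuration header is the one-byte Header Type register, which is consistent with “register size (1)”, and an emulated value of 0x80 has only the multi-function bit set. The sketch below just decodes that byte from a configuration-space buffer. The buffer is made up, the helper name is hypothetical, and this does not explain where the host value 0x49090000 comes from.

```python
PCI_HEADER_TYPE = 0x0E          # offset of the Header Type register (1 byte)
HEADER_LAYOUT_MASK = 0x7F       # bits 0-6: header layout (0 = normal device)
HEADER_MULTIFUNCTION = 0x80     # bit 7: device has multiple functions


def decode_header_type(config: bytes) -> dict:
    """Decode the Header Type byte from raw PCI configuration space."""
    raw = config[PCI_HEADER_TYPE]
    return {
        "layout": raw & HEADER_LAYOUT_MASK,
        "multifunction": bool(raw & HEADER_MULTIFUNCTION),
    }


# Made-up 64-byte header with only the Header Type byte filled in.
sample = bytearray(64)
sample[PCI_HEADER_TYPE] = 0x80
info = decode_header_type(bytes(sample))
print(info)
```

On a real system the same bytes can be read from the device's `config` file under /sys/bus/pci/devices/, which makes it easy to compare what the host reports with what the emulation expects.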
I upgraded the stubdom dependencies to the latest available versions.
I am now encountering an out-of-memory issue when I try to start any HVM or PV.
From the logs, it seems to OOM when trying to copy “rootfs”.
My “rootfs” is much larger than the original Qubes stubdom-linux one (around 2 times bigger).
The reason it is bigger is that a lot of options have been added to qemu since the last upgrade.
I could disable them to reduce the size of the rootfs (I suspect the scripts initializing the rootfs are failing because the rootfs is too big), but I am not sure it is a good idea. I am still learning, or trying to learn, how Xen and QEMU interact with each other. Many new QEMU options seem interesting by their names (vfio/rdma/avx2/…), but I do not yet understand whether they have an impact or whether all of that is somehow passed to Xen, with Xen doing all the work.
So for the moment, I am looking at how/if it is possible to use a big rootfs (77 MB uncompressed, compared to the 32 MB uncompressed of the original Qubes OS rootfs).
The issue was indeed that the rootfs was too big. The solution was to strip the embedded binary. Now there are new issues: I am still not able to launch an HVM or PV after the stubdom upgrade.
When this patch is enabled, it crashes with the error I previously posted: “Domain 4:Offset 0x000e:0x49090000 expands past register size (1)”, “xen_pt_config_reg_init: Offset 0x000e mismatch! Emulated=0x0080, host=0x49090000, syncing to 0x49090000”.
I still have no idea how to fix this issue, but “what the issue is” seems a bit clearer to me.
From the logs, there seems to be a difference between standard Qubes and the new Xen.
The flag “PCI_BASE_ADDRESS_MEM_TYPE_64 0x04” seems to be used (I see type 0x04 in my custom build, while standard Qubes OS seems to use “PCI_BASE_ADDRESS_MEM_TYPE_32”). To be confirmed. I still have no idea what this means for the fix I need to make.
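As a reference for those flag names, here is a small sketch of how the low bits of a PCI base address register (BAR) are interpreted. The constants mirror the values in the Linux pci_regs.h header; the function name and the sample BAR values are made up for illustration and are not taken from the logs.

```python
# Values as defined in the Linux header include/uapi/linux/pci_regs.h.
PCI_BASE_ADDRESS_SPACE_IO = 0x01
PCI_BASE_ADDRESS_MEM_TYPE_MASK = 0x06
PCI_BASE_ADDRESS_MEM_TYPE_32 = 0x00
PCI_BASE_ADDRESS_MEM_TYPE_1M = 0x02
PCI_BASE_ADDRESS_MEM_TYPE_64 = 0x04
PCI_BASE_ADDRESS_MEM_PREFETCH = 0x08

MEM_TYPE_NAMES = {
    PCI_BASE_ADDRESS_MEM_TYPE_32: "32-bit",
    PCI_BASE_ADDRESS_MEM_TYPE_1M: "below-1M",
    PCI_BASE_ADDRESS_MEM_TYPE_64: "64-bit",
}


def describe_bar(bar: int) -> str:
    """Describe a raw BAR value from its low flag bits."""
    if bar & PCI_BASE_ADDRESS_SPACE_IO:
        return "I/O"
    width = MEM_TYPE_NAMES.get(bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK, "reserved")
    prefetch = "prefetchable" if bar & PCI_BASE_ADDRESS_MEM_PREFETCH else "non-prefetchable"
    return f"{width} memory, {prefetch}"


# Hypothetical BAR values, chosen only to exercise each branch.
for value in (0xFE000000, 0xFE000004, 0xFE00000C, 0x0000E001):
    print(hex(value), "->", describe_bar(value))
```

Note that a BAR with the 64-bit type occupies two consecutive BAR slots, the second holding the upper 32 bits of the address, so a 32-bit versus 64-bit difference changes how the following register is read.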
Update: this specific issue is fixed; I made some mistakes when upgrading the RPM spec for qubes-vmm-xen. PCI passthrough still doesn’t work, but it crashes a bit later in the initialization steps, with something about an “rdm check flag”. I will try to learn what that is.
Major update:
The libvirt error message was a bit misleading.
However, the Xen error message was quite explicit and directly suggested that I try setting the “permissive” attribute.
I posted this message from my custom qubes build, with xen 4.16.2, libvirt 8.9.0, qemu 7.1.
(A lot more work is still required: testing, a lot of testing; cleaning the code; trying to reduce the size of the diff between my fork and the official Qubes OS; rewriting the git commit history (don’t look at it, it was my try&die workflow); and many other things. But now I am certain that I will make it work as I want.)
Marmarek recently submitted a PR to QubesOS/qubes-vmm-xen on GitHub. The PR upgrades Xen to 4.17-rc3, which I think is what the next release of Qubes OS will rely on.