AppVMs fail to start on 4.2.0 RC-3

Hi everyone,
I installed and used 4.2.0 RC2 on a new Framework computer and it worked well (I had an issue with the screensaver, which seems to be fixed in newer versions).
However, after I installed RC3, no AppVM would start. The first sign was during installation, when I got an error message saying sys-firewall failed because libxenlight could not create sys-net. Then, when I start Qubes, no applications appear in the menu for any of the AppVMs, and I get a similar error message whenever I try to run an AppVM. If I had to guess, something went wrong with the setup of the templates.
I get the same issue with both the default kernel and the latest kernel. I used the default settings.
Has anyone else had this problem?

Hi,
Have you changed the kernel version in the AppVM settings?
Quite recently I saw an error caused by an empty default value being selected for the kernel version in the AppVM settings.

I dug in a little deeper.
It doesn’t look like kernel version makes a difference.
However, I found that when I changed the virtualization type to PV, it started working.
I found an old reference to something like this in an older version of Qubes: “PVH doesn't work. PV does.” (Issue #5360, QubesOS/qubes-issues on GitHub).
Any idea why this would happen? I didn’t have this problem with RC1 on the same computer, with the same BIOS settings.
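For anyone who wants to apply the same workaround from a dom0 terminal, here is a minimal sketch. It assumes only the standard `qvm-prefs` tool and its `virt_mode` property (values `pv`, `pvh`, `hvm`); the `switch_to_pv` helper and its `DRY_RUN` switch are hypothetical names for illustration. PV is generally considered less secure than PVH or HVM in Qubes, so this is best treated as a temporary workaround while the real cause is found.

```shell
#!/bin/sh
# Hypothetical helper (not part of Qubes): switch each named qube to PV mode.
# Run in dom0. With DRY_RUN set to any non-empty value, the qvm-prefs
# commands are printed instead of executed.
switch_to_pv() {
    for qube in "$@"; do
        ${DRY_RUN:+echo} qvm-prefs "$qube" virt_mode pv
    done
}
```

For example, `switch_to_pv sys-net sys-firewall` would cover the qubes that failed during installation, and `qvm-prefs sys-net virt_mode` prints the mode a qube currently uses.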
Thanks!

You can check dom0 logs for errors:

cat /var/log/libvirt/libxl/libxl-driver.log
cat /var/log/xen/console/hypervisor.log
sudo journalctl

Or upload them here so someone can search for any relevant info.
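If the full files are too large to share, a `grep` filter can pull out only the suspicious lines. This is a sketch under assumptions: the `show_errors` name and the keyword pattern are illustrative, and the pattern is a rough heuristic rather than a list of official Xen/libxl messages, so relevant lines without those words will be missed.

```shell
#!/bin/sh
# Hypothetical helper: print lines from the given log files that contain
# common error keywords (case-insensitive), prefixed with the file name (-H).
# The keyword list is a heuristic; adjust it as needed.
show_errors() {
    grep -HiE 'error|fail|invalid|time[sd]? out' "$@"
}
```

One way to use it is to paste the function into a root shell (`sudo -i`) and run `show_errors /var/log/libvirt/libxl/libxl-driver.log /var/log/xen/console/hypervisor.log`. For the journal, `sudo journalctl -b -p err` already limits output to the current boot and to messages of priority "err" or higher.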

Unfortunately, although the sys-net and sys-usb qubes start in PV mode, I can’t do anything on that computer; I can only get a dom0 terminal. So I can’t upload the complete files.
These are the kinds of messages I see:

libxl:
… libxl_dm.c:2847 stubdom_xswait_cb: Domain 4: Stubdom 5 for 4 startup: startup times out.

… libxl_pci.c:1585:libxl__device_pci_reset: The kernel doesn’t support reset from sysfs for PCI device 0000:00:0d.0

… libxl_pci.c:2096:do_pci_remove: Domain 7:xc_physdev_unmap_pirq irq=16: Invalid argument

I don’t see anything that looks obviously strange in the Xen or journalctl logs (but I don’t know what to look for there).

Also: installation with the latest kernel halts around the point where it says it is installing the templates. I can only get the installation to complete with the default kernel.

I’ll try to find a way to move the complete logs.
Thank you!

You can try installing with kernel-latest, but during Initial Setup choose not to set up anything, and then run Initial Setup manually from dom0 with:
sudo /usr/libexec/initial-setup/initial-setup-graphical

I’ll try it and report back. Is this any different from how the normal setup works?

No, it’s the same.

So, interestingly, when I do this, I get the initial setup window. In the terminal window I see the following warning:
Warning: … : failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory.

When I click on the only item there (“QUBES OS”), nothing happens, and the terminal shows this error:
ERR initial-setup: Initial Setup crashed due to unhandled exception:
Traceback…
It references:
/usr/lib64/python3.11/site-packages/pyanaconda/ui/gui/hubs/__init__.py, line 415 (I think that’s just the click)
/usr/share/anaconda/addons/org_qubes_os_initial_setup/gui/spokes/qubes_os.py, line 658, in refresh: self.choice_default_template.set_entry(val)
/usr/share/anaconda/addons/org_qubes_os_initial_setup/gui/spokes/qubes_os.py, line 215, in set_entry: entry_index = self.entries.index(entry)
ValueError: 'Fedora 38 Xfce' is not in list

It strikes me as very strange that I seem to be the only one experiencing this on a completely fresh installation. It must be something hardware-related, but I think others have installed on Framework computers.
Is there any UEFI setting that I could have messed up that could have caused something like this?
Is there anything about a particular choice of SSD (an 8TB M.2 drive) that could lead to such a consistent error?

Thanks!

I ran smartctl tests on the drive and it passes all of them. However, I found that the ErrCount increases after I try to run initial-setup and the system halts. Is this a normal result of rebooting after a halt, or a problem with the drive?
In any case, I am going to get a new drive in the next few days and will try again. I’ll report back.
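To check whether the drive’s error counter really grows across these halts, one option is to record it before and after a failed run. This sketch assumes the NVMe section of `smartctl -a` output contains an `Error Information Log Entries:` line and that the drive is `/dev/nvme0`; `nvme_error_entries` is a hypothetical helper. The counter alone doesn’t say how serious the entries are.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -a` output on stdin and print the value
# of the NVMe "Error Information Log Entries" counter, with spaces and any
# thousands separators (commas) removed.
nvme_error_entries() {
    awk -F: '/^Error Information Log Entries/ { gsub(/[ ,]/, "", $2); print $2 }'
}
```

For example, run `sudo smartctl -a /dev/nvme0 | nvme_error_entries` before and after reproducing the halt; if the number rises, `sudo smartctl -l error /dev/nvme0` lists the individual entries with their status codes.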

Thanks!

You can try to change your drive mode from RAID to AHCI or disable VMD in BIOS.

I can’t find either of these in my UEFI configuration… Would there be other names?
I tried installing on a different, brand-new SSD. The computer doesn’t halt anymore, but I get the same problem with the AppVMs failing to start.

What’s your motherboard or BIOS?

It’s the most recent Framework 13 laptop (Intel).
It’s strange because 4.2.0 RC2 worked. Neither RC3 nor RC4 works. Maybe that’s a clue?

I’m going to try to install RC2 and run an update.

Would it make sense to move this topic to the User Support or Testing category, rather than Hardware Compatibility List (HCL), since it doesn’t contain an HCL report and/or is about a release candidate of Qubes OS?

No urgency in my question, I only want to make sure that we eventually leave it in the category where most folks with similar questions are likely to find it.


Fully agree; moved to General Support.

Thank you - sorry about this.

So, to add to the mystery:
I can’t install 4.2.0 RC1, RC3, or RC4 with 64GB of RAM in the machine.
But it does work with 8GB installed, and it still works when I put the 64GB back in after installation.
The BIOS RAM test doesn’t find any issues, and the system seems stable.
I suppose this resolves the problem? But isn’t this strange? Is there any further RAM test I should run to make sure I don’t have a hidden hardware problem?

I’m running into the same errors when I install without setting up any VMs and then run the graphical Initial Setup, which I have to do because I need to disable a PCI device first. Installing everything directly leaves me with a system that has quite a few errors in the journal: see “Installer corrupts installation medium (+ bounty for help)”, post #5 by Minio.

Have you used the built-in verification when installing your system? Did it fail by any chance?
In your journal, do you see any interesting red errors, such as faulty PAM modules for sudo?

I don’t have the logs for this anymore, but I feel like I tried everything, including installing the VMs later. Pretty much the same results.
Have you tried putting smaller RAM modules in the machine? It’s very strange, but it’s the one thing that worked. I put 8GB of RAM in the machine, installed everything smoothly, then put the 64GB back and have been using it without problems ever since.
The only reason I tried this is that I happened to install RC1 with 8GB RAM before installing RC4 with 64GB RAM, so I tried to reproduce what I had originally done. I wouldn’t have thought of trying it otherwise.