Installer corrupts installation medium (+ bounty for help)

You can try adding pci=nomsi to the qube’s kernelopts with qvm-prefs in a dom0 terminal:
s=$(qvm-prefs -- sys-net kernelopts) && qvm-prefs -- sys-net kernelopts "pci=nomsi $s"
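If you end up running that one-liner more than once, it prepends pci=nomsi again each time. A minimal sketch of an idempotent variant, assuming a POSIX-style shell in dom0; prepend_kernelopt is a hypothetical helper name, not part of the Qubes tooling:

```shell
# Hypothetical helper: print the kernelopts string with OPT prepended,
# unless OPT is already one of the space-separated options.
prepend_kernelopt() {
    local opt="$1" current="$2"
    case " $current " in
        *" $opt "*) printf '%s\n' "$current" ;;              # already present: unchanged
        *)          printf '%s\n' "$opt${current:+ $current}" ;;
    esac
}

# Usage in a dom0 terminal (not executed here):
#   s=$(qvm-prefs -- sys-net kernelopts)
#   qvm-prefs -- sys-net kernelopts "$(prepend_kernelopt pci=nomsi "$s")"
```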

If that doesn’t work, try adding the no-strict-reset=true option to your PCI device in sys-net.
If that doesn’t work either, try adding the permissive=true option to your PCI device in sys-net instead of no-strict-reset=true.
If that still doesn’t work, try adding both the no-strict-reset=true and permissive=true options to your PCI device in sys-net.
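The three fallbacks above map to qvm-pci attach calls with --option. The sketch below only prints the commands (a dry run) so they can be reviewed before pasting them into dom0; dom0:02_00.0 is a placeholder for your device as listed by qvm-pci, and pci_fallback_cmds is a hypothetical name. sys-net needs a restart for new options to take effect.

```shell
# Dry run: print the qvm-pci commands for each fallback suggested above,
# in order. Nothing is executed; review, then try one attempt at a time.
pci_fallback_cmds() {
    local vm="$1" dev="$2" opts o args
    for opts in "no-strict-reset=true" "permissive=true" \
                "no-strict-reset=true permissive=true"; do
        args=""
        for o in $opts; do            # word-split on purpose: one --option per flag
            args="$args --option $o"
        done
        printf 'qvm-pci detach %s %s\n' "$vm" "$dev"
        printf 'qvm-pci attach --persistent%s %s %s\n' "$args" "$vm" "$dev"
    done
}

# Example with a placeholder device ID:
#   pci_fallback_cmds sys-net dom0:02_00.0
```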

Maybe your issue is caused by a faulty kernel module for the device, which crashes sys-net.

Which new device are you using? Did you try to find it on this forum and check if there are any hints?

Still happy about every reply that might lead me to a new line of thought. Thanks!

Regarding the initial setup failing and sys-net not starting because of the PCI device (on both 4.1 and 4.2-rc5): I solved this with the steps from my initial post when I reinstalled 4.1 last night, to see whether I can get this version working well enough to prepare some work while I keep investigating. I installed 4.1 (the install medium verification still passes there; only 4.2-rc5 fails it), didn’t set up any VM, booted, ran echo -n "1" > /sys/bus/pci/devices/<device>/remove as root, and then ran /usr/libexec/initial-setup/initial-setup-graphical to run the setup without the PCI device interfering. I don’t remember seeing any errors in the logs while it ran. I didn’t have much time to work with it, but the system looked okay to me. A few errors from different log files and issues I noticed:
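The sysfs step used above can be wrapped in a small function that refuses to write when the device path does not exist. A sketch, assuming it is run as root in dom0; remove_pci_device and the SYSFS_ROOT override are hypothetical (the override only exists so the function can be tried against a scratch directory). The removal lasts until the next reboot, or until the bus is rescanned via /sys/bus/pci/rescan.

```shell
# Hypothetical helper: hot-remove a PCI device from dom0 through sysfs.
# $1 is the full sysfs address, e.g. 0000:02:00.0 (placeholder).
# SYSFS_ROOT is a hypothetical test override; leave it unset on a real system.
remove_pci_device() {
    local bdf="$1"
    local node="${SYSFS_ROOT:-/sys}/bus/pci/devices/$bdf/remove"
    if [ ! -e "$node" ]; then
        echo "no such PCI device: $bdf" >&2
        return 1
    fi
    echo 1 > "$node"      # the kernel accepts "1" with or without a trailing newline
}

# In dom0, as root:  remove_pci_device 0000:02:00.0
```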

Error logs/issues from/in 4.1

error calling qrexec-policy-agent in dom0 <…>
This didn’t happen on the most recent reboots. I think it happens when I boot and can’t allow the mouse with my keyboard, because the anon-whonix setup takes focus and I can’t refocus the dialog for allowing the mouse, so I end up plugging the mouse back in. I guess confirming it twice, once in each dialog, then causes this. I now use i3, where I can easily get focus back to the dialog, so that would make sense.

systemd-vconsole-setup: /usr/bin/setfont failed with exit status 71
I don’t think this is anything to be concerned about; just noise.

usbguard-daemon: Device insert exception <...> Protocol error
kernel: usb 1-5: can't set config #1, error -71
kernel: usb usb1-port5: disabled by hub (EMI?) re-enabling
I think this might be related to the USB-C port not being supported. I assume this is nothing to be concerned about either.

libvirtd: internal error: Unable to reset PCI device 000:00:<ID>.0: no FLR, PM reset or bus reset available
Not too concerned here for now. Should I be?

libvirtd: internal error: Unknown PCI header type '127' for device '0000:0<device>:00.0'
Again, I think this is the unsupported Wi-Fi card, i.e. the same device behind the sys-net issue I worked around. I haven’t disabled the PCI device permanently; doing so might resolve this altogether, as I don’t even need it.

systemd-modules-load: Failed to insert module 'xen_acpi_processor': No such device
Not sure what to comment on this.

Moving windows and hovering over some buttons stuttered badly without i3 and nouveau.noaccel=1, similar to my experience on my old device. With i3 and the kernel parameter, I don’t expect it to be an issue.

<several USB devices> client bug: event processing lagging behind by 15ms, your system is too slow
I saw this in other threads about moving windows. Since installing i3 and increasing the RAM of sys-usb and the other sys-* VMs, I haven’t seen it again so far.

I think the system looks pretty good like this; please correct me if I’m wrong. So 4.1 with a few tricks seems okay at this early stage.

I tried Google before posting. I will dig deeper another time, including searching the forum.

Mainboard: B760 chipset, 32 GB DDR5 on it
CPU: i5 13th generation
(GPU: GeForce GT 1030)
NV2 NVMe M.2 SSD

Now I will dd 4.2-rc5 onto a different USB stick and install it exactly the way I described for 4.1 to see how that goes, though with kernel-latest. I will document issues and anything else I notice here in the same manner.
If there is anything that would help you help me, I’m more than happy to gather the information.

Edit:
I couldn’t even finish the 4.2-rc5 installation with /usr/libexec/initial-setup/initial-setup-graphical. Same errors as described by a different user: AppVMs fail to start on 4.2.0 RC-3 - #9 by b34

4.2-rc5 errors finishing install with /usr/libexec/initial-setup/initial-setup-graphical

failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory
/usr/lib64/python3.11/site-packages/pyanaconda/ui/gui/hubs/__init__.py line 415 in _on_spoke_clicked
/usr/share/anaconda/addons/org_qubes_os_initial_setup/gui/spokes/qubes_os.py line 658 in refresh self.choice_default_template.set_entry(val)
/usr/share/anaconda/addons/org_qubes_os_initial_setup/gui/spokes/qubes_os.py line 215 in set_entry entry_index = self.entries.index(entry)
ValueError: ‘Fedora 38 Xfce’ is not in list.

I will try one more time, and then without kernel-latest.
Interestingly, selecting the installation medium in the boot menu now boots the installed Qubes 4.2 from the NVMe, and lsblk shows <device> 8:32 1 0B 0 disk.

The other thing that others and I have done:

Install Qubes 4.2 RC2 or the daily from early August 2023, and then just update.

I do not know if it works with the 13th generation processor.

Thanks, that at least gave me hope for a moment. Unfortunately, the built-in verification failed, and when I ignored it and went ahead with the installation and boot, the journal showed the same number of errors as with RC4 (faulty PAM modules on sudo and more; I will document these here later).
To be specific, I used RC2, not a daily build.

I need to sleep now. I’m running MemTest86 in the meantime so it passively gathers some new information while I finally rest. Two passes have completed without errors so far. Not surprising, but it’s something.
I’m raising the bounty pool to 200 dollars, for an individual or shared among a group of people involved, as I cannot postpone work much longer and I would be sad to move away from Qubes when 4.1 becomes unsupported. This really is my financial hard limit, though. I appreciate all of your time!

Did you verify the downloaded ISO?

I did this (except re-checking whether the installer corrupted the installation medium again) for every one of my countless attempts. I either download the .iso fresh or use an .iso I still have on another drive, check it with sha256sum, dd it to my installation USB (exactly as described in the docs, including conv=fsync), and then even verify the installation medium by reading the freshly written USB back with dd and piping it into sha256sum, again exactly as described in the docs. I boot from this doubly verified USB, and the built-in verification fails. In my first attempts, I re-verified the USB afterwards with dd and sha256sum, and the sum had actually changed. This happens with 4.2.0-rc5 and 4.2.0-rc2. As you can read above, 4.1 has neither this issue nor this one: AppVMs fail to start on 4.2.0 RC-3 - #9 by b34
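For re-checking the stick after an installer boot, the read-back comparison described above can be scripted. This sketch uses head -c instead of dd (my substitution, not the docs’ exact command); verify_medium is a hypothetical name, /dev/sdX is a placeholder, and reading the device needs root.

```shell
# Hypothetical helper: compare the SHA-256 of an ISO with the same number of
# bytes read back from the device it was written to.
# Returns 0 on match, 1 on mismatch, 2 if the ISO size cannot be read.
verify_medium() {
    local iso="$1" dev="$2" size iso_sum dev_sum
    size=$(stat -c %s "$iso") || return 2
    iso_sum=$(sha256sum < "$iso" | cut -d' ' -f1)
    # Only the first $size bytes of the stick belong to the image;
    # the rest of the device is irrelevant to the comparison.
    dev_sum=$(head -c "$size" "$dev" | sha256sum | cut -d' ' -f1)
    printf 'iso:    %s\nmedium: %s\n' "$iso_sum" "$dev_sum"
    [ "$iso_sum" = "$dev_sum" ]
}

# As root, before and again after booting the installer:
#   verify_medium Qubes.iso /dev/sdX
```

Running it once right after writing and once after the installer’s own check would show whether the medium changed in between.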

Did you maybe insert this USB stick into a Windows machine? Or could some other software have modified the files in the ESP?

There is no Windows on that device, and the USB stick is never removed after dd. I verify the .iso, write it with dd, verify it again on the USB with sha256sum, and reboot right after to boot the installer.

If some software did this somewhere during that process, which I highly doubt, I would expect it to happen with both 4.1 and the 4.2 RCs, but it only happens with the 4.2 RCs. Or am I wrong?

Edit: A few more pieces of the puzzle: I used multiple USB sticks and updated my BIOS.

These errors are unrelated and harmless:


Thank you, I missed this one. This crosses one problem off the list of things I noticed on the booted 4.2 installation.

The thread is getting messy, with information all over the place. I will try to summarize the current state and keep updating this post with problems, so we have one dedicated place for them.

Problem list

4.1: Seems to work okay at first glance when I don’t set up the full system right away, but instead run the graphical initial setup after disabling the problematic PCI device in dom0 after boot. See “Error logs/issues from/in 4.1” here: Installer corrupts installation medium (+ bounty for help) - #5 by Minio
(Maybe I’m just failing to recognize a real problem in these logs, though.)

4.2-rc5: A fully verified USB install medium fails the built-in verification when installing. When I booted into a different system and checked the sha256sum of the drive, it had really changed. I could not reproduce this any other way, whether by rebooting, detaching and reattaching the drive, or anything else. It appears to happen only when booting the 4.2 installer/verification.
Installing the system anyway without the full setup, just as I did with 4.1, so that I can run /usr/libexec/initial-setup/initial-setup-graphical after disabling the problematic PCI device, also fails; see “4.2-rc5 errors finishing install with /usr/libexec/initial-setup/initial-setup-graphical” here and here.
(to be documented in detail, see below)

__________________________________________

Next, I will create yet another 4.2-rc5 installation medium and find a way to disable the PCI device during the installation itself, since initial-setup-graphical doesn’t work after boot. It would be interesting to see whether the installation then completes without displaying any errors. If I get this to work, I will document the errors I find on the booted system as I did with 4.1.
One additional issue I recall is sudo qubesctl state.highstate reporting No Top file or master_tops data matches found. Please see master log for details, something that did not happen on 4.1 either.

That didn’t work quite as planned.

Can I disable my PCI device at all for the initial VM setup right after installation in the default flow (i.e. without the initial-setup-graphical script)? There is no TTY: Can't enter into TTY from the initial setup

I booted the 4.2-rc5 system without the setup and dug into the issue that occurs when running initial-setup-graphical (as a reminder: ValueError: ‘Fedora 38 Xfce’ is not in list.). There was also an NVDIMM error (failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory), which others had too. I found that libblockdev-nvdimm is not installed; should it be? Unfortunately, I can’t install it without the VMs, since by design dom0 gets its packages through an update VM.

In all these attempts, I found that the initial setup breaks the initial setup. What I mean by this is that if I quit the initial setup (literally with the Quit button, see below) and boot the system, the system is pretty broken: I can’t use the Q icon and a few other features, I think because the system expects the host to be named dom0, but it is localhost. It complains about a missing key localhost in a Python script for the menu. However, disabling the PCI device and then running the initial setup script works. So the initial setup ruins my chance of running it later when I choose to set up nothing at this point, which I absolutely need to do with my unsupported PCI device. Quitting before entering the settings submenu doesn’t cause this.
Would this approach result in a usable system? I hope this at least helps debug the issue with the initial setup.

What if you choose not to create system qubes (sys-net etc.) during Initial Setup, and then create them manually after Initial Setup has configured your system and installed the templates?

As I detailed in my previous response, this both works and doesn’t. If I choose not to set up anything, I cannot access the initial setup’s configuration screen after boot because of the error above.
If I choose “Quit”, however, I can. Something the initial setup executes, even when setting up no VMs is selected, breaks things so that the configuration screen can’t be accessed again.

I’m now using the system I set up by choosing “Quit”, disabling my PCI device, and then running the initial setup.

Currently digging through errors:

4.2-rc5 kernel-latest dom0 errors and notes with 'Quit' initial setup workaround (haven’t set up anything yet; will do so in the next step)

ACPI BIOS Warning (bug): Incorrect checksum in table [BGRT] - 0x1A, should be 0x0F
I do have the most recent BIOS version though.

PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug

pci 0000:00:1f.5: can't claim BAR 0 [mem ...]: no compatible bridge window
pci 0000:00:15.<0-3>: can't claim BAR 0 [mem ...]: no compatible bridge window

pnp 00:05 disabling [mem ...] because it overlaps 0000:00:02:0 BAR 9 [mem ...]
system 00:05: [mem ...] could not be reserved
pci: 0000:00:02.0: BAR 9: no space for [mem ...]
pci: 0000:00:02.0: BAR 9: failed to assign [mem size ...]

Some PCI device resources are unassigned, try booting with pci=realloc

I think these are all related, so I grouped them together.

hpet_acpi_add: no address or irqs in _CRS

pcieport 0000:00:1c.0: retraining failed
pcieport 0000:00:1c.0: broken device, retraining non-functional downstream link at 2.5GT/s

systemd-modules-load: Failed to insert module 'xen_acpi_processor': No such device

pciback 0000:02:00.0: not ready <time>ms after bus reset; waiting/giving up
I think this is what delays the boot. It is the same PCI device that breaks sys-net and that I want to disable anyway, so I’m not concerned about it. As I mentioned before, I think it is the unsupported Wi-Fi card.

internal error: Unknown PCI header type '127' for device '0000:02:00.0'
See my post with the 4.1 errors and the previous paragraph; this should be the Wi-Fi card.

Other than that, there are a few errors around sound (which doesn’t work) and Wi-Fi (some certificates, pulseaudio). I don’t need either, and I assume they could be made to work with proper drivers and some effort.

0000:00:1c.0 and the missing xen_acpi_processor are what stand out to me, but I’m not very experienced with this.

In the meantime, I think creating separate GitHub issues for the clearly identified individual problems so far (initial setup and verification failure) would help keep things organized.

Now I will try the kernel parameter suggested in one of the errors and remove my graphics card, which I don’t really need anyway.

You can create all qubes manually without running Initial Setup again (which, by design, shouldn’t be run a second time).

In my limited experience, and with somewhat limited technical knowledge, I would say: solve the problem of the USB install medium not passing its self-check before going any further.

I don’t know which Linux kernel is needed for a 13th-generation CPU, but you seemed to say you had Qubes 4.1 working on this particular computer.

Are you using a machine with the intention of dual booting?
Or perhaps just with more than one drive inside the computer?

Having two drives has, in my case, proven problematic. I do not have problems using legacy boot with two drives.

There are posts about redoing the boot setup on the hard drive and about issues with UEFI, but I know nothing about that.

Using a fast, late-model computer (13th-generation CPU, high-end graphics card) can be problematic. Since Qubes is intended for security, it is easier to be confident about older hardware, about which much is well known.

Even if I solved your problem (and these are only suggestions of where to go next), I would say to give any money to the Qubes developers. Or is it the Invisible Things website?

Like many others, I am desperate for money, but my Social Security (US old-age pension) keeps a roof over my head and food on the table, whereas some here seek bounties to accomplish some of that.

Cheers.

I’m currently working on setting up my VMs. The only issues so far are a few errors in the logs, which I mentioned before. Unfortunately, I cannot determine whether any of them is critical. Maybe a kind of Rule It Out list of log errors, each with a short explanation of what it generally means and why it is therefore not a big problem, would be a good addition to the docs; sometimes Google doesn’t have much to say about them. I have seen a few threads where people asked about errors, and on GitHub, maintainers said the logs were unrelated. This could also be turned into a small command line tool that removes these lines from log files, inspired by GreyNoise RIOT.
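The log-filtering tool idea could start as little more than grep -v over a shared list of known-harmless patterns. A sketch under that assumption; filter_known_noise and the pattern file are hypothetical, and whether a given message is really harmless still has to be confirmed (for example by the maintainers), not assumed.

```shell
# Hypothetical "rule it out" filter: read a log on stdin and drop every line
# matching one of the extended regexes listed (one per line) in file $1.
filter_known_noise() {
    local patterns="$1"
    # grep -v exits 1 when every line was filtered out; treat that as success.
    grep -v -E -f "$patterns" || [ $? -eq 1 ]
}

# Example (dom0): journalctl -b -p err --no-pager | filter_known_noise known-noise.txt
```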

An additional error is qmemman reporting domain '(number)' still hold more memory than have assigned. I see very few mentions of this on GitHub. One report was related to issues starting a VM with a large amount of RAM assigned. I tried that and found no problem I could pinpoint. Also, if I am not mistaken, qmemman checks memory quite often, yet the error is reported only once in a while (mostly on VM starts) rather than spammed endlessly, so I would conclude it might not be a real issue. Fingers crossed that I won’t run into any real issues after setting up my remaining VMs and using them for a couple of weeks.

Since someone might find this thread while searching for a particular error or term mentioned here, I would like to document something that took more work than I think it should: setting up the system without the initial setup GUI, i.e. when you did not click ‘Quit’ and therefore cannot open the GUI again after booting. I don’t think the documentation covers all the salt states needed for this.

Setting up default VMs with salt
The interesting files for this are:

The first file shows which conditions must be met for the tasks in the second file to be queued and run. I set up a system where the initial setup only downloaded the templates and created the VM pool. Although the files above are code, I think they are easy to follow to set up your desired configuration.

After tasks = [], look for tasks being appended to the list and the condition for each. In the other file, you will see pretty much the exact commands being executed.
DefaultKernelTask is always executed and I already had my pool, so I skipped these; the same goes for the templates. @Query, this task is for you, though: InstallTemplateTask. You can also just use the Qubes Template Manager GUI in 4.2 or read the documentation: Templates | Qubes OS

I skipped CleanTemplatePkgsTask and ConfigureDom0Task, as these are always executed. If setup failed at a very early stage, though, you obviously want to execute these too. It may make sense to read through all of them and verify that the changes were made.

Execute the command in SetDefaultTemplateTask; again, it is self-explanatory.

ConfigureDefaultQubesTask is slightly more complicated and depends on your desired outcome. Begin by running qubesctl saltutil.clear_cache and qubesctl saltutil.sync_all. Then (in the code) salt configurations are added to the states list depending on the installer checkboxes. I think the conditions are mostly self-explanatory, although there is a bit more code involved here.
If qvm.sys-net is added to the list, you would run qubesctl top.enable qvm.sys-net. If the entry added to the states list is prefixed with pillar., remove that prefix, run the same command, and append pillar=True to it. Say you want a disposable sys-firewall VM and therefore see pillar.qvm.disposable-sys-firewall being added to the states list: the command becomes qubesctl top.enable qvm.disposable-sys-firewall pillar=True. Afterwards, let salt apply all these enabled states by running qubesctl --all state.highstate.
Then disable again all the states you did not enable with pillar=True, by running qubesctl top.disable followed by the state name.
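The enable, highstate, and disable sequence described above can be generated mechanically from the states list. This sketch only prints the commands (a dry run) following the rules in this post; plan_default_qubes is a hypothetical name, the state names are the examples from this post, and qubesctl has to run as root in dom0 (e.g. with sudo) once you are happy with the output.

```shell
# Dry run: turn the initial-setup "states" list into qubesctl commands.
# Arguments are state names as they appear in the installer code, e.g.
# qvm.sys-net or pillar.qvm.disposable-sys-firewall.
plan_default_qubes() {
    local state
    echo "qubesctl saltutil.clear_cache"
    echo "qubesctl saltutil.sync_all"
    for state in "$@"; do
        case $state in
            pillar.*) echo "qubesctl top.enable ${state#pillar.} pillar=True" ;;
            *)        echo "qubesctl top.enable $state" ;;
        esac
    done
    echo "qubesctl --all state.highstate"
    # Per the description above, only the non-pillar states are disabled again.
    for state in "$@"; do
        case $state in
            pillar.*) ;;
            *) echo "qubesctl top.disable $state" ;;
        esac
    done
}

# Example: plan_default_qubes qvm.sys-net pillar.qvm.disposable-sys-firewall
```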

After that, just follow along with the code and execute the remaining commands. You have now manually set up the default Qubes configuration. Again, I didn’t find quite a few of these steps anywhere in the docs. I think this is quite important because, if I am not mistaken, you cannot set up Qubes with a problematic PCI device without it.

By the way, I think this check is incomplete: https://github.com/QubesOS/qubes-anaconda-addon/blob/4bdce79a6bcb28eff844601aebdb000dc8fc0019/org_qubes_os_initial_setup/service/tasks.py#L308C17-L308C33

If, like me, you don’t set up any default VMs, there is no default-dvm, so this fails. The check should also make sure that the first checkbox (default VMs, including the default disposable VM) is checked. I don’t think it matters in practice, since the next steps mostly relate to the missing default qubes, but it is very irritating: users of a security OS like this want some peace of mind, and ruling out displayed errors as potential sources of harm can take quite a bit of time, if you have the technical means to do so at all.

@catacombs
No dual boot; the old drive is in the computer only so I can copy over data from the previous installation on a different device when the time comes.

I am new to Qubes 4.x and am running it on old enterprise hardware. The only things I have to add here are:

  • Isn’t it normal to provide the qubes-hcl-report for your machine when troubleshooting at this level?
  • Is there a standard command or set of commands to provide similar information about the machine’s storage?

It would be good if this functionality were included right in the Qubes installer, for example as an option to automatically pipe the output to a short-term pastebin so the person doing the install can just share that URL as part of the support request.

I think yes: The HCL entry will have a link to this discussion and will not in itself contain anything related to your problems AFAIK.

2 posts were split to a new topic: Invoke basic qubes without internet (sys-net)