4.1beta upgrade: stage1 errors

Using qubes-dist-upgrade --update, I am getting 2 kinds of errors:

  • a rather obscure “broken pipe” message while updating dom0, which does not seem to be treated as an error
  • attempted updates of test HVMs, which I believe should not be attempted at all, and whose failure is then counted as an upgrade error

The reactos VM is completely non-functional (a test I should have removed earlier), so I killed it instead of waiting for a timeout. Removing it, however, shows that the win10 VMs alone are enough to cause the failure:

Using --skip-standalone-upgrade does not help: the script still appears to want to update those standalone HVMs.

And how serious is that “broken pipe”? What additional logs would be useful?
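As background on why a “broken pipe” can show up without being counted as a failure, here is a minimal sketch of one common benign cause. It is not a diagnosis of what qubes-dist-upgrade does in dom0: a producer writes into a pipe whose reader exits early, the producer is stopped with a non-zero status (typically 141, i.e. 128 + SIGPIPE, on Linux), yet the pipeline as a whole still reports the reader's success. The temporary file and variable names are illustrative only.

```shell
# Record the producer's exit status separately, since the pipeline itself
# only reports the status of its last command (head).
status_file=$(mktemp)

pipeline_status=0
{ yes 2>/dev/null; echo "$?" > "$status_file"; } | head -n 1 > /dev/null || pipeline_status=$?

read -r producer_status < "$status_file"
rm -f "$status_file"

echo "pipeline status: $pipeline_status"
echo "producer status: $producer_status"
```

Whether the message seen during the dom0 update has this origin can only be settled from the script's own logs.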

Looking at the script, I figured out that --skip-standalone-upgrade does not appear to have any effect if --skip-template-upgrade is not specified at the same time.

Trying this, I note that --skip-template-upgrade is rejected by the command-line parser, although its -j shortcut is accepted (it is not obvious why). Incidentally, those shortcuts are not listed in --help.
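One way such a mismatch can arise, sketched here under the assumption that the script parses its arguments with util-linux getopt (I have not verified that this is the actual cause): a short option is declared, but its long name is missing from the list of long options. The option spec below is hypothetical and not copied from qubes-dist-upgrade.

```shell
# Hypothetical option spec: -j and -k are declared as short options,
# but only one long name is declared.
accepts() {
    getopt -n demo -o jk -l skip-standalone-upgrade -- "$@" >/dev/null 2>&1
}

if accepts -j; then
    echo "-j: accepted"
fi
if accepts --skip-standalone-upgrade; then
    echo "--skip-standalone-upgrade: accepted"
fi
if ! accepts --skip-template-upgrade; then
    echo "--skip-template-upgrade: rejected"
fi
```

If the real script has this shape, comparing its short and long option strings would show which long names are missing.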

The resulting --update -j -k run:

  • shows a new problem, salt-related
  • still tries to update the 3 win10 HVMs, though this time it found enough RAM to actually launch one, which I shut down cleanly; in the end it is not rated any better than the ones that failed to launch:

and those HVM logs all look like:

Even if I keep only the win10 HVM that has QWT installed, the updater does not like it (differences in the logs: retcode 2 instead of 126, and empty stderr).

Considering myself lucky not to have any use for that last win10 qube, I was able to delete it, and stage1 finally ran to completion, though I am still uncomfortable about those “broken pipe” errors in dom0.

Following this I tried to launch stage2, but updating debian10 qubes appears not to work properly:



It is not obvious which qube those logs refer to. I thought it would be debian-10-desktop, as it is the one for which an error is reported, but given the breakage it could easily be debian-10 instead. It would be nice if such errors pointed to the saved logs, as other steps do; here I could not locate them and have to make do with the terminal's scrollback buffer.
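Until the script points to its own logs, a workaround is to capture the output oneself. The following is a minimal sketch, not part of qubes-dist-upgrade: run_logged is a hypothetical helper that streams a command's stdout and stderr to the terminal, saves a copy to a file, and preserves the command's exit status.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper: shows output live, keeps a copy in LOGFILE,
# and returns COMMAND's own exit status rather than tee's.
run_logged() {
    log=$1
    shift
    { "$@" 2>&1; echo "$?" > "$log.status"; } | tee "$log"
    read -r status < "$log.status"
    rm -f "$log.status"
    return "$status"
}

# Demonstration with a harmless command that writes to both streams and fails.
demo_log=$(mktemp)
demo_status=0
run_logged "$demo_log" sh -c 'echo "to stdout"; echo "to stderr" >&2; exit 3' || demo_status=$?
echo "exit status: $demo_status, log saved in $demo_log"
```

In dom0 the same helper could wrap the upgrade itself, for example `run_logged ~/stage2-debian-10.log bash -x qubes-dist-upgrade --template-standalone-upgrade --only-update debian-10` (the log path is arbitrary).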

The only “error” still in the scrollback buffer is that package qubes-core-agent-qrexec got removed “as requested” by apt. That sounds like a reason for getting into this situation.

Also, while looking at the configuration, I note that the default net-vm is now shown as “(none)” in the GUI (and I get a “no network” warning when switching to the “firewall rules” tab), though VMs do seem to have network access.

For the debian-10 template I guess I could just throw it away and reinstall the 4.1beta template, but I’m a bit worried about customized ones. I guess there would be a way to start the qube without expecting qrexec to answer, to get a chance to reinstall it, but I can’t find one (I thought running the VM as an HVM would help, but apparently not).

Any ideas for next steps? Will it even be safe to power off this machine at this point?

Could this issue be linked to the previous stage1 issues?

Note first that the last screenshot above is not the one I meant to post; I meant to show a screen with the following.

To gather more info, I reinstalled the debian-10 template, and launched bash -x qubes-dist-upgrade --template-standalone-upgrade --only-update debian-10.

We see apt uninstalling a qrexec package and installing a new qrexec package, but not starting the service:

dpkg: qubes-core-agent-qrexec: dependency problems, but removing anyway as you requested:
 qubes-core-agent depends on qubes-core-agent-qrexec; however:
  Package qubes-core-agent-qrexec is to be removed.
...
Setting up qubes-core-qrexec (4.1.13-1+deb10u1) ...
/usr/sbin/policy-rc.d returned 101, not running 'start qubes-qrexec-agent.service'
Setting up qubes-core-agent (4.1.26-1+deb10u1) ...

With just that, at the next start this domain aborts with the same connection failure as above:

qrexec-start-failed

At first I thought upgrade-template-standalone.sh was failing to remove the policy-rc.d script that prevents the qrexec service from being started, but that script does get removed.
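For readers unfamiliar with the mechanism: on Debian, invoke-rc.d consults /usr/sbin/policy-rc.d before starting a service from a package script, and an exit status of 101 means “action forbidden by policy”, which is what the dpkg output above reports. The sketch below shows a minimal deny-all policy script; it is written to a scratch file for illustration and is an assumption about the general shape, not the actual script used by upgrade-template-standalone.sh.

```shell
# Create a deny-all policy script in a scratch location (NOT /usr/sbin).
policy=$(mktemp)
cat > "$policy" <<'EOF'
#!/bin/sh
# 101 = requested action is forbidden by policy
exit 101
EOF

# invoke-rc.d would call it with the service name and the action.
rc=0
sh "$policy" qubes-qrexec-agent.service start || rc=$?
echo "policy script exit status: $rc"
rm -f "$policy"
```

While such a script is in place, services are configured but not started during package installation, so a missing start at install time is expected and does not by itself explain a failure at the next boot.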

Now, if I open a terminal in the just-updated VM before qubes-dist-upgrade shuts it down and look at the service, we can see that it did start, but that it fails to start again, with a “timeout”:

root@debian-10:~# systemctl |grep qrexec
  qubes-qrexec-agent.service                                              loaded active running   Qubes remote exec agent                                                      
root@debian-10:~# systemctl stop qubes-qrexec-agent.service
root@debian-10:~# systemctl |grep qrexec
root@debian-10:~# systemctl start qubes-qrexec-agent.service
Job for qubes-qrexec-agent.service failed because a timeout was exceeded.
See "systemctl status qubes-qrexec-agent.service" and "journalctl -xe" for details.

root@debian-10:~# systemctl status qubes-qrexec-agent.service
● qubes-qrexec-agent.service - Qubes remote exec agent
   Loaded: loaded (/lib/systemd/system/qubes-qrexec-agent.service; enabled; vendor preset: enabled)
   Active: failed (Result: timeout) since Thu 2021-07-22 18:54:14 EDT; 52s ago
  Process: 25929 ExecStartPre=/bin/sh -c [ -e /dev/xen/evtchn ] || modprobe xen_evtchn (code=exited, status=0/SUCCESS)
  Process: 25930 ExecStart=/usr/lib/qubes/qrexec-agent (code=killed, signal=TERM)
 Main PID: 25930 (code=killed, signal=TERM)

Jul 22 18:52:44 debian-10 systemd[1]: Starting Qubes remote exec agent...
Jul 22 18:54:14 debian-10 systemd[1]: qubes-qrexec-agent.service: Start operation timed out. Terminating.
Jul 22 18:54:14 debian-10 systemd[1]: qubes-qrexec-agent.service: Main process exited, code=killed, status=15/TERM
Jul 22 18:54:14 debian-10 systemd[1]: qubes-qrexec-agent.service: Failed with result 'timeout'.
Jul 22 18:54:14 debian-10 systemd[1]: Failed to start Qubes remote exec agent.
root@debian-10:~# journalctl -xe
-- The job identifier is 5638.
Jul 22 18:52:33 debian-10 systemd[1]: qubes-qrexec-agent.service: Main process exited, code=killed, status=15/TERM
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- An ExecStart= process belonging to unit qubes-qrexec-agent.service has exited.
-- 
-- The process' exit code is 'killed' and its exit status is 15.
Jul 22 18:52:33 debian-10 systemd[1]: qubes-qrexec-agent.service: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- The unit qubes-qrexec-agent.service has successfully entered the 'dead' state.
Jul 22 18:52:33 debian-10 systemd[1]: Stopped Qubes remote exec agent.
-- Subject: A stop job for unit qubes-qrexec-agent.service has finished
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- A stop job for unit qubes-qrexec-agent.service has finished.
-- 
-- The job identifier is 5638 and the job result is done.
Jul 22 18:52:44 debian-10 systemd[1]: Starting Qubes remote exec agent...
-- Subject: A start job for unit qubes-qrexec-agent.service has begun execution
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- A start job for unit qubes-qrexec-agent.service has begun execution.
-- 
-- The job identifier is 5639.
Jul 22 18:54:14 debian-10 systemd[1]: qubes-qrexec-agent.service: Start operation timed out. Terminating.
Jul 22 18:54:14 debian-10 systemd[1]: qubes-qrexec-agent.service: Main process exited, code=killed, status=15/TERM
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- An ExecStart= process belonging to unit qubes-qrexec-agent.service has exited.
-- 
-- The process' exit code is 'killed' and its exit status is 15.
Jul 22 18:54:14 debian-10 systemd[1]: qubes-qrexec-agent.service: Failed with result 'timeout'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- The unit qubes-qrexec-agent.service has entered the 'failed' state with result 'timeout'.
Jul 22 18:54:14 debian-10 systemd[1]: Failed to start Qubes remote exec agent.
-- Subject: A start job for unit qubes-qrexec-agent.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- A start job for unit qubes-qrexec-agent.service has finished with a failure.
-- 
-- The job identifier is 5639 and the job result is failed.
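A quick check of the timestamps in that journal, sketched here with GNU date (the times are parsed as UTC purely for the subtraction; the log itself is in EDT):

```shell
# Timestamps copied from the journal lines above.
start=$(date -u -d '2021-07-22 18:52:44' +%s)   # "Starting Qubes remote exec agent..."
stop=$(date -u -d '2021-07-22 18:54:14' +%s)    # "Start operation timed out. Terminating."
elapsed=$((stop - start))
echo "start job ran for ${elapsed}s before systemd gave up"
```

The interval is 90 seconds, which matches systemd's stock DefaultTimeoutStartSec. My reading, which is an assumption, is that the agent process stayed alive but never reached the point where systemd considers it started, rather than crashing outright. The effective value can be checked in the VM with `systemctl show qubes-qrexec-agent.service -p TimeoutStartUSec`.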

Trying stage2 on the fedora-32 template gives alarming traces without even waiting for a VM restart:

  Running scriptlet: qubes-core-qrexec-vm-4.1.13-1.fc32.x86_64          209/209 
Created symlink /etc/systemd/system/qubes-core-agent.service → /usr/lib/systemd/system/qubes-qrexec-agent.service.
Created symlink /etc/systemd/system/multi-user.target.wants/qubes-qrexec-agent.service → /usr/lib/systemd/system/qubes-qrexec-agent.service.
Job for qubes-qrexec-agent.service failed because a timeout was exceeded.
See "systemctl status qubes-qrexec-agent.service" and "journalctl -xe" for details.
warning: %posttrans(qubes-core-qrexec-vm-4.1.13-1.fc32.x86_64) scriptlet failed, exit status 1

Error in POSTTRANS scriptlet in rpm package qubes-core-qrexec-vm
  Running scriptlet: qubes-core-agent-4.1.26-1.fc32.x86_64              209/209 
  Running scriptlet: qubes-gui-agent-4.1.18-1.fc32.x86_64               209/209 
  Running scriptlet: xen-runtime-4.13.3-1.fc32.x86_64                   209/209 
ls: cannot access '/usr/lib/xen': No such file or directory

  Running scriptlet: qubes-kernel-vm-support-4.1.15-1.fc32.x86_64       209/209 
  Running scriptlet: qubes-db-libs-4.0.16-1.fc32.x86_64                 209/209 

From there the update script exits without shutting the domain down, and I cannot get a terminal launched in there to investigate.

My update also failed when updating the templates. Something in Fedora got corrupted, so any dependent VM such as sys-net won’t start and you lose network connectivity. I had a Debian backup: I restored it, attached all the Fedora-dependent VMs to it, and was able to restore network connectivity. But the migration always fails to update dom0 and the templates, and each time I try, I run into the corrupted-templates issue again. So far, I’ve been able to complete all the stages except stage 2.