Some issues with 4.1.1 updates and kernel-latest

I have reports of various issues with 4.1.1 and the latest updates,
including kernel-latest (5.19.9-1).
These are issues with at least 5 reports:

  1. Qubes running very hot - far hotter than 4.0
  2. Qubes running with increased RAM usage - for example, on x220/x230 I had dom0 maxed
    at 1536M, service qubes at 300-400M, and could comfortably run 14+ qubes
    in 16GB RAM.
    Now I have had to push up dom0 allocation, push up per qube
    allocation, can run far fewer qubes without interruption, and regularly
    have qubes failing to start. I also have reports from users who have not
    customised memory allocation.
  3. Random freezes and hard crashes - these have been reported in the
    Forum generally. Usually there is no relevant information in the
    logs.
  4. Qubes randomly fail to start.
  5. Issues with sleep - sleep used to work flawlessly. If sys-net was
    left running, network connections were re-established and the Qubes
    network stack worked.
    Now, sys-net will reconnect, but downstream netvms become unusable.
    They provide no network access, and it is not possible to open a
    terminal using qvm-run xterm - the command just hangs.
    It is not possible to restart the qube - the logs contain only
    "libvirtd: internal error: libxenlight failed to create new domain "

These issues affect machines with stock BIOS and coreboot, and some
certified machines.
I think that issue 5 is specific to kernel-latest.
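
For anyone wanting to check whether they are hitting the same failure as in issue 5, a minimal Python sketch that scans log text for the quoted error could look like the following. How the log text is obtained is left open, the function name is made up for illustration, and the sample lines other than the quoted error are invented.

```python
# Error text as quoted in the report above. Anything that follows it in a
# real log line (for example a domain name) is not assumed here.
ERROR_MARKER = "libxenlight failed to create new domain"


def find_domain_create_failures(log_text):
    """Return (line_number, line) pairs for lines containing the marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if ERROR_MARKER in line:
            hits.append((number, line.strip()))
    return hits


# Invented sample input: only the middle line comes from the report.
sample = "\n".join([
    "hypothetical: some unrelated log line",
    "libvirtd: internal error: libxenlight failed to create new domain",
    "hypothetical: another unrelated log line",
])

for number, line in find_domain_create_failures(sample):
    print(number, line)
```

Counting the hits before and after a suspend/resume cycle would give a simple way to compare kernel-latest against the default kernel.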

I’d like to hear if you recognise any of these, and any thoughts on how
to proceed before raising issues at GitHub.

I recognize (2) and (3) on ThinkPad T430 with coreboot/heads (equivalent of NitroPad / certified).

2: had to increase dom0 to 2048 from 1536 to avoid performance penalty
3: semi-regular freezes during updates, very rare freezes otherwise (3 times in several months)
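
As a rough illustration of what the dom0 change (1536 to 2048) costs, here is a minimal Python sketch of a static memory budget. It uses the 16GB total and the 400M service qube figure from the first post. The 3 service qubes, the 11 other qubes and the 1000M per qube are made-up numbers, and the sum ignores Xen overhead and dynamic memory balancing, so treat it as back-of-envelope arithmetic only.

```python
def free_memory_mib(total_mib, dom0_mib, service_qubes, service_mib,
                    app_qubes, app_mib):
    """Return MiB left after fixed allocations (negative = overcommitted)."""
    used = dom0_mib + service_qubes * service_mib + app_qubes * app_mib
    return total_mib - used


TOTAL_MIB = 16 * 1024  # 16GB machine, as in the first post

# Hypothetical layout: 3 service qubes at 400M, 11 other qubes at 1000M.
before = free_memory_mib(TOTAL_MIB, 1536, 3, 400, 11, 1000)
after = free_memory_mib(TOTAL_MIB, 2048, 3, 400, 11, 1000)

print(before)  # 2648
print(after)   # 2136
```

Under these assumptions the extra 512M for dom0 comes straight out of the headroom, which is in line with being able to run fewer qubes once allocations go up.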

The freeze issues started for me back in R4.0 with 5.x kernels, which is why I stuck with the 4.x kernel.

R4.0 with 4.x kernel was an extremely stable and performant system. With R4.1 things are generally slower and less stable. I’ve considered downgrading recently but then R4.0 is EOL.

I recognise 2 and have commented on upstream issues. Fedora consumes a lot more RAM than before, and requires either switching to Debian or allocating more memory to sys-net and sys-usb; otherwise the camera and wifi stop working. This is changed in the OEM disk image, which is probably why there are fewer reports on the PrivacyBeast, but the issue is known and I have commented on reports about the camera, sys-net driver faults, and loss of networking.
Relevant Qubes issue linking to specific forum discussions:

I recognise 3 when updating templates.

I recognize 4: about 1 in 3 sys-usb boots fail when USB thumb drives are connected before sys-usb starts. sys-usb is sometimes down on resume as well, or not started on boot when thumb drives are connected…

5 is known; I diagnosed the problem with @adrelanos. It is linked to TSC and Xen: timekeeping and mishandling of suspend/resume. I poked @demi for a pull request to produce a package to test in unstable. (It is unrelated to the Tor bug on resume from suspend, though, which anybody can experience after suspending for a long time, with sdwdate reporting a timeout.)


Of course, all of the above is reported from an x230 i7 with the latest Heads firmware.
I will watch this thread, contribute diagnostics against a stock config, and report on any tuning applied if needed.

I never test kernel-latest, though, unless it is really needed, going by the messages from GitHub - QubesOS/updates-status (Track packages in testing repository) that reach my attention.

@deeplow: Maybe add this as a requirement for being part of the testing group:

“Users enrolling in the testing section of the forum are required to subscribe to notifications from GitHub - QubesOS/updates-status (Track packages in testing repository), to be aware of packages being released for testing purposes”?

@unman: I remember you read forum content from emails sent to you. I wanted to poke you because once again (sorry) my post was heavily edited and now contains what I wanted you to read.

I’ll ping @unman privately about this. Thanks for the suggestion.

The only problem I have on my machine is that Debian VMs have screen flickering.
It looks like a kernel issue.

So, testing for resolution of:

  1. Qubes running very hot - far hotter than 4.0 (cpu/clock issues)
  2. Randomly qubes fail to start. (sys-usb here and clock issues)
  3. Issues with sleep/resume

These might be resolved by the latest xmm-xen/kernels that have landed in the current-testing repos.
My attempt at a better testing process is under "What to test? Where to get what to test? Where to report testing results?"

But more specifically, for the xmm-xen testing at play: Xen + xmm-xen fixes to test (suspend/resume fixes, directio loopback devices, sys-usb fails to start, slowness vs bare metal)

Seems gone. Booting sys-usb on boot with my Thumb drive and USB Security dongle worked. Restarting sys-usb: boots fine.

@unman: I think the other issues you point at, if linked to a timeout at start, should be detailed as such from an observable perspective.

I can only report a similar issue that affected me for a month or so (but seems to have been resolved), namely that after a reboot, sys-net wouldn’t actually work to provide networking until I restarted it about 2 minutes in. But there were no hangs and no refusal to reboot the VM. My guess is that this was due to a kernel issue, as we’ve not seen many other updates aside from all of the xen vuln patches (but most of those came later, iirc).

Can’t comment on the heat issue, though I don’t think I have heard my fans spin up (but it is hard to tell, since it’s a pretty silent fan on a large cooling block), and I just allocate 4gb to dom0.

hardware: ryzen 2700x, x570 motherboard, 64gb ram.