[qubes-users] resume from suspend issue after QSB-070

After installing yesterday’s patches, my laptop cannot resume from sleep.

I went through the /sys/power/pm_test procedure to look for anything interesting.
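For anyone who wants to repeat the procedure, here is a minimal Python sketch of it. The level names are the ones listed in the kernel's pm_test debugging documentation; the function names `pm_test_plan` and `apply_plan` are made up for illustration. The sketch only builds the list of writes by default, because performing them needs root and really starts a test suspend.

```python
from pathlib import Path

# Test levels accepted by /sys/power/pm_test, shallowest first
# (as listed in the kernel's basic-pm-debugging documentation).
PM_TEST_LEVELS = ["freezer", "devices", "platform", "processors", "core"]


def pm_test_plan(level, sleep_state="mem", sysfs_power=Path("/sys/power")):
    """Return the (file, value) writes for one pm_test run, without performing them."""
    if level not in PM_TEST_LEVELS:
        raise ValueError(f"unknown pm_test level: {level!r}")
    return [
        (sysfs_power / "pm_test", level),      # select how far the suspend path goes
        (sysfs_power / "state", sleep_state),  # start the test-mode suspend
        (sysfs_power / "pm_test", "none"),     # restore normal suspend behaviour
    ]


def apply_plan(plan):
    """Perform the writes; needs root and really triggers the test suspend."""
    for path, value in plan:
        path.write_text(value)


if __name__ == "__main__":
    # Dry run: print the equivalent shell commands for every level.
    for level in PM_TEST_LEVELS:
        for path, value in pm_test_plan(level):
            print(f"echo {value} > {path}")
```

Working from the shallowest level to the deepest should narrow down which stage of suspend/resume misbehaves; check dmesg after each run.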

The only thing that showed up was this, during the ‘core’ test:

[ 2330.899224] ------------[ cut here ]------------
[ 2330.899225] WARNING: CPU: 1 PID: 0 at arch/x86/mm/tlb.c:456 switch_mm_irqs_off+0x375/0x3a0
[ 2330.899230] Modules linked in: snd_seq_dummy snd_hrtimer loop ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vfat fat rmi_smbus rmi_core snd_soc_skl_hda_dsp snd_soc_hdac_hdmi snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_soc_dmic snd_sof_pci_intel_cnl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof snd_sof_xtensa_dsp snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_acpi_intel_match snd_soc_acpi uvcvideo iTCO_wdt intel_pmc_bxt ee1004 iTCO_vendor_support videobuf2_vmalloc snd_soc_core videobuf2_memops videobuf2_v4l2 videobuf2_common intel_wmi_thunderbolt snd_compress wmi_bmof intel_rapl_msr snd_pcm_dmaengine ac97_bus videodev snd_hda_intel intel_powerclamp mc snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device joydev pcspkr snd_pcm i2c_i801 i2c_smbus ucsi_acpi typec_ucsi snd_timer typec wmi
[ 2330.899259] thinkpad_acpi platform_profile ledtrig_audio snd soundcore iwlwifi processor_thermal_device processor_thermal_rfim processor_thermal_mbox processor_thermal_rapl intel_rapl_common cfg80211 intel_pch_thermal intel_soc_dts_iosf intel_hid int3400_thermal sparse_keymap acpi_thermal_rel int3403_thermal thunderbolt int340x_thermal_zone rfkill fuse xenfs ip_tables dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt trusted crct10dif_pclmul crc32_pclmul crc32c_intel nvme i915 xhci_pci xhci_pci_renesas xhci_hcd ghash_clmulni_intel nvme_core i2c_algo_bit drm_kms_helper cec serio_raw drm video pinctrl_cannonlake xen_acpi_processor xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
[ 2330.899280] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.12.14-1.fc32.qubes.x86_64 #1
[ 2330.899281] Hardware name: LENOVO 20S00044UK/20S00044UK, BIOS N2XET31W (1.21 ) 06/17/2021
[ 2330.899282] RIP: e030:switch_mm_irqs_off+0x375/0x3a0
[ 2330.899285] Code: 00 00 65 48 89 05 33 87 fa 7e e9 7e fd ff ff b9 49 00 00 00 b8 01 00 00 00 31 d2 0f 30 e9 5e fd ff ff 41 89 f7 e9 a1 fe ff ff <0f> 0b e8 54 fa ff ff e9 fe fc ff ff 0f 0b e9 49 fe ff ff 0f 0b e9
[ 2330.899286] RSP: e02b:ffffc90040103eb8 EFLAGS: 00010006
[ 2330.899287] RAX: 000000010cbae000 RBX: ffff8881002d2780 RCX: 0000000000000040
[ 2330.899288] RDX: ffff8881002d2780 RSI: 0000000000000000 RDI: ffff88818cbae000
[ 2330.899289] RBP: ffffffff829d7160 R08: 0000000000000000 R09: 0000000000000004
[ 2330.899289] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888100bfee80
[ 2330.899290] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
[ 2330.899294] FS: 0000000000000000(0000) GS:ffff88816bc40000(0000) knlGS:0000000000000000
[ 2330.899295] CS: 10000e030 DS: 002b ES: 002b CR0: 0000000080050033
[ 2330.899296] CR2: 00007f2c50011766 CR3: 0000000002810000 CR4: 0000000000050660
[ 2330.899298] Call Trace:
[ 2330.899302] switch_mm+0x1c/0x30
[ 2330.899304] play_dead_common+0xa/0x20
[ 2330.899323] xen_pv_play_dead+0xa/0x60
[ 2330.899325] do_idle+0xc7/0xe0
[ 2330.899327] cpu_startup_entry+0x19/0x20
[ 2330.899329] asm_cpu_bringup_and_idle+0x5/0x1000
[ 2330.899332] ---[ end trace a14329e4cbb028c0 ]---
[ 2331.001403] smpboot: CPU 1 is now offline
[ 2331.003746] smpboot: CPU 2 is now offline
[ 2331.007186] smpboot: CPU 3 is now offline
[ 2331.013292] PM: suspend debug: Waiting for 5 second(s).
[ 2336.013713] xen_acpi_processor: Uploading Xen processor PM info
[ 2336.013724] xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU5
[ 2336.013726] xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU6
[ 2336.013727] xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU7
[ 2336.013728] xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU8
[ 2336.013736] Enabling non-boot CPUs ...
[ 2336.013740] installing Xen timer for CPU 1
[ 2336.014046] cpu 1 spinlock event irq 131
[ 2336.014279] ACPI: \_SB_.PR01: Found 3 idle states
[ 2336.014526] CPU1 is up
[ 2336.014531] installing Xen timer for CPU 2
[ 2336.014758] cpu 2 spinlock event irq 137
[ 2336.014949] ACPI: \_SB_.PR02: Found 3 idle states
[ 2336.015018] CPU2 is up
[ 2336.015021] installing Xen timer for CPU 3
[ 2336.015279] cpu 3 spinlock event irq 143
[ 2336.015580] ACPI: \_SB_.PR03: Found 3 idle states
[ 2336.015657] CPU3 is up
[ 2336.015659] ACPI: EC: EC started
[ 2336.015784] ACPI: Waking up from system sleep state S3
[ 2336.108923] ACPI: EC: interrupt unblocked
[ 2336.152769] ACPI: EC: event unblocked
[ 2336.171389] nvme nvme0: Shutdown timeout set to 8 seconds
[ 2336.180153] nvme nvme0: 4/0/0 default/read/poll queues
[ 2336.780979] PM: resume devices took 0.629 seconds
[ 2336.780999] acpi LNXPOWER:08: Turning OFF
[ 2336.781050] acpi LNXPOWER:02: Turning OFF
[ 2336.781810] OOM killer enabled.
[ 2336.781812] Restarting tasks ... done.
[ 2336.836956] PM: suspend exit

Kind of answering my own question, but disabling hyperthreading happened to be a workaround for the resume from suspend issue.

Hint:

[ 2336.013724] xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU5
[ 2336.013726] xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU6
[ 2336.013727] xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU7
[ 2336.013728] xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU8
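
A small Python sketch for pulling these entries out of a longer dmesg dump. The regular expression is written against the four lines above only, and the helper name `failed_acpi_cpus` is hypothetical. Note that 19 is ENODEV ("no such device") in Linux's errno numbering; reading this as "Xen does not know these CPUs" is an interpretation, not something the log states.

```python
import re

# Matches e.g. "xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU5"
HV_ERROR = re.compile(
    r"xen_acpi_processor: \(_?PXX\): Hypervisor error \((-?\d+)\) for ACPI CPU(\d+)"
)


def failed_acpi_cpus(dmesg_text):
    """Map ACPI CPU number -> hypervisor error code found in the log text."""
    return {int(cpu): int(code) for code, cpu in HV_ERROR.findall(dmesg_text)}


sample = """\
[ 2336.013724] xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU5
[ 2336.013726] xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU6
[ 2336.013727] xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU7
[ 2336.013728] xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU8
"""
print(failed_acpi_cpus(sample))  # {5: -19, 6: -19, 7: -19, 8: -19}
```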

On Thu, 26 Aug 2021 at 17:53, Mustafa Kuscu <mustafakuscu@gmail.com> wrote:

But shouldn't hyperthreading have already been disabled ever since QSB-043?


Thanks for putting effort into publishing the invaluable QSAs. I am now reading them.

On Thu, 26 Aug 2021 at 21:11, Andrew David Wong <adw@qubes-os.org> wrote:

Kind of answering my own question, but disabling hyperthreading happened to
be a workaround for the resume from suspend issue.

But shouldn't hyperthreading have already been disabled ever since QSB-043?

QSB #43: L1 Terminal Fault speculative side channel (XSA-273) | Qubes OS

I admit that I missed that one as well. Shame on me. Is there some way
to detect active hyperthreading on boot && print out a big red warning?

That seems a reasonable measure, especially for newcomers who cannot
reasonably be asked to read all old QSBs first :)
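
To make the idea concrete, here is a rough Python sketch of such a check. It assumes a plain Linux system that exposes `thread_siblings_list` files under /sys/devices/system/cpu; in a Qubes dom0 the kernel only sees what Xen presents to it, so a real check would presumably have to query Xen instead. All function names and the warning text are illustrative.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return sorted(cpus)


def smt_active(sibling_lists):
    """True if any CPU reports more than one hardware thread on its core."""
    return any(len(parse_cpu_list(s)) > 1 for s in sibling_lists)


def read_sibling_lists(cpu_root=Path("/sys/devices/system/cpu")):
    """Collect thread_siblings_list for every CPU that exposes one."""
    files = cpu_root.glob("cpu[0-9]*/topology/thread_siblings_list")
    return [f.read_text() for f in files]


if __name__ == "__main__":
    if smt_active(read_sibling_lists()):
        # ANSI escape for bold red; the wording is illustrative only.
        print("\033[1;31mWARNING: hyper-threading appears to be active\033[0m")
```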


I'm confused. I was under the impression that Qubes OS (after the QSB-043 patches) automatically disables hyper-threading for you such that you don't have to know anything, do anything, or read any past QSBs.

As QSB-043 explains, you would have had to follow special instructions to re-enable hyper-threading in Qubes 3.2, and no such instructions were provided for re-enabling it in Qubes 4.0 (since, as the QSB explains, it's never safe in that release), so I don't even know how you'd do it in that release.

But perhaps I'm mistaken or misunderstanding the question.

Ah, a thought just occurred to me. As QSB-043 states, "A CPU
microcode update is required to take advantage of [these patches]." Perhaps the problem is that certain CPUs never received the required microcode updates, which explains why some users seem to have CPUs with hyper-threading enabled even though it's been years since QSB-043. Could that be it?

Of course, it's generally also possible to disable hyper-threading in one's BIOS/EFI settings, regardless of whether it's disabled in Xen, and this does seem like a prudent measure given the risks associated with having it enabled and given the fact that Xen-level disablement appears to be hit-or-miss. So, perhaps your suggestion about detecting and warning about active hyper-threading might be a good idea after all. Please feel free to open an enhancement request.

There are (at least) two ways to disable hyper-threading:
1. In system BIOS (if there is such option)
2. In software - by disabling every second thread of each core.

QSB-043 uses the second method. It has its drawbacks, as the logic to
bring CPUs up and down is quite complex. And yes, there are known
issues[1] affecting suspend. Disabling hyper-threading in the BIOS prevents
Xen from starting those secondary threads at all, so it doesn't need
to bring them down.

[1] Resume from suspending is broken after update to Xen 4.14 · Issue #6066 · QubesOS/qubes-issues · GitHub
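
As an illustration of "disabling every second thread of each core", the following Python sketch picks the threads that would be taken offline for a given topology. It is not Xen's actual logic; the function name and the 4-core layout are hypothetical.

```python
def secondary_threads(cores):
    """Given each core's hardware thread IDs, return the threads to take offline.

    The lowest-numbered thread of every core stays online; the rest are parked.
    """
    offline = []
    for threads in cores:
        offline.extend(sorted(threads)[1:])
    return sorted(offline)


# Hypothetical 4-core, 8-thread layout where siblings are numbered adjacently.
cores = [(0, 1), (2, 3), (4, 5), (6, 7)]
print(secondary_threads(cores))  # [1, 3, 5, 7]
```

With hyper-threading disabled in the BIOS there is only one thread per core, so the function returns an empty list and nothing has to be brought down.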

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[Andrew]

>>>> But shouldn't hyperthreading have already been disabled ever since
>>>> QSB-043?
>>>>
>>>> QSB #43: L1 Terminal Fault speculative side channel (XSA-273) | Qubes OS
>>>>
>>> I admit that I missed that one as well. Shame on me. Is there some way
>>> to detect active hyperthreading on boot && print out a big red warning?
>>>
>>> That seems a reasonable measure, especially for newcomers who cannot
>>> reasonably be asked to read all old QSBs first :)
>>>

> [ Marek ]
> There are (at least) two ways to disable hyper-threading:
> 1. In system BIOS (if there is such option)
> 2. In software - by disabling every second thread of each core.
>
> QSB-043 uses the second method. It has its drawbacks, as the logic to
> bring CPUs up and down is quite complex. And yes, there are known
> issues[1] affecting suspend. Disabling hyper-threading in the BIOS prevents
> Xen from starting those secondary threads at all, so it doesn't need
> to bring them down.
>
> [1]

Thank you, Marek. I only disabled it in the BIOS just now (my fault). My
question was whether the software could show the user a warning when
hyper-threading is disabled only in software. I would have done it much sooner then :)

Bernhard

>>> Marek Marczykowski-Górecki 31.08.2021, 02:52 >>>
> >
> > > > Kind of answering my own question, but disabling hyperthreading
> > > > happened to
> > > > be a workaround for the resume from suspend issue.
> > >
> > > But shouldn't hyperthreading have already been disabled ever since
> > > QSB-043?
> > >
> > > QSB #43: L1 Terminal Fault speculative side channel (XSA-273) | Qubes OS
> > >
> > I admit that I missed that one as well. Shame on me. Is there some way
> > to detect active hyperthreading on boot && print out a big red warning?
> >
> > That seems a reasonable measure, especially for newcomers who cannot
> > reasonably be asked to read all old QSBs first :)
> >
>
> I'm confused. I was under the impression that Qubes OS (after the QSB-043
> patches) automatically disables hyper-threading for you such that you don't
> have to know anything, do anything, or read any past QSBs.
>
> As QSB-043 explains, you would have had to follow special instructions to
> re-enable hyper-threading in Qubes 3.2, and no such instructions were
> provided for re-enabling it in Qubes 4.0 (since, as the QSB explains, it's
> never safe in that release), so I don't even know how you'd do it in that
> release.
>
> But perhaps I'm mistaken or misunderstanding the question.

There are (at least) two ways to disable hyper-threading:
1. In system BIOS (if there is such option)
2. In software - by disabling every second thread of each core.

QSB-043 uses the second method. It has its drawbacks, as the logic to
bring CPUs up and down is quite complex. And yes, there are known
issues[1] affecting suspend. Disabling hyper-threading in the BIOS prevents
Xen from starting those secondary threads at all, so it doesn't need
to bring them down.

[1] https://github.com/QubesOS/qubes-issues/issues/6066#issuecomment-901843312

Hi!

Can't it be disabled via kernel (grub) command line, too?

This is exactly "the second method" above.

Rumour also has it that you can even disable it at runtime (with the running
threads being migrated to other CPUs first).
Occasionally some tools seem to have problems with HT being disabled (such as
"expecting 8 CPUs, but only found 4").

This is a similar kind of issue to the one discussed here. That's why it's
better to disable HT in the BIOS, so that those 8 CPUs are not shown at all. But
at the OS level we don't have any other choice, and we prefer a secure
default. That's why we disable HT at the Xen level: to provide a safer option
regardless of what the user has set in the BIOS.

PS Please don't top-post. And keep the mailing list in CC.

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

Couldn't Xen/Qubes show a boot warning to users who rely only on (2), to
encourage them more strongly to disable it in the BIOS (1)? That seems a logical
measure to me. Best, Bernhard