Finding root cause of freeze

Hi all,

When I unplug my USB-C dock, Qubes OS freezes and then reboots. I have tried to find more information with journalctl and dmesg, but neither seems to record what happens right before the hang. I have also run dmesg -W, hoping to catch some output before the hang and reboot, but no entries are added between disconnecting the dock and the reboot. (After the reboot, everything is logged normally.)

Does anybody have ideas on what else I could do to debug this?

By the way, I have noticed that the freeze doesn't happen if I boot up first and only then connect the dock.

Some additional information would help, such as the dock model and the laptop model. USB-C comes in many different flavours, lanes, speeds, revisions and alternate modes.

That makes sense! I will share that later; I don't have that information at the moment.

However, I was hoping people might point me towards alternative diagnostic tools for when the aforementioned ones don't give any info. Would you happen to know any?

Thanks! I wasn't aware of this feature, will try it out!

You can also enable some additional logging for Xen and the kernel:

I finally got my hands on a proper debug cable, and it logged the crash! (see below)

@alimirjamali, excuse me for never replying with the model of my laptop and dock. My computer is a Dell Latitude 5500 P80F001, and the USB dock is this dock from HP.

I started capturing output with sudo dmesg -W, and after a few seconds I unplugged the USB dock. The first messages in the crash log appear right after unplugging the USB dock.

In the crash log I see that dom0 hits a NULL pointer dereference. What would be a sensible next step? Open an issue on GitHub?

Crash log:

[user@dom0 ~]$ sudo dmesg -W
[  783.329612] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  783.330130] #PF: supervisor read access in kernel mode
[  783.330456] #PF: error_code(0x0000) - not-present page
[  783.330713] PGD 0 P4D 0 
[  783.330844] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  783.331067] CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.9.2-1.qubes.fc37.x86_64 #1
[  783.331453] Hardware name: Dell Inc. Latitude 5500/0M14W7, BIOS 1.13.0 10/06/2021
[  783.331831] Workqueue: events ucsi_handle_connector_change [typec_ucsi]
[  783.332174] RIP: e030:strlen+0x4/0x30
[  783.332363] Code: f7 75 ec 31 c0 c3 cc cc cc cc 48 89 f8 c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <80> 3f 00 74 14 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 cc
[  783.333279] RSP: e02b:ffffc90040077da0 EFLAGS: 00010246
[  783.333748] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000023c000
[  783.334111] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  783.334511] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  783.334944] R10: ffff888100931f10 R11: 0000000000000000 R12: 0000000000000000
[  783.335386] R13: 0000000000000000 R14: ffff8881092cd000 R15: 0000000000000000
[  783.335807] FS:  0000000000000000(0000) GS:ffff888188200000(0000) knlGS:0000000000000000
[  783.336272] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[  783.336622] CR2: 0000000000000000 CR3: 000000013c5b0000 CR4: 0000000000050660
[  783.337040] Call Trace:
[  783.337211]  <TASK>
[  783.337361]  ? __die+0x23/0x70
[  783.337549]  ? page_fault_oops+0x95/0x190
[  783.337771]  ? exc_page_fault+0x76/0x170
[  783.338012]  ? asm_exc_page_fault+0x26/0x30
[  783.338267]  ? strlen+0x4/0x30
[  783.338447]  kernfs_name_hash+0x12/0x80
[  783.338684]  kernfs_find_ns+0x35/0xc0
[  783.338912]  kernfs_remove_by_name_ns+0x4a/0xc0
[  783.339185]  typec_unregister_partner+0x4c/0xe0 [typec]
[  783.339456]  ucsi_unregister_partner+0x103/0x140 [typec_ucsi]
[  783.339749]  ucsi_handle_connector_change+0x310/0x390 [typec_ucsi]
[  783.340121]  process_one_work+0x18b/0x3b0
[  783.340372]  worker_thread+0x277/0x390
[  783.340607]  ? __pfx_worker_thread+0x10/0x10
[  783.340871]  kthread+0xcf/0x100
[  783.341057]  ? __pfx_kthread+0x10/0x10
[  783.341292]  ret_from_fork+0x31/0x50
[  783.341520]  ? __pfx_kthread+0x10/0x10
[  783.341732]  ret_from_fork_asm+0x1a/0x30
[  783.341977]  </TASK>
[  783.342112] Modules linked in: snd_seq_dummy snd_hrtimer vfat fat snd_sof_pci_intel_cnl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils soundwire_generic_allocation soundwire_bus snd_hda_codec_hdmi snd_soc_avs spi_nor mtd mei_hdcp snd_soc_hda_codec mei_pxp mei_wdt snd_soc_skl snd_ctl_led ledtrig_audio snd_soc_hdac_hda dell_rbtn iTCO_wdt intel_pmc_bxt snd_hda_ext_core iTCO_vendor_support snd_soc_sst_ipc ee1004 snd_soc_sst_dsp snd_soc_acpi_intel_match snd_hda_codec_realtek snd_soc_acpi dell_laptop snd_hda_codec_generic snd_soc_core snd_compress snd_hda_scodec_component ac97_bus snd_pcm_dmaengine intel_rapl_msr dell_smm_hwmon intel_uncore_frequency_common snd_hda_intel snd_intel_dspcfg dell_wmi snd_intel_sdw_acpi intel_powerclamp snd_hda_codec snd_hda_core dell_smbios snd_hwdep dcdbas snd_seq snd_seq_device pcspkr dell_wmi_descriptor dell_wmi_sysman snd_pcm firmware_attributes_class wmi_bmof intel_wmi_thunderbolt
[  783.342156]  snd_timer e1000e snd i2c_i801 spi_intel_pci i2c_smbus soundcore spi_intel mei_me joydev mei iwlwifi idma64 processor_thermal_device_pci_legacy processor_thermal_device processor_thermal_wt_hint int3403_thermal dell_smo8800 processor_thermal_rfim cfg80211 processor_thermal_rapl intel_rapl_common intel_pmc_core int3400_thermal processor_thermal_wt_req intel_hid intel_vsec processor_thermal_power_floor pmt_telemetry acpi_thermal_rel rfkill pmt_class sparse_keymap processor_thermal_mbox int340x_thermal_zone intel_pch_thermal intel_soc_dts_iosf loop fuse xenfs dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt typec_displayport i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel rtsx_pci_sdmmc sha512_ssse3 hid_multitouch mmc_core i2c_algo_bit nvme sha256_ssse3 drm_buddy sha1_ssse3 video nvme_core serio_raw ucsi_acpi i2c_hid_acpi nvme_auth rtsx_pci i2c_hid typec_ucsi xhci_pci pinctrl_cannonlake xhci_pci_renesas typec wmi ttm xhci_hcd drm_display_helper cec
[  783.347269]  xen_acpi_processor xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn scsi_dh_rdac scsi_dh_emc scsi_dh_alua uinput dm_multipath
[  783.353236] CR2: 0000000000000000
[  783.353438] ---[ end trace 0000000000000000 ]---
[  783.353716] RIP: e030:strlen+0x4/0x30
[  783.353952] Code: f7 75 ec 31 c0 c3 cc cc cc cc 48 89 f8 c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <80> 3f 00 74 14 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 cc
[  783.354994] RSP: e02b:ffffc90040077da0 EFLAGS: 00010246
[  783.355300] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000023c000
[  783.355716] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  783.356133] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  783.356571] R10: ffff888100931f10 R11: 0000000000000000 R12: 0000000000000000
[  783.356987] R13: 0000000000000000 R14: ffff8881092cd000 R15: 0000000000000000
[  783.357408] FS:  0000000000000000(0000) GS:ffff888188200000(0000) knlGS:0000000000000000
[  783.357897] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[  783.358224] CR2: 0000000000000000 CR3: 000000013c5b0000 CR4: 0000000000050660
[  783.358643] Kernel panic - not syncing: Fatal exception
[  783.358987] Kernel Offset: disabled
(XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
(XEN) ----[ Xen-4.17.4  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<000000000000100d>] 000000000000100d
(XEN) RFLAGS: 0000000000010a86   CONTEXT: hypervisor
(XEN) rax: 0000000086b22018   rbx: 0000000000000000   rcx: 0000000000000000
(XEN) rdx: 000000000000000f   rsi: 0000000000000000   rdi: 0000000000000000
(XEN) rbp: ffff830864ce7c40   rsp: ffff830864ce7bc8   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: 0000000000000836   r11: 0000000000000835
(XEN) r12: 0000000000000000   r13: 0000000000000000   r14: ffff830000000000
(XEN) r15: 00000000000000fb   cr0: 0000000080050033   cr4: 00000000003526e0
(XEN) cr3: 0000000864cae000   cr2: 0000000086b22018
(XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <000000000000100d> (000000000000100d):
(XEN)  00 00 00 15 00 00 00 0c <00> 00 00 00 00 00 71 9f 3a 9f 3a 01 00 00 00 be
(XEN) Xen stack trace from rsp=ffff830864ce7bc8:
(XEN)    000000007ab83b13 0000000000000000 ffff830860a00000 ffff830864cc1001
(XEN)    0000000000000001 0000000000000046 ffff830864cc10c4 0000000000000000
(XEN)    0000000000000077 0000000000000206 00000000000400f7 0000000000000000
(XEN)    ffff830864ce7c80 000000068b2bd000 ffff830864ce7c80 0000000000000000
(XEN)    ffff830864ce7c80 ffff830864ce7cb8 ffff82d04028656c 0000000000000000
(XEN)    0000000000000000 0000000864cae000 ffff82d040286546 000000068b2bd000
(XEN)    0000000000000000 0000000000000046 ffff82d040313d36 ffff82d040313e35
(XEN)    0000000000000000 0000000000000065 0000000000000000 ffff82d0403134c7
(XEN)    0000138800000b8f 000083068b152000 0000000000000000 ffff82d040599d80
(XEN)    0000000000000000 ffff830864ce7d88 0000000000000000 ffff82d040599d80
(XEN)    ffff82d0403135bb ffff82d0402275b3 00000000000000fb 0000000080000000
(XEN)    ffff82d0402fd5b8 0000000000000001 0000000000000000 ffff83086099f290
(XEN)    ffff83068b187d70 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffff830864ce7fff 0000000000000000 ffff82d040201916
(XEN)    0000000000000000 ffff82d040599d80 000000c1cfbd8fdb ffff82d0405944a0
(XEN)    ffff830864caa010 ffff82d0405944a0 ffff82d0405a2ba0 0000000000000000
(XEN)    0000000000000062 ffff82d04043bb80 000000000927a6df 0000000000000008
(XEN)    00000000000001cf ffff82d040595120 ffff830864caa010 000000fb00000000
(XEN)    ffff82d040317d59 000000000000e008 0000000000000282 ffff830864ce7e30
(XEN)    0000000000000000 ffff82d040275336 0000000000000000 ffff830864ce7fff
(XEN) Xen call trace:
(XEN)    [<000000000000100d>] R 000000000000100d
(XEN)    [<000000007ab83b13>] S 000000007ab83b13
(XEN)    [<ffff82d04028656c>] S efi_reset_system+0x4c/0x90
(XEN)    [<ffff82d040286546>] S efi_reset_system+0x26/0x90
(XEN)    [<ffff82d040313d36>] S __stop_this_cpu+0x16/0x30
(XEN)    [<ffff82d040313e35>] S smp_send_stop+0xc5/0xe0
(XEN)    [<ffff82d0403134c7>] S machine_restart+0x247/0x330
(XEN)    [<ffff82d0403135bb>] S shutdown.c#__machine_restart+0xb/0x10
(XEN)    [<ffff82d0402275b3>] S smp_call_function_interrupt+0x73/0x90
(XEN)    [<ffff82d0402fd5b8>] S do_IRQ+0x288/0x5c0
(XEN)    [<ffff82d040201916>] S common_interrupt+0x136/0x150
(XEN)    [<ffff82d040317d59>] S get_s_time+0x19/0x50
(XEN)    [<ffff82d040275336>] S cpuidle_menu.c#menu_select+0x46/0x240
(XEN)    [<ffff82d040282045>] S mwait-idle.c#mwait_idle+0x65/0x370
(XEN)    [<ffff82d0402ed8c1>] S domain.c#idle_loop+0xc1/0x120
(XEN)    [<ffff82d0402ed800>] S domain.c#idle_loop+0/0x120
(XEN)    [<ffff82d0402ef648>] S context_switch+0x168/0x940
(XEN) 
(XEN) Pagetable walk from 0000000086b22018:
(XEN)  L4[0x000] = 0000000864cad063 ffffffffffffffff
(XEN)  L3[0x002] = 0000000000000000 ffffffffffffffff
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0002]
(XEN) Faulting linear address: 0000000086b22018
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

FATAL: read zero bytes from port
term_exitfunc: reset failed for dev UNKNOWN: Input/output error
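
As a side note for reading the log above: the frames between <TASK> and </TASK> can be condensed into a short module/function list, which is easier to quote in a bug report. Below is a minimal Python sketch, not an established tool. The regular expression only covers the frame format seen in this log, `parse_call_trace` and the `vmlinux` label for frames without a module are names made up for illustration, and the sample is an excerpt of the trace above.

```python
import re

# Matches trace lines such as
#   "[  783.339185]  typec_unregister_partner+0x4c/0xe0 [typec]"
# A leading "? " marks frames the unwinder is not sure about.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<guess>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?\s*$"
)


def parse_call_trace(log_text):
    """Return (function, module, reliable) tuples found between <TASK> and </TASK>."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "<TASK>" in line:
            in_trace = True
            continue
        if "</TASK>" in line:
            break
        if not in_trace:
            continue
        match = FRAME_RE.match(line)
        if match:
            # "vmlinux" is only an illustrative label for frames with no module tag.
            module = match["module"] or "vmlinux"
            frames.append((match["func"], module, match["guess"] is None))
    return frames


# Excerpt of the call trace from the crash log above.
SAMPLE = """\
[  783.337211]  <TASK>
[  783.338267]  ? strlen+0x4/0x30
[  783.338447]  kernfs_name_hash+0x12/0x80
[  783.338684]  kernfs_find_ns+0x35/0xc0
[  783.338912]  kernfs_remove_by_name_ns+0x4a/0xc0
[  783.339185]  typec_unregister_partner+0x4c/0xe0 [typec]
[  783.339456]  ucsi_unregister_partner+0x103/0x140 [typec_ucsi]
[  783.341977]  </TASK>
"""

if __name__ == "__main__":
    for func, module, reliable in parse_call_trace(SAMPLE):
        if reliable:
            print(f"{module}: {func}")
```

For this trace, the frames without a leading `?` run from ucsi_unregister_partner in typec_ucsi, through typec_unregister_partner in typec, into the kernfs lookup, where strlen faults on address 0.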

Did you try older dom0 kernel versions 6.1/6.6?
Maybe this issue is only present in newer kernel versions.
Do you have anything connected to your dock when you disconnect it?

Try booting from a Fedora Live USB and disconnecting your dock there, to check whether this issue is specific to Qubes OS or a general issue with the Linux kernel. Ideally, use the same kernel version as in your dom0, or afterwards try in your dom0 the kernel version that the Fedora Live USB has.
If you have the same issue in Fedora, then I guess it’s a kernel issue and it should be reported to the Linux kernel mailing list.

I have not tried that yet, good idea!

Another dock (the one integrated in one of the monitors I use) is connected to the USB dock. I also have an HDMI cable connected to the (first) USB dock.

I’ll try with other dom0 kernels, and with a Fedora live boot. Will update here, thanks again!

So it’s like this?

                             (USB-C)
| laptop | <---> | dock1 | <---------> | dock2 in your external monitor1 |
                    /\
                    || (HDMI)
                    \/
                | monitor2 |                      

Do you have the same freeze issue if you disconnect everything from dock1 and then unplug dock1 from your laptop?

Yes, that’s exactly how the setup is. I’ll give that suggestion a try first.

Yes, it crashes again with only dock1 connected and nothing else attached. I will try with the same Fedora next (Fedora 37 on kernel 6.9.2-1).

You can also try to boot into dom0 OS without Xen to rule out an issue with Xen:

Thanks, that sounds much easier. I was already down the rabbit hole of building a live Fedora 37 CD with a specific kernel.

You can just try the latest Fedora 40 Live image and then try, in dom0, the kernel version that it ships.

Interesting: booting straight into dom0 without Xen, as you specified, does not result in any crash when unplugging the dock.
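
When comparing a boot with and without Xen like this, it can help to confirm that the kernel really came up without the hypervisor before unplugging the dock. A minimal Python sketch, assuming the kernel exposes /sys/hypervisor/type (which reads "xen" when running under Xen); the function name `hypervisor_type` is made up for illustration:

```python
from pathlib import Path


def hypervisor_type(sysfs_path="/sys/hypervisor/type"):
    """Return the hypervisor name reported by sysfs, or None if nothing is reported."""
    try:
        text = Path(sysfs_path).read_text().strip()
    except OSError:
        # File missing or unreadable: treat as "no hypervisor reported".
        return None
    return text or None


if __name__ == "__main__":
    kind = hypervisor_type()
    if kind == "xen":
        print("Kernel is running under Xen")
    else:
        print(f"No Xen reported by sysfs (got {kind!r})")
```

A missing file only means that sysfs reports no hypervisor. It is a quick sanity check, not proof of a bare-metal boot.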

Then it should be related to Xen.
I guess you can report it as a Qubes OS GitHub issue and maybe report it to Xen as well.

Pardon my ignorance, but how do I easily try a different kernel in a Fedora live boot?

No, I mean you can try the Fedora Live image as is, and if it has e.g. kernel 6.8.8, then you can try kernel 6.8.8 in dom0.
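
To compare the kernel of a live system with the one from the crash log, the release strings can be reduced to their base version. A small Python sketch under that assumption; `kernel_base_version` is a hypothetical helper, and the release string is the one from the dom0 crash log in this thread:

```python
import platform
import re


def kernel_base_version(release):
    """Extract (major, minor, patch) from a string like '6.9.2-1.qubes.fc37.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


if __name__ == "__main__":
    # Kernel release taken from the dom0 crash log.
    crashed = kernel_base_version("6.9.2-1.qubes.fc37.x86_64")
    # Kernel release of the system this script runs on (e.g. the live image).
    running = kernel_base_version(platform.release())
    if running == crashed:
        print("Same base kernel version as in the crash log")
    else:
        print(f"Running {running}, crash log shows {crashed}")
```

Distribution suffixes such as `-1.qubes.fc37` are ignored here, so two kernels with the same base version can still differ in patches and configuration.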

Ah I see, that makes sense