Having just successfully gotten a crash dump out of dom0 on a computer with no serial ports, I figured my findings may be beneficial to others, so here’s a write-up. Thanks in advance to several users on the Qubes Matrix chat, especially Marmarek and Royger, without whom I would not have managed to figure this out.
Why you want this: If Xen itself or dom0 is crashing, typically because of a hardware compatibility issue, you have a problem. Xen has no means of updating the screen in the event of a crash, so the computer typically just… stops. In my case it would reboot a few seconds later. The entire mass storage of a Qubes machine is encrypted, so there’s nowhere for any kernel to write out crash dumps or crash traces. If you’re on a desktop platform with PCIe expansion, you can get a PCIe serial interface card and get debugging data out of Qubes. But if you’re on a modern laptop, you’re kinda hosed.
Xen can’t use USB serial; it needs “real” hardware 8250-family serial ports on a bus that is enumerated at bootloader time, which means PCI/PCIe/LPC (ISA). Some modern laptops have a USB3 debugging feature, but mine isn’t one of them.
Prerequisites - You’re going to need a USB4/Thunderbolt PCIe bridge - these are typically sold as “eGPU Docks”. A true USB4 NVMe caddy might be suitable, with the right adapters, but I used an eGPU dock for this exercise. As Qubes won’t/can’t configure such a thing, it needs BIOS support - we need the BIOS to configure it at boot-time before handing off to GRUB, which I suspect may rule out most desktop-class machines with Thunderbolt add-in cards. You’re going to need a PCIe serial card to install in the dock. Marmarek recommends a card based on the ASIX AX99100 chipset, but others may also work.
If you have no idea how RS-232 serial works, what a null-modem cable is, or how Unix-like command lines work, this project may not be fruitful for you. This write-up assumes you already have a debug host with a working RS-232 interface and suitable cabling. My debug host is a Windows machine with an FTDI USB serial adapter and PuTTY. I strongly recommend writing SystemRescueCD onto a spare thumb drive, as it can be used to validate that your hardware actually works without all the complexity of dealing with Qubes’ security model.
As a warning, many of the system changes I’m about to discuss compromise Qubes’ security model. You may want to consider using a spare drive with a fresh Qubes install and no confidential data. If you’re using a production Qubes machine, please undo these changes after you have the debug data you need.
Set up your debug host, connect it to the serial card in the PCIe dock, and connect the dock to the laptop. Then boot from a SystemRescueCD stick and edit the bootloader options. You need Linux to leave the Thunderbolt controller alone, hopefully in its already-configured state, so add thunderbolt.host_reset=0 to the kernel options (that’s the line starting with the word ‘linux’). Once you’re booted, run lspci -tv and examine the output. Mine looks like this:
-[0000:00]-+-00.0 Advanced Micro Devices, Inc. [AMD] Device 14e8
           +-00.2 Advanced Micro Devices, Inc. [AMD] Device 14e9
           +-01.0 Advanced Micro Devices, Inc. [AMD] Device 14ea
           +-02.0 Advanced Micro Devices, Inc. [AMD] Device 14ea
           +-02.1-[01]----00.0 Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
           +-02.2-[02]----00.0 Qualcomm Technologies, Inc QCNFA765 Wireless Network Adapter
           +-02.4-[03]----00.0 Samsung Electronics Co Ltd Device a80d
           +-03.0 Advanced Micro Devices, Inc. [AMD] Device 14ea
           +-04.0 Advanced Micro Devices, Inc. [AMD] Device 14ea
           +-04.1-[04-63]----00.0-[05-06]----00.0-[06]--+-00.0 MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller
           |                                            +-00.1 MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller
           |                                            \-00.2 MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller
           +-08.0 Advanced Micro Devices, Inc. [AMD] Device 14ea
           +-08.1-[64]--+-00.0 Advanced Micro Devices, Inc. [AMD/ATI] Phoenix1
           |            +-00.1 Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller
           |            +-00.2 Advanced Micro Devices, Inc. [AMD] Family 19h (Model 74h) CCP/PSP 3.0 Device
           |            +-00.3 Advanced Micro Devices, Inc. [AMD] Device 15b9
           |            +-00.4 Advanced Micro Devices, Inc. [AMD] Device 15ba
           |            +-00.5 Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor
           |            \-00.6 Advanced Micro Devices, Inc. [AMD] Family 17h/19h HD Audio Controller
           +-08.2-[65]--+-00.0 Advanced Micro Devices, Inc. [AMD] Device 14ec
           |            \-00.1 Advanced Micro Devices, Inc. [AMD] AMD IPU Device
           +-08.3-[66]--+-00.0 Advanced Micro Devices, Inc. [AMD] Device 14ec
           |            +-00.3 Advanced Micro Devices, Inc. [AMD] Device 15c0
           |            +-00.4 Advanced Micro Devices, Inc. [AMD] Device 15c1
           |            \-00.6 Advanced Micro Devices, Inc. [AMD] Pink Sardine USB4/Thunderbolt NHI controller #2
           +-14.0 Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
           +-14.3 Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
           +-18.0 Advanced Micro Devices, Inc. [AMD] Device 14f0
           +-18.1 Advanced Micro Devices, Inc. [AMD] Device 14f1
           +-18.2 Advanced Micro Devices, Inc. [AMD] Device 14f2
           +-18.3 Advanced Micro Devices, Inc. [AMD] Device 14f3
           +-18.4 Advanced Micro Devices, Inc. [AMD] Device 14f4
           +-18.5 Advanced Micro Devices, Inc. [AMD] Device 14f5
           +-18.6 Advanced Micro Devices, Inc. [AMD] Device 14f6
           \-18.7 Advanced Micro Devices, Inc. [AMD] Device 14f7
You see that really weird clump of devices behind like 5 bridges? That’s Thunderbolt and my serial-io card. This particular card contains two serial ports and a parallel port, though only the “second” serial port is a thing that actually exists - the pads on the board for the first one are unpopulated.
Since it’s hard to figure out the actual bus IDs from the tree view, let’s narrow it down a bit - lspci | grep 9912
06:00.0 Serial controller: MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller
06:00.1 Serial controller: MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller
06:00.2 Parallel controller: MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller
We can ignore the parallel port, meaning the port we’re actually plugged into is either 06:00.0 or 06:00.1. If you only have one serial port on your card, this is easier for you.
Let’s figure out the Bus Address → tty mapping - dmesg | grep tty
[ 0.076707] printk: console [tty0] enabled
[ 0.695951] 0000:06:00.0: ttyS4 at I/O 0x2018 (irq = 42, base_baud = 115200) is a 16550A
[ 0.696562] 0000:06:00.1: ttyS5 at I/O 0x2010 (irq = 44, base_baud = 115200) is a 16550A
[ 17.522733] systemd[1]: Created slice Slice /system/getty.
You’ll notice those bus addresses look familiar, so we can be certain we’re looking at the right devices. Initialize the ports with values that match your debug host. In my case, that’s:
stty -F /dev/ttyS4 115200 sane
stty -F /dev/ttyS5 115200 sane
Now let’s send some text:
echo testing ttyS4 > /dev/ttyS4
echo testing ttyS5 > /dev/ttyS5
You should see the text appear on your debug terminal. Write down the tty device, bus address, I/O address, and IRQ of the port you’re actually on. (In my case, that’s ttyS5, 06:00.1, 0x2010, and 44.) You’re probably only going to need the bus address, but it’s not a bad idea to write down the rest.
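If you would rather not pick those values out by eye, the mapping can be pulled out of the kernel log with a small filter. This is only a minimal sketch: list_pci_serial_ports is a made-up helper name, and the pattern assumes the "at I/O" message format shown in the dmesg output above (a card using memory-mapped registers should log "at MMIO" instead and would not match).

```shell
# Hypothetical helper: reads kernel log lines on stdin and prints
# "bus-address tty io-base irq" for each PCI serial port it recognises.
# Assumes the "<bdf>: ttySn at I/O 0x... (irq = N, ..." message format.
list_pci_serial_ports() {
  sed -nE 's/^.* ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): (ttyS[0-9]+) at I\/O (0x[0-9a-f]+) \(irq = ([0-9]+),.*$/\1 \2 \3 \4/p'
}

# Usage (as root):
#   dmesg | list_pci_serial_ports
```

Each output line is one candidate port; the one that makes text appear on your debug terminal is the one to write down.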
We’ve proven the ports work, we’ve proven our serial connection to our terminal works, and we know how to identify the serial port we want to use. Reboot into Qubes. It’s time to start breaking the security model.
Open a dom0 terminal of your choosing (I use Xfce Terminal), and run sudo su. Everything we’re doing is as root, so this saves typing sudo over and over again. It’s time to introduce the only file we’re going to be making changes to: /etc/default/grub. Making a mistake with this file may render the machine unbootable, so be careful and keep a backup handy.
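One way to take that backup is sketched below. The helper name backup_grub_defaults and the .orig suffix are my own choices for illustration, not anything Qubes provides; the function refuses to overwrite an existing backup so that the pristine copy survives repeated runs.

```shell
# Hypothetical helper: copy the GRUB defaults file to <file>.orig,
# refusing to clobber a backup that already exists.
backup_grub_defaults() {
  src="${1:-/etc/default/grub}"
  dest="${src}.orig"
  if [ -e "$dest" ]; then
    echo "backup already exists: $dest" >&2
    return 1
  fi
  cp -p -- "$src" "$dest"
}

# Usage (as root):
#   backup_grub_defaults
# To undo everything later:
#   cp -p /etc/default/grub.orig /etc/default/grub
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```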
Here’s my stock /etc/default/grub. My laptop needs a few options that your installation might not have, but it should look very similar to yours:
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=false
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.luks.uuid=luks-06fce1e8-6ffa-4cc0-84e6-46dbc82e2945 rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap plymouth.ignore-serial-consoles module_blacklist=ucsi_acpi 6.6.48-1.qubes.fc37.x86_64 x86_64 rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
GRUB_THEME="/boot/grub2/themes/qubes/theme.txt"
GRUB_CMDLINE_XEN_DEFAULT="console=none dom0_mem=min:1024M dom0_mem=max:4096M ucode=scan smt=off gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096 ioapic_ack=new"
GRUB_DISABLE_OS_PROBER="true"
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX rd.qubes.hide_all_usb"
I’m going to describe each change we’re making to this file, and how to test that the change was successful. If you want to skip ahead and just apply all the changes without doing the testing, that’s on you. After each change be sure to run grub2-mkconfig -o /boot/grub2/grub.cfg
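After each reboot, a quick way to confirm that an edit actually reached the running system is to look at the live command lines: /proc/cmdline for the dom0 kernel, and the xen_commandline field of xl info for Xen. The sketch below wraps that check; has_boot_option is a hypothetical helper name, and it only does a whole-word match on a string you pass in.

```shell
# Hypothetical helper: succeeds if option $2 appears as a whole,
# space-separated word in the command line string $1.
has_boot_option() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (in dom0, as root):
#   has_boot_option "$(cat /proc/cmdline)" thunderbolt.host_reset=0 && echo present
#   xl info | grep xen_commandline
```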
We need to attach the Thunderbolt USB port to dom0, and the most painless (and also most insecure) way to do this is to launch Qube Manager and disable Autostart on sys-usb. Then edit /etc/default/grub and remove rd.qubes.hide_all_usb from the GRUB_CMDLINE_LINUX line near the bottom of the file.
Then we’re going to make the same change as we did for SystemRescueCD: we need dom0 to leave the Thunderbolt controller alone. To the GRUB_CMDLINE_LINUX line at the very bottom of the file, add thunderbolt.host_reset=0.
Shut down the machine, plug your dock back in if it wasn’t already, and fire up Qubes again. You should now be able to run lspci -tv and see your Thunderbolt bridge and serial I/O board, just like you did under SystemRescueCD.
Initialize the port and send a test message:
stty -F /dev/ttyS5 115200 sane
echo hello world! > /dev/ttyS5
You should see this on your debug terminal.
Your serial port is present and working in dom0 - now we need to make it work in Xen instead, and get Xen outputting debug data on it.
From the GRUB_CMDLINE_LINUX line in the middle of the file, remove quiet, and add console=hvc0 and earlyprintk=xen. This will make dom0 output debug information to its console, which Xen will repeat out the serial port. This way you’ll see debug information for both Xen and dom0 Linux.
From GRUB_CMDLINE_XEN_DEFAULT, remove console=none and add console=com1.
Now we need to tell Xen what com1 is. To GRUB_CMDLINE_XEN_DEFAULT add com1=115200,8n1,pci,msi,06:00.1, tweaking as necessary. msi might need to be replaced with either zero or the IRQ value you’ll find in dmesg | grep tty.
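As I understand the Xen command-line documentation, the fields of that option are, in order: the baud rate, the data bits/parity/stop bits (8n1), pci to have Xen take the I/O base from the PCI device, the interrupt (msi or an IRQ number), and the bus address of the port. The sketch below just assembles the string from the values you wrote down earlier; xen_com1_arg is a hypothetical helper name, and the defaults are the ones used in this write-up.

```shell
# Hypothetical helper: build Xen's com1= option.
#   $1 = PCI bus address of the serial port (e.g. 06:00.1)
#   $2 = interrupt: "msi" (default), "0", or the IRQ number from dmesg
#   $3 = baud rate (default 115200)
xen_com1_arg() {
  bdf="$1"
  irq="${2:-msi}"
  baud="${3:-115200}"
  printf 'com1=%s,8n1,pci,%s,%s\n' "$baud" "$irq" "$bdf"
}

# Usage:
#   xen_com1_arg 06:00.1        # the variant used in this write-up
#   xen_com1_arg 06:00.1 44     # fall back to the IRQ reported by dmesg
```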
Now tell Xen we want lots of verbose logging, please. To GRUB_CMDLINE_XEN_DEFAULT add loglvl=all guest_loglvl=all sync_console console_to_ring. Feel free to tweak this to taste, especially if you don’t want the system slowdown of sync_console.
Let’s reboot and see if this works. You should get tons of debug output on your terminal now, but as dom0 gets off the ground you’ll find there’s a problem, the same one I suspect a desktop user with a PCIe serial card would have.
You’ll note that your console output stops shortly after Dom0 starts. If you’re not using the first serial port, you’ll notice your output stops immediately after Dom0 initializes the serial port just before the one you’re using. This should give you a hint what’s going on.
Xen doesn’t seem to protect the console serial port from Dom0, so Dom0 attaches a serial port driver and re-initializes the port, which Xen isn’t expecting, so this breaks the whole thing.
As the serial port drivers are built directly into the kernel, we can’t blacklist them via the usual module_blacklist mechanism. We can, however, blacklist their initcalls, the functions the kernel calls at boot to register each built-in driver. If you want to see which initcalls run, add initcall_debug to GRUB_CMDLINE_LINUX. To fix your serial port, we need to blacklist the two drivers that will each claim it if given the chance: add initcall_blacklist=serial8250_init,serial_pci_driver_init to GRUB_CMDLINE_LINUX.
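Putting all of the changes together, the three edited lines should end up looking something like the sketch below. It is assembled from the stock file shown earlier plus the edits described in this write-up, not copied from a tested file. The LUKS UUID, kernel options, and 06:00.1 bus address are from my machine and will differ on yours, and putting initcall_blacklist on the bottom GRUB_CMDLINE_LINUX line is just one choice, since both lines feed the same variable.

```shell
# Sketch of the edited lines in /etc/default/grub (all other lines unchanged).

# Middle of the file: "quiet" removed, dom0 console routed through Xen.
GRUB_CMDLINE_LINUX="rd.luks.uuid=luks-06fce1e8-6ffa-4cc0-84e6-46dbc82e2945 rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap plymouth.ignore-serial-consoles module_blacklist=ucsi_acpi 6.6.48-1.qubes.fc37.x86_64 x86_64 rhgb console=hvc0 earlyprintk=xen"

# Xen options: console=none replaced, serial port defined, verbose logging on.
GRUB_CMDLINE_XEN_DEFAULT="console=com1 com1=115200,8n1,pci,msi,06:00.1 loglvl=all guest_loglvl=all sync_console console_to_ring dom0_mem=min:1024M dom0_mem=max:4096M ucode=scan smt=off gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096 ioapic_ack=new"

# Bottom of the file: rd.qubes.hide_all_usb removed, Thunderbolt left alone,
# built-in serial drivers kept away from the port Xen is using.
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX thunderbolt.host_reset=0 initcall_blacklist=serial8250_init,serial_pci_driver_init"
```

Remember to run grub2-mkconfig -o /boot/grub2/grub.cfg after editing, and to restore the original file once you have the debug data you need.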
Enjoy your new serial console! If Dom0 blows up, you’ll get the debug trace. I haven’t seen Xen blow up yet, but the same should be true there.