Nvidia GPU Pass-through on Qubes 4.3 (Fedora 43 Template)

Author’s Note: This guide addresses specific dependency conflicts involving the grubby-dummy package that break standard Nvidia driver installations in Qubes. It implements a fully automated “native DNF” solution using an improved custom dummy package and DNF5 hooks, eliminating the need for manual wrapper scripts during updates.


WARNINGS AND DISCLAIMERS

  1. Security: This setup weakens isolation and is not recommended for high-security contexts.
  2. Compatibility: This might not work with all hardware configurations.
  3. Stability: Updates might break this setup.

Phase 1: Dom0 Configuration

We need to isolate the GPU from Dom0 so it can be passed to a VM.

  1. Identify your GPU's PCI addresses:
    Open a terminal in Dom0 and run:

    lspci -nn | grep -i nvidia
    

    Example Output:

    05:00.0 VGA compatible controller [0300]: NVIDIA Corporation… [10de:1e04]
    05:00.1 Audio device [0403]: NVIDIA Corporation… [10de:10f7]
    05:00.2 USB controller…
    05:00.3 Serial bus controller…

  2. Hide the devices from Dom0:
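    Edit /etc/default/grub in Dom0. Find the GRUB_CMDLINE_LINUX line and append the following inside its quotes. Replace the addresses with the ones from the first column of your own lspci output; these are bus addresses such as 05:00.0, not the [vendor:device] IDs shown in brackets: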

    rd.qubes.hide_pci=05:00.0,05:00.1,05:00.2,05:00.3
    
  3. Update Grub and Reboot:

    sudo grub2-mkconfig -o /boot/grub2/grub.cfg
    sudo reboot
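
The value for step 2 can be assembled from the lspci output rather than typed by hand. This is a minimal sketch and not part of the original procedure: build_hide_pci_arg is a hypothetical helper name, and it assumes that every function sharing a bus:device slot with an NVIDIA entry belongs to the card and should be hidden with it.

```shell
#!/bin/bash
# Build the rd.qubes.hide_pci= argument from `lspci -nn` output on stdin.
# Assumption: all functions on the same bus:device slot as an NVIDIA
# entry are part of the card (VGA, audio, USB, serial bus).
build_hide_pci_arg() {
    awk '
        { addr[NR] = $1 }                        # first field is the PCI address
        tolower($0) ~ /nvidia/ {                 # remember slots that hold NVIDIA functions
            slot = $1; sub(/\.[0-7]$/, "", slot); wanted[slot] = 1
        }
        END {
            list = ""
            for (i = 1; i <= NR; i++) {
                slot = addr[i]; sub(/\.[0-7]$/, "", slot)
                if (slot in wanted) list = list (list == "" ? "" : ",") addr[i]
            }
            if (list != "") print "rd.qubes.hide_pci=" list
        }
    '
}

# Usage in Dom0:
#   lspci -nn | build_hide_pci_arg
```

Review the printed list before pasting it into /etc/default/grub. After the reboot, `lspci -k -s 05:00.0` in Dom0 should report pciback as the kernel driver in use if the device was hidden.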
    

Phase 2: TemplateVM Preparation

We will create a specialized template for GPU workloads.

  1. Clone the Template:
    Clone your standard Fedora 43 template (e.g., fedora-43-xfce) to fedora-43-xfce-gpu.

  2. Configure Template Settings:
    Open Qube Settings for fedora-43-xfce-gpu:

    • Advanced Tab:
      • Kernel: (provided by qube) (Essential for Nvidia drivers)
      • Virtualization mode: HVM
      • Memory balancing: Uncheck/Disable (PCI pass-through requires fixed memory)
    • Devices Tab:
      • Pass through your Nvidia devices including Audio and USB controllers (if applicable).
  3. Install Build Tools:
    Start the template (or use a temporary Disposable VM) and install the necessary tools:

    sudo dnf install rpm-build rpmrebuild
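
The Advanced-tab settings from step 2 can also be applied from a Dom0 terminal. The sketch below only prints the commands so they can be reviewed first; print_gpu_qube_prefs is a hypothetical helper name. It relies on two qvm-prefs conventions: an empty kernel value means "provided by qube", and maxmem 0 switches memory balancing off. Attaching the PCI devices is left to the Devices tab here, since the device identifier syntax is not covered in this guide.

```shell
#!/bin/bash
# Print the Dom0 commands matching the Advanced-tab settings above.
# Dry run by design: review the output, then run the lines in Dom0.
print_gpu_qube_prefs() {
    local qube="$1"
    printf 'qvm-prefs %s virt_mode hvm\n' "$qube"   # Virtualization mode: HVM
    printf "qvm-prefs %s kernel ''\n" "$qube"       # Kernel: (provided by qube)
    printf 'qvm-prefs %s maxmem 0\n' "$qube"        # disable memory balancing
}

# Usage in Dom0:
#   qvm-clone fedora-43-xfce fedora-43-xfce-gpu
#   print_gpu_qube_prefs fedora-43-xfce-gpu
```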
    

Phase 3: The “Super-Grubby” Fix (Solving Dependency Hell)

The Nvidia driver packages have a strict dependency on /usr/bin/grubby. The standard grubby-dummy package in Qubes does not satisfy this requirement in a way that pleases the DNF5 dependency resolver, causing conflicts with upstream packages (like sdubby or the real grubby).

We solve this by building a “Super-Dummy” package that explicitly provides the binary paths and capabilities required by the driver, preventing DNF from trying to pull in conflicting packages.

  1. Create the Spec File:
    Run this command in the TemplateVM to create the build recipe:

    cat <<'EOF' > super-grubby.spec
    Name:       grubby-dummy
    Version:    99.0.0
    Release:    2%{?dist}
    Epoch:      1000
    Summary:    Super Dummy for Grubby and Sdubby
    License:    Public Domain
    BuildArch:  noarch
    
    # Claim to provide the packages
    Provides:   grubby = %{version}
    Provides:   sdubby = %{version}
    Provides:   grubby-dummy = %{version}
    
    # Claim to provide the specific binary paths (Virtual Provision)
    Provides:   /usr/bin/grubby
    Provides:   /usr/sbin/grubby
    
    # Block the real packages
    Obsoletes:  grubby < %{version}
    Obsoletes:  sdubby < %{version}
    
    %description
    Dummy package to satisfy Nvidia driver dependencies for /usr/bin/grubby.
    
    %build
    # Nothing to build
    
    %install
    # Create only /usr/bin
    mkdir -p %{buildroot}/usr/bin
    
    # Create the dummy script
    echo '#!/bin/bash' > %{buildroot}/usr/bin/grubby
    echo 'echo "Dummy grubby called - doing nothing."' >> %{buildroot}/usr/bin/grubby
    echo 'exit 0' >> %{buildroot}/usr/bin/grubby
    
    # Make it executable
    chmod +x %{buildroot}/usr/bin/grubby
    
    %files
    /usr/bin/grubby
    
    EOF
    
  2. Build the Package:

    rpmbuild -bb super-grubby.spec
    
  3. Install the Super-Dummy:

    If the stock Qubes grubby-dummy is still installed and blocks the new package, remove it first without touching the packages that depend on it:

    sudo rpm -e --nodeps grubby-dummy
    

    Then install the freshly built package. It replaces the existing Qubes dummy and keeps DNF from pulling in the real grubby or sdubby. Adjust the file name if your dist tag differs from fc43:

    sudo dnf install ~/rpmbuild/RPMS/noarch/grubby-dummy-99.0.0-2.fc43.noarch.rpm -y
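
Before installing the drivers, it can help to confirm that the installed dummy really advertises the capabilities listed in the spec file. The following is a small sketch under that assumption; check_provides is a hypothetical helper that reads the output of `rpm -q --provides` and reports any capability that is missing.

```shell
#!/bin/bash
# Read `rpm -q --provides` output on stdin and verify that every capability
# named on the command line appears as the first field of some line.
check_provides() {
    local provides cap missing=0
    provides=$(cat)
    for cap in "$@"; do
        if ! printf '%s\n' "$provides" | awk -v cap="$cap" '$1 == cap { found = 1 } END { exit !found }'; then
            echo "missing: $cap"
            missing=1
        fi
    done
    return "$missing"
}

# Usage in the TemplateVM:
#   rpm -q --provides grubby-dummy | check_provides grubby sdubby /usr/bin/grubby /usr/sbin/grubby
```

A non-zero exit status means at least one capability from the spec is not being provided, in which case the driver installation is likely to hit the same dependency conflict again.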
    

Phase 4: Automating Updates (DNF5 Hooks)

We need to automate the Dracut configuration and fix the “Split-Brain” issue (where the headless Template crashes if Nvidia EGL is enabled, but the AppVM needs it enabled).

  1. Create the Hook Script:
    Create /usr/local/bin/qubes-nvidia-hook.sh:

    #!/bin/bash
    set -e
    
    # --- Configuration ---
    NVIDIA_EGL="/usr/share/glvnd/egl_vendor.d/10_nvidia.json"
    NVIDIA_EGL_BACKUP="/usr/share/glvnd/egl_vendor.d/10_nvidia.json.enabled"
    DRACUT_CONF="/usr/lib/dracut/dracut.conf.d/99-nvidia-dracut.conf"
    XORG_CONF="/usr/share/X11/xorg.conf.d/nvidia.conf"
    
    echo ">>> [Nvidia-Hook] Starting post-update configuration..."
    
    # 1. Handle EGL Split-Brain (Template vs AppVM)
    # We save a copy of the config for the AppVM, then disable it for the Template
    if [ -f "$NVIDIA_EGL" ]; then
        # Check if the file is NOT empty (meaning it was just replaced by an update)
        if [ -s "$NVIDIA_EGL" ] && [ "$(cat "$NVIDIA_EGL")" != "{}" ]; then
            echo " -> New Nvidia EGL config detected."
            
            # Snapshot the fresh config to the .enabled file for the AppVM to use
            cp -f "$NVIDIA_EGL" "$NVIDIA_EGL_BACKUP"
            echo " -> Updated AppVM backup ($NVIDIA_EGL_BACKUP)."
            
            # Neuter the active config for the Template (prevents crash on shutdown/boot)
            echo "{}" > "$NVIDIA_EGL"
            echo " -> Disabled EGL for Template (wrote empty JSON)."
        else
            echo " -> EGL config already neutralized."
        fi
    fi
    
    # 2. Fix Dracut Config (omit -> add)
    # The update usually resets this file, so we force-patch it every time.
    if [ -f "$DRACUT_CONF" ]; then
        sed -i 's/omit_drivers/add_drivers/g' "$DRACUT_CONF"
        echo " -> Dracut config patched (omit_drivers -> add_drivers)."
    fi
    
    # 3. Remove conflicting Xorg config (Fixes VM crash/hang on shutdown)
    if [ -f "$XORG_CONF" ]; then
        rm -f "$XORG_CONF"
        echo " -> Conflicting Xorg config removed."
    fi
    
    # 4. Regenerate Initramfs
    # CRITICAL: Target the LATEST installed kernel, not necessarily the running one.
    LATEST_KERNEL=$(ls /lib/modules | sort -V | tail -n 1)
    
    if [ -n "$LATEST_KERNEL" ]; then
        echo " -> Regenerating initramfs for kernel: $LATEST_KERNEL"
        dracut -f --kver "$LATEST_KERNEL"
    else
        echo " -> Warning: Could not detect kernel version. Skipping dracut."
    fi
    
    echo ">>> [Nvidia-Hook] Cleanup complete."
    
  2. Make it Executable:

    sudo chmod +x /usr/local/bin/qubes-nvidia-hook.sh
    
  3. Register the DNF5 Action:
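    Create /etc/dnf/libdnf5-plugins/actions.d/nvidia-qubes.actions. The file is only read if the DNF5 actions plugin (package libdnf5-plugin-actions) is installed in the template: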

    # Trigger the fix script after any transaction involving nvidia packages
    # Syntax: trigger:package_filter:direction:option:command
    post_transaction:*nvidia*:in::/usr/local/bin/qubes-nvidia-hook.sh
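
The EGL handling in the hook can be tried out safely on scratch files before trusting it with a real update. This sketch repeats the decision logic from step 1 of the script with the two paths passed as arguments; neutralize_egl is a hypothetical name, and the JSON written below is stand-in content, not the real vendor file.

```shell
#!/bin/bash
# Same decision logic as step 1 of the hook, with the paths as arguments
# so it can be exercised outside /usr/share/glvnd.
neutralize_egl() {
    local active="$1" backup="$2"
    [ -f "$active" ] || return 0                    # driver not installed: nothing to do
    if [ -s "$active" ] && [ "$(cat "$active")" != "{}" ]; then
        cp -f "$active" "$backup"                   # keep the real config for the AppVM
        echo "{}" > "$active"                       # empty JSON disables the vendor entry
        echo "neutralized"
    else
        echo "already neutralized"
    fi
}

# Demo on a temporary directory (stand-in content only)
demo_dir=$(mktemp -d)
printf '{"file_format_version":"1.0.0"}\n' > "$demo_dir/10_nvidia.json"
neutralize_egl "$demo_dir/10_nvidia.json" "$demo_dir/10_nvidia.json.enabled"   # first run: neutralized
neutralize_egl "$demo_dir/10_nvidia.json" "$demo_dir/10_nvidia.json.enabled"   # second run: already neutralized
```

The second call shows why the hook is safe to run repeatedly: once the active file holds only {}, the backup is left untouched, so a later manual run cannot overwrite the saved config with the empty one.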
    

Phase 5: Install Nvidia Drivers

Now that the infrastructure is in place, installing the drivers is standard.

  1. Install Packages:

    sudo dnf install xorg-x11-drv-nvidia xorg-x11-drv-nvidia-cuda akmod-nvidia kernel-devel
    

    Note: Because of the DNF5 hook, the config patching and initramfs regeneration run automatically at the end of this transaction.

  2. Shutdown the Template:

    sudo poweroff
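
Before powering the template off, it may be worth checking that the regenerated initramfs actually contains the nvidia module, since the hook can only include it if the kernel module had been built by the time dracut ran. A minimal sketch, assuming the usual lsinitrd file listing where each line ends with a path; initramfs_has_nvidia is a hypothetical helper and the kernel version is a placeholder to fill in.

```shell
#!/bin/bash
# Succeeds if an initramfs file listing on stdin contains an nvidia kernel
# module (nvidia.ko, nvidia-drm.ko.xz, and similar).
initramfs_has_nvidia() {
    grep -Eq '/nvidia[^/]*\.ko(\.[a-z0-9]+)?$'
}

# Usage in the TemplateVM (replace <kernel-version> with the installed kernel):
#   sudo lsinitrd "/boot/initramfs-<kernel-version>.img" | initramfs_has_nvidia \
#       && echo "nvidia module present" || echo "nvidia module missing"
```

If the module is missing, running the hook script again by hand after the module build has finished regenerates the initramfs.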
    

Phase 6: AppVM Configuration

We need an AppVM that has the physical GPU attached and knows how to restore the EGL config that we disabled in the Template.

  1. Create the AppVM:
    Create gpu-personal based on fedora-43-xfce-gpu.

  2. Configure AppVM Settings:

    • Advanced: Kernel (provided by qube), Mode HVM, Memory Balancing Disabled.
    • Devices: Add your Nvidia GPU (VGA, Audio, USB, etc.). Remember to remove the card from the template's device list at this point, since a PCI device can be used by only one running qube at a time.
  3. Enable Nvidia EGL (The Split-Brain Fix):
    The Template has an empty EGL config (to prevent crashes). We need the AppVM to use the valid backup we created.

    Start the AppVM, open a terminal, and edit /rw/config/rc.local:

    sudo nano /rw/config/rc.local
    

    Add this content:

    #!/bin/bash
    # Restore Nvidia EGL config for GPU pass-through
    if [ -f /usr/share/glvnd/egl_vendor.d/10_nvidia.json.enabled ]; then
        mount --bind /usr/share/glvnd/egl_vendor.d/10_nvidia.json.enabled /usr/share/glvnd/egl_vendor.d/10_nvidia.json
    fi
    
  4. Reboot the AppVM.
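
Whether the bind mount from rc.local took effect can be checked by looking at what the active EGL vendor file contains. The sketch below is an illustration with a hypothetical helper name, egl_vendor_state; it treats an empty file or a bare {} as disabled, matching what the Template hook writes.

```shell
#!/bin/bash
# Report the state of an EGL vendor file: missing, disabled (empty or "{}"),
# or enabled (any other content).
egl_vendor_state() {
    local file="$1"
    if [ ! -e "$file" ]; then
        echo "missing"
    elif [ ! -s "$file" ] || [ "$(cat "$file")" = "{}" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

# Usage:
#   egl_vendor_state /usr/share/glvnd/egl_vendor.d/10_nvidia.json
```

In the AppVM this should print enabled after the reboot, while the same command in the Template should print disabled.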


Verification & Troubleshooting

After rebooting the AppVM:

  1. Verify Driver Load:
    Open a terminal in the AppVM:

    nvidia-smi
    

    You should see your GPU model and memory usage.

  2. Manual Recovery (If updates fail):
    If the DNF hook ever fails to fire, you can manually trigger the fix in the TemplateVM:

    sudo /usr/local/bin/qubes-nvidia-hook.sh
    
  3. Manual Module Build:
    If nvidia-smi fails, check if the module was built in the Template:

    rpm -qa | grep kmod-nvidia
    # If missing, force rebuild:
    sudo akmods --rebuild --kernels "$(uname -r)"
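
The check in step 3 can be narrowed to the kernel that is actually in use, because a kmod-nvidia package built for an older kernel also matches the plain grep. This sketch assumes the built packages are named kmod-nvidia-<kernel version>-<driver version>; kmod_built_for is a hypothetical helper that reads a package list on stdin.

```shell
#!/bin/bash
# Succeeds if the package list on stdin contains a kmod-nvidia build for the
# given kernel version (assumed naming: kmod-nvidia-<kernel>-<driver>).
kmod_built_for() {
    local kver="$1"
    grep -Fq "kmod-nvidia-${kver}-"
}

# Usage in the TemplateVM:
#   rpm -qa 'kmod-nvidia-*' | kmod_built_for "$(uname -r)" \
#       || sudo akmods --rebuild --kernels "$(uname -r)"
```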