Salt: automating NVIDIA GPU passthrough: fedora 40

This “guide” aims to explore and give a practical example of leveraging SaltStack to achieve the same goal as the existing guide on NVIDIA GPU passthrough into Linux HVMs for CUDA applications. Salt is a management engine that simplifies configuration, and QubesOS has its own flavour. Want to see some?

This guide assumes that you’re done fiddling with your IOMMU groups and have modified the grub parameters to allow passthrough.

In addition to that, if you haven’t set up a salt environment yet, complete step 1.1 as described in this guide to get ready.

The basics

Before we even start doing anything, let’s discuss the basics. You probably already know that salt configurations are stored in /srv/user_salt/. Here’s how it may look:

.
├── nvidia-driver
│   ├── disable-nouveau.sls
│   ├── init.sls
│   └── map.jinja
├── test.sls
└── top.sls

Let’s start with the obvious. top.sls is a top file. It describes the highstate, which is really just a combination of conventional salt formulas. A stray piece of salt configuration can be referred to as a formula, although I’ve seen this word used in various contexts. test.sls is a state file. It contains a configuration written in YAML. nvidia-driver is also a state, although it is a directory. This is an alternative way to store a state for situations when you want to have multiple state (or not only state) files. When a state directory is referenced, salt evaluates the init.sls state file inside it. Other state files may or may not be included from init.sls or from one another.
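
Purely as an illustration (this guide calls its state directly with state.sls, so you don’t strictly need a top file for it): a top.sls in the user environment that assigns the nvidia-driver state to dom0 and to the qube we are about to create might look like this:

user:
  dom0:
    - nvidia-driver
  fedora-40-nvidia:
    - nvidia-driver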

A YAML configuration consists of states. In this context, a state refers to a module - a piece of code that most often does one pretty specific thing. In a configuration, states behave like the commands or functions of a programming language. One valuable thing to note here is that not all modules are state modules. There are a lot of them, and they can do various things, but here we only need the state kind.
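
For example, a tiny state file (generic, not part of this guide) that only makes sure a package is present would use the pkg.installed state module:

install-vim:
  pkg.installed:
    - name: vim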

In addition to the state files, you may notice map.jinja. Jinja is a templating engine: it helps you generalize your state files by adding variables, conditions and other cool features. You can easily recognize jinja by the fancy brackets: {{ }}, {% %}, {# #}. This particular file stores variable definitions and is used to configure the whole state directory thingy (nvidia-driver).

Writing salt configuration

1. Create a standalone

First, let’s write a state to describe how the VM shall be created:

nvidia-driver--create-qube:
  qvm.vm:
    - name: {{ prefs.standalone_name }}
    - present:
      - template: {{ prefs.template_name }}
      - label: {{ prefs.standalone_label }}
      - mem: {{ prefs.standalone_memory }}
      - vcpus: {{ prefs.standalone_cpus }}
      - maxmem: 0
      - class: StandaloneVM
    - prefs:
      - label: {{ prefs.standalone_label }}
      - mem: {{ prefs.standalone_memory }}
      - vcpus: {{ prefs.standalone_cpus }}
      - pcidevs: {{ devices }}
      - virt_mode: hvm
      - kernel:
      - maxmem: 0
      - class: StandaloneVM
    - features:
      - set:
        - menu-items: qubes-run-terminal.desktop

Here, I use the qubes-specific qvm.vm state module (which in reality is a wrapper around other modules, like prefs, features, etc.). Pretty much all keys and values here are the same ones you can set and get using qvm-prefs and qvm-features. For the nvidia drivers to work, the kernel must be provided by the qube - that’s why the field is left empty. Similarly, to pass through the GPU we need to set the virtualization mode to hvm and maxmem to 0 (which disables memory balancing).
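
If you want to sanity-check the result after applying the state, the same values can be read back in dom0 with the usual tools (the qube name here is the one defined in map.jinja below):

qvm-prefs fedora-40-nvidia
qvm-features fedora-40-nvidia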

nvidia-driver--create-qube is just a label. As long as you don’t break the syntax when writing it, it should be fine. Aside from referencing, plenty of modules can use it to simplify the syntax, and some need it to decide what to do, but you can look that up later if you want.
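
As a generic illustration (not part of this guide’s states): many state modules fall back to the label when - name: is omitted, so the following two declarations do the same thing:

/etc/motd:
  file.absent

remove-motd:
  file.absent:
    - name: /etc/motd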

Now, to the jinja statements. Here, they provide values for keys like label, template, name, etc. Some of them are done this way (as opposed to writing the value by hand) because the value is repeated multiple times in the state file; others are there to simplify the process of configuration. In order to figure out why some of them use dot notation whereas others don’t, we must check their declaration. In this state file they’re imported using the following line:

{% from 'nvidia-driver/map.jinja' import prefs,devices,paths %}

This is pretty much just python in brackets. Notice that you need to specify the state directory when importing, and use an actual path instead of dot notation.

Upon inspection of map.jinja, what we see is:

{% set prefs = {
    'standalone_name': 'fedora-40-nvidia',
    'standalone_label': 'yellow',
    'standalone_memory': 4000,
    'standalone_cpus': 4,
    'template_name': 'fedora-40-xfce',
} %}

{# Don't forget to check devices before running! #}
{% set devices = [
    '01:00.0',
    '01:00.1',
] %}

{% set paths = {
    'nvidia_conf': '/usr/share/X11/xorg.conf.d/nvidia.conf',
    'grub_conf': '/etc/default/grub',
    'grub_out': '/boot/grub2/grub.cfg',
} %}

Here, I declare the dictionary prefs, the list devices, and another dictionary, paths. Since we need to pass all devices from the list to the new qube, in the state file I reference the whole list using a jinja expression ({{ devices }}). The dictionaries are used to fill in parameters, and dot notation is used to reference specific values inside them.

Double brackets tell the parser to “print” the value into the state file before the show starts, whereas statements ({% %}) do logic. {# #} is a comment.
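
To make the rendering concrete: with the map.jinja shown above, these two lines from the state file

- name: {{ prefs.standalone_name }}
- pcidevs: {{ devices }}

end up (roughly) as

- name: fedora-40-nvidia
- pcidevs: ['01:00.0', '01:00.1']

before salt ever parses the YAML.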

1.5 Interlude: what’s next?

Now that we have a qube at the ready (you can check by applying the state), how do we install the drivers? I want to discuss what’s going on next, because at the time of writing (November 2024) this guide targets fedora 40 in combination with somewhat modern hardware, and that comes with some distro-specific issues.

Tip: To apply a state, put your state files into your salt environment folder together with the jinja file and run
sudo qubesctl --show-output state.sls <name_of_your_state> saltenv=user
(substitute <name_of_your_state>)

Salt will apply the state to all targets. When none are specified, dom0 is the only target. This is what we want here, because dom0 handles the creation of qubes, but what if the situation is different? Add --skip-dom0 if you want to skip dom0, and add --targets=<targets> to add something else.
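
For this guide specifically, one possible sequence (the qube name comes from map.jinja) is to let dom0 create the qube first, then run the same state against the new qube:

sudo qubesctl --show-output state.sls nvidia-driver saltenv=user
sudo qubesctl --show-output --skip-dom0 --targets=fedora-40-nvidia state.sls nvidia-driver saltenv=user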

The plan:

  1. Prepare qube ← we’re here
  2. Enable rpmfusion repository
  3. Grow /tmp/, because the default 1G is too small to fit everything that the driver building process spews out. It will fail otherwise.
  4. Delete grubby-dummy, because it conflicts with sdubby, which the nvidia drivers depend on.
  5. Install akmod-nvidia and xorg-x11-drv-nvidia-cuda
  6. Wait for building process to finish
  7. Delete X config, because we don’t need it where we going :sunglasses:
  8. optional: Disable nouveau, because the nvidia install script may fail to convince the system that it should use the nvidia driver.

2-0.5. How to choose a target inside the state file

Unless you are willing to write (and call) multiple states to perform a single operation, you might be wondering how to make salt apply only the first state (qube creation) to dom0, and all the others to the nvidia qube. The answer is to use jinja:

{% if grains['id'] == 'dom0' %}

# Dom0 stuff goes here

{% elif grains['id'] == prefs.standalone_name %}

# prefs.standalone_name stuff goes here

{% endif %}

That way, the state will be applied to all targets (dom0, prefs.standalone_name), but jinja will render the state file appropriately for each of them.

2. Enable rpmfusion

Pretty self-explanatory. {free,nonfree} is used to enable multiple repositories at once; this is shell brace expansion, not something salt- or jinja-specific.

nvidia-driver--enable-repo:
  cmd.run:
    - name: dnf config-manager --enable rpmfusion-{free,nonfree}{,-updates}
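
For clarity: since /bin/sh on fedora is bash, the braces expand and the command the qube actually runs is equivalent to:

dnf config-manager --enable rpmfusion-free rpmfusion-free-updates rpmfusion-nonfree rpmfusion-nonfree-updates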

3. Extend /tmp/

This lasts until reboot. 4G is probably overkill.

nvidia-driver--extend-tmp:
  cmd.run:
    - name: mount -o remount,size=4G /tmp/
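
If you want to check the result by hand inside the qube, df will show the new size of the tmpfs:

df -h /tmp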

4. Delete grubby-dummy

nvidia-driver--remove-grubby:
  pkg.purged:
    - pkgs:
      - grubby-dummy

5. Install drivers

Here, I use the - require: parameter to wait for other states to be applied before installing the drivers. Note that it needs both the state module (e.g. cmd) and the label to function.

nvidia-driver--install:
  pkg.installed:
    - pkgs:
      - akmod-nvidia
      - xorg-x11-drv-nvidia-cuda
      - nvtop
    - require:
      - cmd: nvidia-driver--enable-repo
      - cmd: nvidia-driver--extend-tmp
      - pkg: nvidia-driver--remove-grubby

6. Wait for the drivers to build

Well, this one is kind of wonky. loop.until_no_eval runs the function specified by - name: until it returns what is listed under - expected:. Here it is set to try once every 20 seconds for up to 600 seconds, i.e. 10 minutes in total. - args: describes what to pass to the function given in - name:.

The wonkiness comes from the fact that I run modinfo -F name nvidia, which translates into “what is the name of the module named ‘nvidia’?”. It just returns an error until the module is present (i.e. done building), and then it returns ‘nvidia’.

nvidia-driver--assert-install:
  loop.until_no_eval:
    - name: cmd.run
    - expected: 'nvidia'
    - period: 20
    - timeout: 600
    - args:
      - modinfo -F name nvidia
    - require:
      - pkg: nvidia-driver--install

7. Delete X config

nvidia-driver--remove-conf:
  file.absent:
    - name: {{ paths.nvidia_conf }}
    - require:
      - loop: nvidia-driver--assert-install

8. Disable nouveau

If you download the state files, you will find this one in a separate file. That is done for two reasons:

  1. It may not be required
  2. I think the VM must be restarted before this change is applied, so first run the main state and apply this one after restarting the qube.
    • Why? No idea.
    • Why not just add a reboot state into the state file before this one? Because only dom0 can reboot qubes, dom0 states are always applied first, and there is no way I know of to make dom0 run part of its state, wait until a condition is met, and then continue, without multiple calls to qubesctl, unless…

To run a state located inside a state folder, use dot notation, e.g.: state.sls nvidia-driver.disable-nouveau
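
Combined with the targeting flags from the tip earlier, running just this state against the qube could look like this (qube name from map.jinja):

sudo qubesctl --show-output --skip-dom0 --targets=fedora-40-nvidia state.sls nvidia-driver.disable-nouveau saltenv=user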

{% from 'nvidia-driver/map.jinja' import prefs,paths %}
{% if grains['id'] == prefs.standalone_name %}

nvidia-driver.disable-nouveau--blacklist-nouveau:
  file.append:
    - name: {{ paths.grub_conf }}
    - text: 'GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX rd.driver.blacklist=nouveau"'

nvidia-driver.disable-nouveau--grub-mkconfig:
  cmd.run:
    - name: grub2-mkconfig -o {{ paths.grub_out }}
    - require:
      - file: nvidia-driver.disable-nouveau--blacklist-nouveau

{% endif %}

Make sure to change the paths if you’re not running fedora 40.
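
Once the qube has been rebooted after all of this, a quick manual sanity check inside it (not part of the state files) is to confirm that the nvidia module is loaded, nouveau is not, and the driver can see the card:

lsmod | grep -E '^(nvidia|nouveau)'
nvidia-smi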

As for the “unless…” above - these are not really ways I know of:

  1. advanced: Check for conditions in grains and pillars, see this topic
  2. why?: also, I don’t think it can work

Want to check out the complete state? Here you go:
disable-nouveau.yaml (486 Bytes)
init.yaml (2.2 KB)
map.yaml (468 Bytes)

Uploading salt files is forbidden on the forum, therefore the files are renamed to .yaml. For now, I only have the state for fedora 40, but modifying it for debian, or for a fedora setup without the conflicting dependencies, is trivial.

Contributions, improvements and fixes are welcome! I call it GPL-3 if you need that for some reason.


The salt explanations are great! Please continue to explain :-)

Should salt versions of existing guides be inserted into parent guides or stay in separate posts?

  • Attach em
  • Keep separate

Is it correct that you do not have this in a GitHub repo?

I would like to add the CUDA container toolkit and an (optional) Docker installation configured to use the CUDA Docker “driver”, which makes it simple to do LLM inference in containers.

I currently do this with Ansible, but I can’t imagine it being that difficult for me to add to your formula.

Because I’m not salt-competent, maybe what I’m proposing would be better as indirect additions - I don’t yet have the patterns down (grains vs pillars, etc.), so any tips on how to best implement it would be appreciated.