Hello, folks! This is half an announcement, half an RFQ/A regarding a possible backend switch in Qubes shared folders.
Today, the QSF software uses diod (chaos/diod on GitHub: Distributed I/O Daemon, a 9P file server), a program written in C, to serve shared folders from a server VM to any client VMs that have been authorized to access those folders. diod seemed like the only viable option at the time, but it has some serious performance limitations, and security is a concern as well, chiefly because it is written in C.
To that effect, virtiofsd (shipped by Fedora, well-maintained, and supported by many distributions) seems like an ideal replacement. The Linux kernel ships a client module that allows VMs to mount virtiofsd shares, and many high-performance, high-end virtualization solutions rely on virtiofs to share files between hosts and guests. Yes, most of that code is also C but, unlike diod, it appears to be actively maintained, and a lot of projects depend on the performance and security of that entire attack surface.
Ultimately, my advance Fedora 40 builds — all my software is now being built and tested regularly against several distros these days — have bubbled up this issue which seriously concerns me:
[2024-02-23T20:09:42.787Z] In function ‘npc_gets’,
[2024-02-23T20:09:42.787Z] inlined from ‘main’ at diodshowmount.c:136:12:
[2024-02-23T20:09:42.787Z] ../libnpclient/read.c:219:19: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
[2024-02-23T20:09:42.787Z] 219 | buf[done] = '\0';
[2024-02-23T20:09:42.787Z] | ^
[2024-02-23T20:09:42.787Z] diodshowmount.c: In function ‘main’:
[2024-02-23T20:09:42.787Z] diodshowmount.c:84:10: note: at offset [-2147483648, -1] into destination object ‘buf’ of size 80
[2024-02-23T20:09:42.787Z] 84 | char buf[80], *host, *p;
[2024-02-23T20:09:42.787Z] | ^
[2024-02-23T20:09:43.043Z] lto1: all warnings being treated as errors
[2024-02-23T20:09:43.297Z] make[2]: *** [/tmp/cchqHbsc.mk:2: /tmp/cccNv7ZR.ltrans0.ltrans.o] Error 1
[2024-02-23T20:09:43.297Z] lto-wrapper: fatal error: make returned 2 exit status
[2024-02-23T20:09:43.297Z] compilation terminated.
[2024-02-23T20:09:43.297Z] /usr/bin/ld: error: lto-wrapper failed
so (if you know anything about security and C) that is a dealbreaker for continuing to use diod as a backend in the near future.
I’d like to know from the users of Qubes shared folders what they think about this possibility.
Thanks in advance for your thoughtful opinions, ideas and suggestions.
I wanted to try to set up the virtiofs solution manually, to see how it behaves, but I am not sure how you could expose a block device from the socket open on the “server” qube using virtiofsd. All solutions I find on the internet are using virtiofsd on the host and creating a block device in the VMs with qemu…
I will be phasing out diod anyway. I have turned my attention to a different Plan 9 FS server, which is written in Rust and uses Tokio for asynchrony. I can write 100 times as much Rust as I can C, and at better quality, and Rust is simply eons ahead of C anyway. I already began working on it today to make it work over file descriptors.
I am also motivated to create a battery of conformance tests to ensure data safety too.
One problem I see with 9p is that the protocol has no provision to invalidate caches in the client, which means servers can’t tell clients “this or that inode has changed or this memory segment is no longer valid”, which in turn means it is simply not safe to enable any caching. This sucks because uncached directory listings are slow AF. I need to figure out if there is something I can do about that.
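On a Linux client, the "no caching" decision ends up as a mount option. The following is only a sketch, assuming the in-kernel v9fs client, a server that speaks 9P2000.L, and a session carried over two already-open file descriptors; the helper name `p9_mount_opts`, the fd numbers, and the mount point are hypothetical and not taken from QSF.

```shell
# Hypothetical helper: build v9fs mount options for a 9P session that runs
# over two already-open file descriptors (read side, write side).
p9_mount_opts() {
    local rfd=$1 wfd=$2
    # cache=none turns off client-side caching, because the server has no
    # way to tell the client that cached data went stale.
    printf 'trans=fd,rfdno=%d,wfdno=%d,version=9p2000.L,cache=none\n' "$rfd" "$wfd"
}

# Example use (needs root and a live 9P server on fds 3 and 4; not run here):
#   mount -t 9p -o "$(p9_mount_opts 3 4)" qsf /mnt/shared
p9_mount_opts 3 4
```

Switching `cache=none` to a caching mode such as `cache=loose` would speed up directory listings, but only at the cost of the staleness problem described above.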
google/rust-p9 on GitHub: Ugh. I have to hold my nose to use code from these freaks, but at least this specific code base works, has tests (even fuzzing, it appears), and uses good practices. The server implementation doesn’t actually have a server executable you can run, so I have written one that uses file descriptors as communication channels.
I’ve looked at it. IIUC, it uses a protocol from Plan 9, which supposedly supports symlinks etc. This can be cool, but I’d like to avoid symlink support, because:
It adds some attack surface (e.g., symlinks pointing out of the shared directory)
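To make the concern concrete, here is a minimal sketch of a symlink that leaves the shared directory, plus a check that detects it. It assumes GNU coreutils `realpath`; the helper name `escapes_share` and the demo paths are made up for illustration and are not part of any of the servers discussed here.

```shell
# Hypothetical helper: returns 0 when PATH resolves outside ROOT, 1 otherwise.
# realpath -m resolves symlinks even when the last component does not exist.
escapes_share() {
    local root target
    root=$(realpath -m -- "$1")
    target=$(realpath -m -- "$2")
    case "$target" in
        "$root" | "$root"/*) return 1 ;;
        *) return 0 ;;
    esac
}

# Demo in a throwaway directory: one regular file, one symlink out of the share.
demo=$(mktemp -d)
mkdir "$demo/shared"
echo hello > "$demo/shared/notes.txt"
ln -s /etc "$demo/shared/innocent"

for p in notes.txt innocent/passwd; do
    if escapes_share "$demo/shared" "$demo/shared/$p"; then
        echo "$p: escapes the share"
    else
        echo "$p: stays inside"
    fi
done
rm -rf "$demo"
```

A check like this is racy if the client can change the link between the check and the actual open, which is one reason refusing symlinks altogether is the simpler option.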
OK, I have server.sh that runs in chroot. Not nice. It leaks some data (usually non-private) like /usr, but that might be a non-threat, especially when sharing between qubes based on the same template. It also doesn’t behave nicely when executed repeatedly, as it creates multiple mounts.
#!/bin/bash
# safety settings
set -u
set -e
set -o pipefail
# params
jail=/jail-documents
shared=~/Documents
# Just for debugging
(
printf "%q " server.sh "$@"
echo
) >> /tmp/log
# Create chroot env
sudo mkdir -p "$jail"{,/{lib64,usr,shared,dev,etc}}
sudo touch "$jail/dev/null" "$jail/etc/passwd" "$jail/etc/group"
# Bind-mount the minimum the server binary needs into the jail
for path in lib64 usr dev/null etc/passwd etc/group; do
sudo mount --bind "/$path" "$jail/$path"
done
sudo mount --bind "$shared" "$jail/shared"
# Run SFTP server in chroot, starting in /shared and denying symlink requests
sudo chroot --userspec=user:user "$jail" \
/usr/libexec/openssh/sftp-server \
-d /shared -P symlink
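One possible way to keep repeated runs from stacking mounts is to skip paths that are already mount points. This is only a sketch: the `bind_once` helper and the overridable `MOUNT` variable are hypothetical additions (the variable exists so the logic can be tried without root), and it relies on `mountpoint` from util-linux.

```shell
# Hypothetical helper: bind-mount SRC onto DST unless DST is already a mount point.
# MOUNT can be overridden (e.g. MOUNT=echo) to try the logic without root.
MOUNT=${MOUNT:-"sudo mount"}

bind_once() {
    local src=$1 dst=$2
    if mountpoint -q -- "$dst"; then
        echo "already mounted: $dst"
    else
        # Intentionally unquoted so "sudo mount" splits into two words.
        $MOUNT --bind "$src" "$dst"
    fi
}

# In server.sh the loop body would then become (not run here):
#   bind_once "/$path" "$jail/$path"
```

The check and the mount are not atomic, so two copies of the script started at the same moment could still both mount; for a single user starting the server by hand that is probably acceptable.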
EDIT: Maybe using sshd and internal-sftp would be a cleaner solution. The sshd shouldn’t be directly exposed to the client; the interface should remain the same.
Rough comparison of code size: ~1300 LoC vs. ~2100 LoC
Anything else?
It depends on your threat model and scenario. There are two kinds of systems:
server – runs the file server plus potentially some other software (Syncthing, document indexing, …)
client(s) – probably run some other software that works with the file contents. Note that a client can also be outside of Qubes OS, e.g. just some machine with Syncthing.
There are multiple potential attack scenarios:
Most obvious: attacks on protocol level, just directly between server and client.
Client exploits a vulnerability in the server’s software (Syncthing, document indexing, …)
Client or server exploits a vulnerability in a client’s software. Yes, a client can indirectly attack another client.
So, just the fact that symlinks are followed client-side might not be enough.
SFTP source code is not 2100 LOC. It’s got all the SSH stack below it. Much much more complex than 9pfs. If you don’t understand this, sorry but I have no time to explain it.
If I ran a whole SSH server and exposed it, I would agree. But that’s not what I’ve done. I don’t use SSH as the transport, although sftp-server is usually (but not in my case) used this way. That’s because the SSH transport (authentication and encryption) isn’t needed for inter-qube communication.
Well, I am considering using the SSH server in order to be able to use internal-sftp, but even in that case, I wouldn’t expose it. The SSH client and SSH server would run on the same qube, and just the unencrypted SFTP communication (without the SSH transport layer) would be exposed to the client.