(best) Split SSH setup guide

phl · November 17, 2020, 12:07am

Thank you so much @santorihelix, I really appreciate your help. I am more than happy that random strangers on the Internet jump in and help because they find this thread useful.

As it happens, @whoami started to work on something too and connected with me and @deeplow to fine-tune it. As I haven’t had a look at it yet, I’ll leave the stage to the other two. I’m sure your work will be of help. @whoami can tell you more about how you two could cooperate on the document.

whoami · November 17, 2020, 6:32am

Hi @santorihelix, your support here is very welcome.
As @phl mentioned, I have also already been working on a final version of the documentation.

I was not able to publish it on GitHub, and then @deeplow told me that a new setup for external / community docs is in progress. Anyway, I’ll let @deeplow follow up and answer this issue.

I will direct message you and share what I did so far.

helge · December 24, 2020, 2:28pm

After a reinstall of Qubes OS I can’t seem to make this work again as it did before.
I restored the VMs, so the config there was intact; however, the dom0 backup apparently doesn’t contain configuration files.

However, based on the steps, it’s just a matter of adding the qubes.SshAgent file with “sshclient vault ask”.
When I run “ssh-add -L” on the client I’m asked to permit the request, but I don’t get the old notification that the client is accessing the SSH keys.

“ssh-add -L” on the vault shows me the public keys from keepassxc.
“ssh-add -L” on the client gives “error fetching identities: communication with agent failed” (right after I confirm the “[Dom0] Operation execution”).

From client:
$ ssh-agent
SSH_AUTH_SOCK=/tmp/ssh-bXiKt7QhlPuu/agent.2072; export SSH_AUTH_SOCK;
SSH_AGENT_PID=2073; export SSH_AGENT_PID;

printenv shows only one line related to SSH:
SSH_AUTH_SOCK=/home/user/.SSH_AGENT_vault

nmap-ncat is installed on the Fedora32 template
Running on Qubes 4.1 (as I did before as well).

Any thoughts?

deeplow · December 24, 2020, 3:13pm

We forgot to update here, but there was a collaboration to produce a full Qubes Split-SSH Guide that is now here:

https://github.com/Qubes-Community/Contents/blob/master/docs/configuration/split-ssh.md

# Qubes Split SSH

Split SSH implements a concept similar to having a smart card with your private SSH keys, except that the role of the “smart card” is played by another Qubes AppVM. 
This Qubes setup allows you to keep your SSH private keys in a vault VM (`vault`) while using an SSH Client VM (`ssh-client`) to access your remote server. 
This is done by using Qubes's [qrexec][qrexec] framework to connect a local SSH Agent socket from your SSH Client VM to the SSH Agent socket within the vault VM. 
This way the compromise of the domain you use to connect to your remote server does not allow the attacker to automatically also steal all your keys. 
(We should make a rather obvious comment here that the so-often-used passphrases on private keys are pretty meaningless because the attacker can easily set up a simple backdoor which would wait until the user enters the passphrase and steal the key then.)

   ![diagram](https://raw.githubusercontent.com/santorihelix/qubes-splitssh-diagram/main/split-ssh-keepassxc-8.svg)

## Security Benefits

In the setup described in this guide, even an attacker who manages to gain access to the `ssh-client`  VM will not be able to obtain the user’s private key since it is simply not there. 
Rather, the private key remains in the `vault` VM, which is extremely unlikely to be compromised if nothing is ever copied or transferred into it. 
In order to gain access to the vault VM, the attacker would require the use of, e.g., a general Xen VM escape exploit or a signed, compromised package which is already installed in the TemplateVM upon which the vault VM is based.

## Overview

1. Make sure the TemplateVM you plan to use is up to date.
2. Create `vault` and `ssh-client` AppVMs.

This file has been truncated.

It’s the result of work by @phl, @santorihelix and @whoami, and I steered it a bit.

helge · December 24, 2020, 3:31pm

Good that it got into the docs.
I see some small changes, but I’m still stuck on the same issue.
If my vault VM isn’t running, it gets started when I run “ssh-add -L” on the client, but the notification bubble never comes.

phl · December 24, 2020, 5:42pm

Oh, I think that is an issue with the notification daemon in Fedora.
There should be an issue on the Qubes GitHub somewhere (although I think it is not qubes-specific). As far as I remember the daemon crashes if it receives a notification before the GUI subsystem of the VM is fully started. It then never comes online.

I don’t think there is an easy manual fix at the moment.

Edit: See here and related referenced issues.

helge · December 24, 2020, 5:56pm

OK, but I’m still just getting:
error fetching identities: communication with agent failed
when issuing “ssh-add -L” on the SSH client VM.
It seems like the VM communication isn’t working somehow. So strange, since this worked before the reinstall.

helge · December 25, 2020, 3:03pm

@deeplow @phl
Ok, got it. This procedure doesn’t work on qubes 4.1 stable.
I just recalled that I was on testing repo both for dom0 and templates previously.
Once I changed to testing and updated dom0 and templates all works as expected.

phl · December 25, 2020, 10:36pm

That’s odd. Probably something that’s not (yet) correctly working in 4.1. Thanks for letting us know!

phl · February 9, 2021, 10:43am

Short update from my side: Very recently, my Qubes system has started to behave strangely when connecting to SSH hosts via this method.
Fetching keys from my vault VM now takes ~1 minute, while a direct login to a host (either by password or by key which I copied directly into the AppVM for testing) works as fast as expected (~7 seconds for the whole process).

Clearly, the delay must be introduced by some part of the intra-VM connection.
I have yet to find out which part is flawed and how the behaviour was introduced. I have not modified my configuration for quite some time. This has probably been introduced by some recent Fedora update, as I obviously install these regularly.

@whoami, @santorihelix: Has either of you observed similar issues?

UPDATE: And I already found a workaround, but wouldn’t really call it a “solution”: I found that I was still running an older setup of my split-SSH, where I used ncat instead of socat, which we chose for the guide. After changing the setup to socat, everything works as fast as I was used to.
So my guess is that some recent update changed the behaviour of ncat, which introduced these huge lags.

I don’t really have time to debug this further right now, so if anyone has performance issues, the recommendation is:
Use socat instead of ncat!

whoami · February 9, 2021, 11:51am

I will check today and come back to you

whoami · February 9, 2021, 1:45pm

Nope, here it all works super smoothly. For me it takes just one second: select the vault VM in the pop-up and boom, I am in.

deeplow · February 10, 2021, 7:54pm

Finally followed this through! It all worked in the end

I ran into the same issue while following the process. The solution was to install:

fedora: libnotify and mate-notification-daemon
debian: libnotify-bin and mate-notification-daemon

This is the same issue as described here that broke so many people’s VPN configs.

Maybe a note about this should be added to the guide.
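
For reference, the two package sets listed above could be installed in the template roughly as follows. This is only a sketch: `install_hint` is a made-up helper name, and it assumes the usual `dnf` / `apt` front ends are used in the Fedora and Debian templates.

```shell
#!/bin/sh
# Sketch only: print the install command for the notification packages
# named in this post. install_hint is an illustrative helper, not a Qubes tool.
install_hint() {
    case "$1" in
        fedora) echo "sudo dnf install libnotify mate-notification-daemon" ;;
        debian) echo "sudo apt install libnotify-bin mate-notification-daemon" ;;
        *) echo "unknown template family: $1" >&2; return 1 ;;
    esac
}

# Intended use, run inside the TemplateVM (not run here):
#   install_hint fedora
#   install_hint debian
```

Run the printed command in the template the vault is based on, then restart the vault so the new packages are available there.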

phl · February 11, 2021, 2:45pm

That is interesting and definitely unexpected (at least for me). I thought that an error with the notification should have no impact on the rest of the script because it is executed as a separate command, not related to any other line. We should probably try to avoid such errors instead of forcing users to install components which are ‘only cosmetic’ (the popup message has no relevance for functionality).

I just found https://stackoverflow.com/questions/11231937 which discusses ways to avoid the termination of a script due to a single failed command. Maybe we could test some of the approaches listed there?

Another idea would be to just change the order of commands in /etc/qubes-rpc/qubes.SshAgent (socat first, notify-send second). I’m not sure if this works, though.
Or we could check for the existence of the command before executing it (see, e.g., https://stackoverflow.com/questions/592620).
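
A minimal sketch of that last idea, combined with ignoring a failed notification. It is untested against the actual guide: `notify_user` and `NOTIFY_CMD` are illustrative names, and the commented usage assumes the service script ends with a socat call on `$SSH_AUTH_SOCK`.

```shell
#!/bin/sh
# Sketch only: a tolerant variant of the notification step.
# NOTIFY_CMD and notify_user are illustrative names, not part of the guide.
NOTIFY_CMD="${NOTIFY_CMD:-notify-send}"

notify_user() {
    # Only call the notifier if it is installed.
    if command -v "$NOTIFY_CMD" >/dev/null 2>&1; then
        # '|| true' keeps a crashed or unreachable notification daemon
        # from turning into a non-zero exit status.
        "$NOTIFY_CMD" "$1" || true
    else
        echo "notifier '$NOTIFY_CMD' not found, skipping popup" >&2
    fi
    return 0
}

# Intended use inside the service script (not run here):
#   notify_user "SSH agent access from: $QREXEC_REMOTE_DOMAIN"
#   socat - "UNIX-CONNECT:$SSH_AUTH_SOCK"
```

With this shape the popup stays a bonus: a missing or broken notifier is reported on stderr, and the agent connection is attempted either way.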

deeplow · February 11, 2021, 4:29pm

I guess my brain was in the usual programming-language mode and I didn’t even realize this was not to be expected. But yes, in bash, by default, if a command fails the rest of the script is still executed… So I’m not sure why this broke it.

I think for security a notification that fails to display should stop the rest of the script from executing. This is because the notification actually tells the user that something happened. If it doesn’t show, the user might not notice that something has happened in the background.
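
A sketch of what such a fail-closed variant might look like. `notify_or_abort` and `NOTIFY_CMD` are illustrative names, and a successful exit status from notify-send is only a rough proxy for the popup really being shown.

```shell
#!/bin/sh
# Sketch only: fail closed if the user cannot be notified.
# NOTIFY_CMD and notify_or_abort are illustrative names, not part of the guide.
NOTIFY_CMD="${NOTIFY_CMD:-notify-send}"

notify_or_abort() {
    # A missing command (status 127) and a failing notifier are treated alike.
    if ! "$NOTIFY_CMD" "$1" 2>/dev/null; then
        echo "could not show notification, refusing agent access" >&2
        return 1
    fi
}

# Intended use inside the service script (not run here):
#   notify_or_abort "SSH agent access from: $QREXEC_REMOTE_DOMAIN" || exit 1
#   socat - "UNIX-CONNECT:$SSH_AUTH_SOCK"
```

The trade-off is that a broken notification daemon then blocks SSH access entirely, which matches the symptom reported earlier in this thread.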

phl · February 11, 2021, 7:51pm

Imho the security comes from the qrexec protocol itself which allows only those interactions that one has allowed in dom0 (either by explicitly defining “allow” in the rpc policy file or by interacting with a dom0 prompt if “ask” policy is used). The notify-send popup is just a nice bonus.

phl · February 16, 2021, 8:00am

While reading Restricting a Qube to selected websites I learned about qvm-connect-tcp (The Qubes Firewall | Qubes OS) which is probably what we should be using instead of our own netcat/socat approach.

I probably won’t be able to test this myself for a few weeks.

mono · June 28, 2021, 4:07pm

On 4.1 these additional packages do not fix the problem.

I followed the nice guide on GitHub and tested with both fedora-33 and debian-10 templates, but ssh-add -L only lists keys in the vault; the ssh-client AppVM always replies with
“error fetching identities: communication with agent failed”

Any ideas what i might be missing?

Edit: github guide works with stable Qubes 4.05

whoami · June 28, 2021, 4:45pm

Please double-check all your AppVM names. I’d say in 90% of these cases it is a spelling mistake.
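
One quick way to check the name on the client side, as a sketch: it assumes the socket is named `.SSH_AGENT_<vault>`, as in the printenv output earlier in this thread, and `vault_from_sock` is an illustrative helper name.

```shell
#!/bin/sh
# Sketch only: derive the vault name from the agent socket path.
# Assumes the socket is named .SSH_AGENT_<vault>; vault_from_sock is
# an illustrative name, not part of the guide.
vault_from_sock() {
    case "$1" in
        */.SSH_AGENT_?*) printf '%s\n' "${1##*/.SSH_AGENT_}" ;;
        *) return 1 ;;
    esac
}

# Falls back to the path from this thread if SSH_AUTH_SOCK is unset.
sock="${SSH_AUTH_SOCK:-/home/user/.SSH_AGENT_vault}"
if name=$(vault_from_sock "$sock"); then
    echo "client forwards agent requests to qube: $name"
else
    echo "SSH_AUTH_SOCK does not look like a split-SSH socket: $sock" >&2
fi
```

The printed name should match both the actual vault qube and the target named in the dom0 policy line for qubes.SshAgent.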

mono · June 28, 2021, 4:51pm

I just set it up on my stable Qubes 4.0 and it works. I doubt I made typos, since I copy-pasted from the guide into the template and AppVMs.
I have been using split-ssh for a few years now on Qubes 4.0.x and switched to socat today.
Nevertheless, I will go through it once more from scratch on Qubes 4.1.