Thank you so much @santorihelix, I really appreciate your help. I am more than happy that random strangers on the Internet jump in and help because they find this thread useful.
As it happens, @whoami started to work on something too and connected with me and @deeplow to fine-tune it. As I haven’t had a look at it yet, I’ll leave the stage to the other two. I’m sure your work will be of help. @whoami could tell you more about how you two could cooperate on the document.
Hi @santorihelix, your support here is very welcome.
As @phl mentioned I have also already worked on a final documentation.
I was not able to publish it on GitHub, and then @deeplow told me that there is an ongoing new setup for external / community docs. Anyway, I’ll let @deeplow follow up and answer this issue.
I will direct message you and share what I did so far.
After a reinstall of QubesOS I can’t seem to make this work again as I had before.
I did a restore of the VMs so the config there was intact; however, the dom0 backup apparently doesn’t contain configuration files.
However, based on the steps, it’s just a matter of adding the qubes.SshAgent file with “sshclient vault ask”.
When I try to run “ssh-add -L” on the client I’m asked to permit the request, but I don’t get the old notification that the client is accessing the SSH keys.
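To double-check that step, something like the following could be run in dom0. It is only a sketch: the policy path is an assumption (the legacy per-service policy location), and `has_policy_rule` is a made-up helper name, not part of Qubes.

```shell
# Assumed location of the legacy policy file in dom0; adjust if your setup differs.
POLICY_FILE="${POLICY_FILE:-/etc/qubes-rpc/policy/qubes.SshAgent}"

# Hypothetical helper: succeeds if the file contains the exact rule line.
has_policy_rule() {
    grep -qxF "sshclient vault ask" "$1" 2>/dev/null
}

if has_policy_rule "$POLICY_FILE"; then
    echo "policy rule present in $POLICY_FILE"
else
    echo "policy rule missing in $POLICY_FILE"
fi
```

If the rule is reported as missing, the line “sshclient vault ask” would need to be added to that file by hand.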
“ssh-add -L” on the vault shows me the public keys from keepassxc.
“ssh-add -L” on the client gives “error fetching identities: communication with agent failed” (right after I confirm the “[Dom0] Operation execution”).
From client:
$ ssh-agent
SSH_AUTH_SOCK=/tmp/ssh-bXiKt7QhlPuu/agent.2072; export SSH_AUTH_SOCK;
SSH_AGENT_PID=2073; export SSH_AGENT_PID;
printenv only has one line regarding ssh:
SSH_AUTH_SOCK=/home/user/.SSH_AGENT_vault
nmap-ncat is installed on the Fedora32 template
Running on Qubes 4.1 (as I did before as well).
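One thing that might help narrow this down on the client is checking whether the path in SSH_AUTH_SOCK really is a listening Unix socket. The snippet below is a rough sketch; `agent_socket_state` is a made-up helper name, and it only inspects the file type, it does not talk to the agent.

```shell
# Classify the agent socket the client is pointed at.
# Prints one of: unset, missing, not-a-socket, ok
agent_socket_state() {
    # Use the first argument if given, otherwise fall back to SSH_AUTH_SOCK.
    sock="${1-${SSH_AUTH_SOCK-}}"
    if [ -z "$sock" ]; then
        echo unset
    elif [ ! -e "$sock" ]; then
        echo missing
    elif [ ! -S "$sock" ]; then
        echo not-a-socket
    else
        echo ok
    fi
}

agent_socket_state
```

If this prints "ok" and “ssh-add -L” still fails, the socket exists on the client and the problem is more likely somewhere between the client and the vault.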
Good that it got into the docs.
I see some small changes, but still I’m at the same issue.
If my vault VM isn’t running, it gets started when I run “ssh-add -L” on the client, but the notification bubble never comes.
Oh, I think that is an issue with the notification daemon in Fedora.
There should be an issue on the Qubes GitHub somewhere (although I think it is not qubes-specific). As far as I remember the daemon crashes if it receives a notification before the GUI subsystem of the VM is fully started. It then never comes online.
I don’t think there is an easy manual fix at the moment.
Ok, but still just getting:
error fetching identities: communication with agent failed
when issuing “ssh-add -L” on the ssh client VM.
It seems like VM communication isn’t working somehow. So strange, this worked before the reinstall.
@deeplow @phl
Ok, got it. This procedure doesn’t work on qubes 4.1 stable.
I just recalled that I was on testing repo both for dom0 and templates previously.
Once I changed to testing and updated dom0 and templates all works as expected.
Short update from my side: Very recently, my Qubes system has started to behave strangely when connecting to SSH hosts via this method.
Fetching keys from my vault VM now takes ~1 minute, while a direct login to a host (either by password or by key which I copied directly into the AppVM for testing) works as fast as expected (~7 seconds for the whole process).
Clearly, the delay must be introduced by some part of the inter-VM connection.
I have yet to find out which part is flawed and how the behaviour was introduced. I have not modified my configuration for quite some time. This has probably been introduced by some recent Fedora update, as I obviously install these regularly.
UPDATE: And I already found a workaround, but wouldn’t really call it a “solution”: I found that I was still running an older setup of my split-SSH, where I used ncat instead of socat, which we chose for the guide. After changing the setup to socat, everything works as fast as I was used to.
So my guess is that some recent update changed the behaviour of ncat, which introduced these huge lags.
I don’t really have time to debug this further right now, so if anyone has performance issues, the recommendation is: Use socat instead of ncat!
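For anyone who wants to try the switch, here is a minimal sketch of the vault-side relay step using socat. The function name `relay_to_agent` is made up for illustration, and the ncat line in the comment is only my assumption of what an older setup might look like, so compare it with your own qubes.SshAgent before changing anything.

```shell
# Hypothetical helper: relay stdin/stdout to the ssh-agent socket given as $1.
relay_to_agent() {
    sock="$1"
    if [ ! -S "$sock" ]; then
        echo "no agent socket at: $sock" >&2
        return 1
    fi
    # socat variant: connect stdio to the agent's Unix socket.
    socat - "UNIX-CONNECT:$sock"
    # Assumed older ncat variant (the kind of setup that showed the delay):
    #   ncat -U "$sock"
}
```

In the service script this would be called as `relay_to_agent "$SSH_AUTH_SOCK"`.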
That is interesting and definitely unexpected (at least for me). I thought that an error with the notification should have no impact on the rest of the script because it is executed as a separate command, not related to any other line. We should probably try to avoid such errors instead of forcing users to install components which are ‘only cosmetic’ (the popup message has no relevance for functionality).
I just found https://stackoverflow.com/questions/11231937 which discusses ways to avoid the termination of a script due to a single failed command. Maybe we could test some of the approaches listed there?
Another idea would be to just change the order of commands in /etc/qubes-rpc/qubes.SshAgent (socat first, notify-send second). I’m not sure if this works, though.
Or we could check for existence of the command before executing it (see, e.g., https://stackoverflow.com/questions/592620).
I guess my brain was in usual programming language mode and I didn’t even realize this was not to be expected. But yes, in bash by default, if something breaks, all the rest should still be executed… So I’m not sure why this broke it.
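A rough sketch of how the existence check and the "don't let a failed command stop the script" idea could be combined is below. `notify_optional` is a made-up wrapper name, and I have not tested this inside a real qubes.SshAgent service, so treat it as a starting point only.

```shell
# Show a notification if notify-send is installed; never fail the caller.
notify_optional() {
    if command -v notify-send >/dev/null 2>&1; then
        # "|| true" keeps a crashed or unreachable notification daemon
        # from propagating a non-zero exit status.
        notify-send "$@" || true
    fi
    return 0
}
```

In /etc/qubes-rpc/qubes.SshAgent the plain notify-send call would then be replaced by something like `notify_optional "SSH agent access from: $QREXEC_REMOTE_DOMAIN"`, leaving the socat line untouched.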
I think for security the non-displaying of a notification should stop the rest of the script from executing. This is because the notification actually notifies the user that something happened. If it doesn’t show the user might not notice something has happened in the background.
Imho the security comes from the qrexec protocol itself which allows only those interactions that one has allowed in dom0 (either by explicitly defining “allow” in the rpc policy file or by interacting with a dom0 prompt if “ask” policy is used). The notify-send popup is just a nice bonus.
On 4.1 these additional packages do not fix the problem.
Followed the nice guide on github, tested with both fedora-33 and debian-10 templates, but ssh-add -L only lists keys in the vault; the ssh-client AppVM always replies with
“error fetching identities: communication with agent failed”
Just set it up with my stable 4.0 Qubes and it works. I doubt I made typos since I copy-pasted from the guide into the template and AppVMs.
Have been using split-ssh for a few years now on Qubes 4.0.x and switched to socat today.
Nevertheless, I will go through it one more time from scratch on Qubes 4.1.