Networking Broken in 4.1 Default Templates

Continuing the discussion from Tor > VPN connection issues - only in 4.1 - multiple test configurations & vpn providers:

Please see the 2nd post in this thread for the Template issue description. This first post only clarifies how I came to the conclusion that the real problem is Template related and not VPN related.

I first noticed some odd networking issues back in the 4.1 alpha releases, and every release I picked up afterwards, through at least a month after beta, continued to have the same issues.

Primary Issue (which led to the discovery of this template problem): VPNs have issues connecting when connected after a TOR (sys-whonix) connection.

It seems to me, from a number of additional tests I did (see the linked post above as well as the 2nd post in this thread), that the problem has nothing to do with 4.1 specifically and instead lies entirely within the templates that come with 4.1 (or maybe the way they are installed/networked when installed via template dnf download).

I can’t tell you if this affects all templates. But it definitely affects Fedora 33 & 34 & Debian 10 installed from 4.1

I will explain the reason I believe the templates themselves are the issue in the post immediately following this one. First, let me just repeat the actual issue from my other thread.

This is what works and does not work in a new 4.1 installation with templates fd33, fd34, deb10 installed (have not tried deb11 yet)

Assuming I have 2 VPN Qubes we will call VPN1 and VPN2 .

Working

NET > FW > VPN1 > AppVM
NET > FW > VPN2 > AppVM
NET > FW > VPN1 > VPN2 > AppVM
NET > FW > VPN2 > VPN1 > AppVM
NET > FW > VPN1 > SYS-WHONIX > AppVM
NET > FW > VPN2 > SYS-WHONIX > AppVM

Not Working

NET > FW > SYS-WHONIX > VPN1 > AppVM
NET > FW > SYS-WHONIX > VPN2 > AppVM
NET > SYS-WHONIX > VPN1 > AppVM
NET > SYS-WHONIX > VPN2 > AppVM

All of the above with identical Qube configs and layouts 100% work on 4.0
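
For anyone reproducing these layouts, here is a minimal sketch of how one chain can be expressed as dom0 commands. It assumes the qubes are named sys-net, sys-firewall, sys-whonix, VPN1 and AppVM (NET and FW above are shorthand, so your names may differ). The `chain_cmds` helper is made up for illustration, and it only prints the `qvm-prefs` commands so they can be reviewed before being run in dom0.

```shell
# Print (not run) the dom0 commands that wire qubes into a netvm chain.
# Arguments are listed upstream-first, matching the NET > FW > ... notation.
# Qube names are assumptions; substitute your own.
chain_cmds() {
    upstream="$1"
    shift
    for qube in "$@"; do
        printf 'qvm-prefs %s netvm %s\n' "$qube" "$upstream"
        upstream="$qube"
    done
}

# One of the failing layouts: NET > FW > SYS-WHONIX > VPN1 > AppVM
chain_cmds sys-net sys-firewall sys-whonix VPN1 AppVM
```

Swapping the order of the arguments gives the working layouts, which makes it easy to flip between the two while testing.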

Whonix doesn’t seem to be part of the issue, I have a 4.1 Whonix 15 installed and it’s not breaking the networking.

Now, why I believe the templates are the actual issue here. It could be the way they are built, some default networking settings inside them, or perhaps something to do with the way 4.1 installs new templates from the repository - any of these could be the cause. But the reason I think it’s the templates is this:

After much frustration I decided to just copy over my base templates from an older Qubes 4.0 installation of mine. Copied the exact same templates, Fedora 33 & Fedora 34, unmodified, fresh installed using Qubes 4.0 and then “backed up” and restored to my Qubes 4.1.

I then changed my specific VPN VM’s to use those restored templates instead of the built in 4.1 ones and - voila! no more network problems.

Not only did this solve the VPN issues, it also fixed another seemingly unrelated problem from an older thread: Trezor (crypto hardware wallet) communication between Qubes using some socat commands started working once I swapped in these restored 4.0 templates.

Trezor & Monero Wallet issues reported for 4.1 - fixed by workaround

Another user with same issues as me (Tor > VPN problem in 4.1)

I also tested all of this with fedora 33, fedora 33 minimal & debian 10. Restoring older templates fixes the networking issues in all cases.

I’m not well versed enough in networking protocols in Linux to figure out what exactly might be causing all of this.

Maybe there’s a common theme here aside from it working in 4.0. Is vpn set to UDP instead of TCP?

No, VPN issues are not the problem, they have already been 100% ruled out. The exact same VM transferred back & forth between Qubes 4.0 & 4.1 always works in all scenarios in 4.0 and does not in 4.1 - unless you also copy the TemplateVM’s over from 4.0 as well. And I’m talking fully freshly installed TemplateVM’s in all cases with no changes.

Please see the other thread this one is linked to at the top for further issues attempting to troubleshoot the VPN aspect, before I realized this is not the problem at all and why I started a different topic focused thread.

Working Workaround: For anyone else who is experiencing similar issues as mentioned in my 2nd post, I have a working workaround that I have been using for well over a month now without any issues.

Quite simply, copy TemplateVM’s over from Qubes 4.0.

  • Hop on Qubes 4.0, download a fresh TemplateVM of Fedora 33, 34, Debian 10, 11.
  • Run all the standard updates on that template.
  • Make a backup of it to a USB or wherever.
  • Import them into Qubes 4.1
  • Can run standard updates once again if you wish.
  • Set those restored VM’s as your primary networking VM’s for all your VPN’s or other services which you are having network issues - and voila, issues resolved.
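
The last step above can be sketched as dom0 commands. This assumes the restored 4.0 template was given a distinct name (the hypothetical `fedora-34-q40` here, for example via `qvm-clone` on 4.0 before the backup) so that it does not clash with the 4.1 template, and that the backup itself was made and restored with the usual `qvm-backup` / `qvm-backup-restore` tools. The `retarget_cmds` helper is made up for illustration and only prints the commands.

```shell
# Print (not run) the dom0 commands that move qubes onto another template.
# A qube must be shut down before its template can be changed.
# Usage: retarget_cmds TEMPLATE QUBE [QUBE...]
retarget_cmds() {
    template="$1"
    shift
    for qube in "$@"; do
        printf 'qvm-shutdown --wait %s\n' "$qube"
        printf 'qvm-prefs %s template %s\n' "$qube" "$template"
    done
}

# fedora-34-q40 is a hypothetical name for the template restored from 4.0
retarget_cmds fedora-34-q40 VPN1 VPN2
```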

Now, with RC1 there have been some new issues installing Template & dom0 updates if you don’t use the built-in Templates that you install directly from 4.1, for whatever reason. So make sure you keep, at the very minimum, a Fedora 34 that’s been installed directly from Qubes 4.1 and set that as your main “sys-firewall” base template, or else you may not be able to install updates.

My goal with this thread, though, is that we can resolve whatever problem in the default 4.1 templates is blocking certain network processes mentioned above, such as TOR > VPN or other inter-qube communications (like Monero wallets or Trezor bridges), so that they function correctly as they do (and as one would expect them to) in Qubes 4.0.

I gave up trying to solve these issues for a while after spending a lot of time on them until recently I did some more thorough testing.

Now that Qubes 4.1 Final is upon us, I would suspect that others may have similar issues in coming months.

Firstly, in the other thread it was asked what VPN’s I have tested and I have tested Mullvad, AirVPN & iVPN - same results with all of them. Again … I can’t see how this is a VPN issue when the problem only exists in Templates installed via 4.1 and not an issue at all when they are installed via 4.0 with identical setup and credentials.

In the past week I have now done some additional testing as follows:
My working templates of Fedora copied over from a 4.0 installation I used the following method:

to update them to both Fedora 34 and Fedora 35, and used them for my networking. They continue to operate perfectly.

Freshly installed Fedora 34 (regular repo) and Fedora 35 (testing repo) templates from the 4.1 repos, set up identically to the 4.0 versions of the same templates, continue to suffer the issue: when connected through Tor first, they initially connect, then drop the connection and will not remain connected to the VPN (this also happens in Debian 11). I have tested this on the Full Templates as well as the Minimal Templates (I prefer to use the minimal ones most of the time).

So in order to be clear, I now have:
Fedora-33-Minimal-from-Qubes-4.0 [working normally]
Fedora-33-from-Qubes-4.0 [working normally]
Fedora-34-Minimal-from-Qubes-4.0 (installed fresh) [working normally]
Fedora-34-Minimal-from-Qubes-4.0 (upgraded from 33) [working normally]
Fedora-34-from-Qubes-4.0 [working normally]
Fedora-35-Minimal-from-Qubes-4.0 (upgraded from 33) [working normally]
Fedora-34-Minimal-from-Qubes-4.1 [broken]
Fedora-34-from-Qubes-4.1 [broken]
Fedora-35-Minimal-from-Qubes-4.1 [broken]
Fedora-35-from-Qubes-4.1 [broken]
Debian-11-Minimal-from-Qubes-4.1 [broken]
Debian-11-from-Qubes-4.1 [broken]

I mentioned that it isn’t just VPNs this is affecting. In the 2nd post in this thread I mentioned this also breaks inter-Qube communications for the crypto hardware wallet Trezor. In order to use that wallet, its “bridge” software must be installed and running in sys-usb, which then communicates on port 21325 with the VM you’re trying to use it in. Setup details are here:

This does not work when any of the TemplateVMs installed from 4.1 is used as the sys-usb template. The results match the list above. When using a copied 4.0 template, it works just fine.

I have set up the configuration for this over a dozen times in recent weeks, so I’m confident it’s not user error in setting up the bridge.

Lastly, another thing that I have noticed, and perhaps this is the ultimate root cause of all of these issues, a default Qubes Template installed in 4.0 and updated to latest updates vs a default 4.1 Template of the same flavor (take Fedora 34 Minimal for example) - the 4.1 version has substantially more pre-installed packages.

I don’t know what all the packages do as I’m no Linux expert but I will provide a list that I outputted for comparison, attached to this post. Two text file exports of all installed packages, with required packages installed to use them as USB & NetVM’s.
q40-f35.txt.log (59.9 KB)
q41-f35.txt.log (102.2 KB)
q40-f34min.txt.log (58.7 KB)
q41-f34min.txt.log (96.8 KB)
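
To make the size difference concrete, the lists can be compared directly. A minimal sketch, assuming each attached file holds one package per line (the exact export format is not stated here, so adjust as needed). The `only_in_second` helper is a name made up for illustration.

```shell
# Print lines that appear only in the second file (e.g. packages present
# in the 4.1 template but not in the 4.0 one). Assumes one package per line.
only_in_second() {
    tmp_a="$(mktemp)" || return 1
    tmp_b="$(mktemp)" || { rm -f "$tmp_a"; return 1; }
    sort -u "$1" > "$tmp_a"
    sort -u "$2" > "$tmp_b"
    comm -13 "$tmp_a" "$tmp_b"   # -13 hides lines unique to file 1 and common lines
    rm -f "$tmp_a" "$tmp_b"
}

# Example with the attached lists, if they are in the current directory
if [ -f q40-f34min.txt.log ] && [ -f q41-f34min.txt.log ]; then
    only_in_second q40-f34min.txt.log q41-f34min.txt.log
fi
```

If the lines also carry version numbers, nearly every line will differ, so it may help to strip versions first and compare package names only.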

Hey

I’m running into the exact same issue, I really appreciate the debugging here.

I think the following ticket is also the same issue. Would It be possible to add your logs and info to the ticket? Not sure how often the devs check the forum

Thanks!

Laurel, thank you so much for pointing out that ticket!

That’s a lot of additional information that I wasn’t aware of regarding this issue.

I can already see that all my packages are up to date to the ones there that are reported as “fixed” but I will have to try that ARP networking command and see if that fixes the issue (and also if it fixes the issue with Trezor as well).

Not sure if I’ll have time to test this in the next few days but when I do I shall report back.

Those logs I posted above are just the list of packages installed between the two versions and according to that ticket - may not really be related.

I also see that there are 2 additional updates in the qubes-vm testing repos, versions .32 & .33 of this qubes-core-agent package. I may give those a try to see if that resolves anything as well and report back.

Thanks again, really appreciate this response.

Hey,

Just an FYI that the ARP commands fixed it for me. This was required even with the most up-to-date qubes-core-agent (which should have fixed the issue).

I’m back! It’s been a long day of testing all sorts of things with this. I have some semi-good news and more bad news.

Firstly, I’ve tried the ARP command and it did indeed work. That’s really the only good news.

The bad news is that the ARP command does not help fix the other issues with inter-qube communications such as talking to a Trezor device.

As shown here: Qubes OS - Trezor Wiki - one needs to set up “socat” rules to have the USB qube communicate with the VM Qube you wish to use it in over a special port.

This is done by adding:
socat - TCP:localhost:21325 to a special file and another rule:
$anyvm $anyvm allow,user=trezord,target=sys-usb in dom0

^ this works perfectly fine with my Qubes 4.0 VM’s set as the template for the sys-usb, it does not work in 4.1 based templates (any of them) - nor does the ARP command fix this issue.

In fact, since a standard sys-usb qube is set to not even have regular ip based routing & networking enabled, ip neigh show & ip r don’t have any results to display (which is not unusual in a sys-usb)

Another thing I tried today, which I don’t recommend because it completely broke the qube for me:

I made a copy of the 4.1 version of Fedora, removed the 4.1 repo from it, changed it to 4.0 and downgraded all the qubes-core-agent* packages to their 4.0 versions. The downgrade went through successfully, but after a reboot the Qube won’t start correctly and you cannot get a terminal.

I went 1 step further to try to remove ALL packages related to qubes-core-agent (to ultimately reinstall the 4.0 versions) but after removing them all, it also removes all necessary components for networking and thus you can’t connect to the repo servers to download the replacement packages.

Anyway, all this to say that - the issue seems to be deeper than a simple “ARP” command problem since the socat communications between Qubes is also not functioning (unless anyone can suggest an ARP adjustment command that might fix that too?)

I’ve also just added a post to the GitHub Issue you linked earlier. Thanks again for that!

I use socat under 4.1 for a variety of purposes, and it works fine, so I
can’t agree with your conclusion.
It may be that your Trezor setup is not working properly, but in my
experience inter-qube comms work as expected.
You could test this by setting up (e.g.) ssh or rsync over qrexec - I’d
be interested to see your results.

I never presume to speak for the Qubes team.
When I comment in the Forum or in the mailing lists I speak for myself.

I’ve never tried rsync between Qubes. I shall test your theory.

Would this be the recommended way to configure it?

I think I’m having socat issues as well. I’m on 4.1 after an in-place upgrade from 4.0. I’m trying to get trezor set up for the first time. I followed the guide Qubes OS - Trezor Wiki exactly.

If I change dom0's /etc/qubes-rpc/policy/trezord-service to $anyvm $anyvm ask,user=trezord,target=sys-usb (ask instead of allow), then I do see rpc popups when I try trezorctl commands in the trezor VM.

With trezord running in sys-usb, I can see

$ sudo netstat -tlp
tcp 0 0 localhost:21325 0.0.0.0:* LISTEN xxxx/trezord
...
# get some output from curl because trezorctl isn't installed here
$ curl localhost:21325
<a href="http://127.0.0.1:21325/status/">Moved Permanently</a>.

trying the same from the trezor VM

$ sudo netstat -tlp
tcp 0 0 0.0.0.0:21325 0.0.0.0:* LISTEN xxxx/socat
$ curl localhost:21325
curl: (52) Empty reply from server

I would expect the curl responses to be the same if socat were working. Is that the right conclusion to draw?

Hi,
I just stumbled over your post. Just put a firewall-vm between your vpn-proxy and sys-whonix and “vpn over tor” will work. I am using a fedora-34 template from a 4.1 install.

Hey @unman, I gave your instructions (GitHub - unman/qubes-sync: Simple syncin between qubes over qrexec) a try for using rsync between VMs with socat and qrexec, and it’s not working for me in 4.1 either. After copying the rsync.conf, qubes.Rsync and the systemd files into a template, making an allow policy in dom0, and trying to call rsync, what I get is this:

# rsync --port=837 localhost::shared
rsync: server sent "2022/04/24 xx:32:06 socat[1071] E connect(5, AF=2 127.0.0.1:873, 16): Connection refused" rather than greeting
rsync error: error starting client-server protocol (code 5) at main.c(1814) [Receiver=3.2.3]

However, I think this might be because I don’t have an rsync service to enable despite having installed the rsync package. Maybe the instructions there are out of date. The important thing is that I think socat might be working properly because if I run wireshark in the server VM, I can see a SYN and a RST,ACK reply to/from port 837 when I run the rsync command in the client!
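
A quick way to confirm the "no rsync service" theory is to check whether anything is listening on port 873 in the server qube. A minimal sketch that reads a numeric listing (such as the output of `netstat -tln`) on stdin. The `has_listener` helper is a made-up name, not part of any Qubes tooling.

```shell
# Succeed if the listing on stdin contains a LISTEN socket on the given port.
# Expects numeric output (e.g. from `netstat -tln`), where the local address
# ends in ":PORT".
has_listener() {
    awk -v port="$1" '
        /LISTEN/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ (":" port "$")) found = 1
        }
        END { exit !found }
    '
}

# In the server qube (needs netstat from net-tools):
#   netstat -tln | has_listener 873 && echo "something is listening on 873"
```

If this reports nothing on 873, the "Connection refused" from socat points at a missing rsync daemon rather than at the qrexec tunnel.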

This is not the case when I try the trezor setup. No packets appear in the server (sys-usb). I think I might be able to figure out from your git repo how to make the trezor work for me, but I’m not sure I understand how you have it set up. It seems like instead of running socat on the client, you use a systemd socket that listens on 837 and then calls qrexec-client-vm. Then the server doesn’t use the systemd service/socket, and only runs socat to handle the rsync call. I can’t wrap my head around how the qubes.Rsync socat call works though.

Also I think I see now that there isn’t any network connection between the qubes. socat just routes the packets through a qrexec call so that each VM sees them as coming from the stdio on localhost, right?

@Anondoe SOLVED!! Now that I figured out what qrexec is doing, it was pretty straightforward debugging to realize that it was just a matter of permissions on the file called by qrexec

sys-usb~$ chmod +x /usr/local/etc/qubes-rpc/trezord-service

That’s it. I guess that wasn’t necessary in earlier versions of qubes, which seems weird in retrospect.
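
For anyone scripting this, a small guard along these lines makes the fix safe to run repeatedly. It is only a sketch: `ensure_executable` is a made-up helper, and the path is the one from the command above.

```shell
# Make sure a qrexec service file exists and is executable.
ensure_executable() {
    svc="$1"
    if [ ! -f "$svc" ]; then
        echo "missing service file: $svc" >&2
        return 1
    fi
    [ -x "$svc" ] || chmod +x "$svc"
}

# In sys-usb, as a user allowed to change the file's mode:
#   ensure_executable /usr/local/etc/qubes-rpc/trezord-service
```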

And thanks for the working example @unman :slight_smile:

This did the trick. thank you

Sir @qubes-anon, you are awesome. What an extremely simple fix.

Someone will need to notify Trezor that this needs to be added to their wiki instructions (I’ll get around to it at some point if no one else beats me to it) - Trezor is working perfectly for me now with a 4.1 minimal template as the sys-usb, just by running chmod +x.

As to the inserting firewall vm’s in between each of the VM’s to get them working - not something I have tried yet. Seems like it should not be necessary, but at least it sounds like we have a workaround if that works.

This last comment is in response to this workaround post in my previous thread: