Tor > VPN connection issues - only in 4.1 - multiple test configurations & vpn providers

I have been troubleshooting this problem for a solid week, many many hours, tried many things and can’t seem to determine what is wrong here. Am hoping maybe someone else can at least run a test on their own Qubes 4.1 with a similar setup to see if they have the same issue.

For at least a year now I have been running a sys-net → sys-firewall → sys-whonix → sys-vpn → AppVM setup entirely successfully without issue on Qubes 4.0.3 & 4.0.4.

My ProxyVM/VPN Qubes are set up using the community VPN guide CLI method with the anti-leak script. As stated, they have always worked. I have even done a net>firewall>vpn1>tor>vpn2>appvm setup on numerous occasions, which also works perfectly fine.

But as is right now, I’ve been running Qubes 4.1 for over a month and not once have I been able to get either of the above setups to work correctly (or … at all).

I have installed multiple different ISOs of Qubes, including several of the weekly alpha builds kindly provided by @fepitre. I have attempted this setup on his 06-12, 07-03, and 07-31 releases, and currently I have been trying it on the official “Beta1” all day today.

Here are the steps I have taken to reproduce this problem multiple times:

After a fresh/initial install of Qubes, create a new AppVM, download current config files for OpenVPN (TCP 443 versions for use through Tor), and test the connection to the VPN over clearnet (sys-net > sys-firewall > sys-vpn) - this is without having added any of the fancy scripts from the community VPN instructions.

VPN works perfectly fine in this scenario. Then set it up as a “provides network” ProxyVM, create another new AppVM, make the previous ProxyVM its network, test connection from there - works perfectly fine. So now I know I definitely have a working VPN Qube, where the VPN works both inside the qube itself and when the qube is used as a ProxyVM for another Qube. :+1:

Shut down the qubes. Set their networking to sys-whonix now instead of sys-firewall. Start up sys-whonix, verify whonix connection with Nym. Verify whonix connection again with DispVM based on fedora-33-dvm (also tested with fedora-34 & fedora-32 dvm’s). Whonix gateway & sys-whonix both working perfectly fine.

I then start up the ProxyVM, which is now using sys-whonix. From inside the ProxyVM qube I run a test to make sure I have access via Tor: wget a random website, works fine. Load up Firefox, confirm I am on the Tor network. All good.

Now … I start OpenVPN via command line using the exact same parameters that had it working perfectly when it’s connected to sys-firewall. OpenVPN initially does connect to the VPN server: it TLS authorizes like normal, authorizes my account, connection established. And that is the end of the fun. After that the connection just times out and can no longer be established.

The openvpn client goes into a forever loop trying to re-establish connection, here’s the relevant section of log:

SIGUSR1[soft,connection-reset] received, process restarting
Restart pause, 5 second(s)
TCP/UDP: Preserving recently used remote address: [AF_INET]x.x.x.x:443
Socket Buffers: R=[131072->425984] S=[16384->425984]
Attempting to establish TCP connection with [AF_INET]x.x.x.x:443 [nonblock]
TCP: connect to [AF_INET]x.x.x.x:443 failed: No route to host
SIGUSR1[connection failed(soft),init_instance] received, process restarting
Restart pause, 5 second(s)
RESOLVE: Cannot resolve host address: xxxxx.mullvad.net:443 (Name or service not known)
RESOLVE: Cannot resolve host address: xxxxx.mullvad.net:443 (Temporary failure in name resolution)
Could not determine IPv4/IPv6 protocol
SIGUSR1[soft,init_instance] received, process restarting

(I have intentionally removed some identifying information from the log above: IPs and server names are replaced with x’s and timestamps are removed)

It is saying: “no route to host” and unable to resolve domains …

Immediately upon disconnecting openVPN, all of those things work perfectly fine again in the same Qube via the whonix net, I’m able to wget pages, I’m able to resolve domains, etc, etc no problem and verify connection to Whonix/Tor still working.

I can then re-connect to openVPN server and same result, shortly after connecting, nothing works again, can’t pass through Internet to another appVM, can’t make requests inside the ProxyVM, etc.

I have now tested this with 3 major, quality VPN providers (mullvad being one as you can see from the logs) - same exact result with each provider. Ensuring I use TCP 443 connections on all of them.

Now, as said, I’ve been running this setup successfully on Qubes 4.0 for a long time with no issues, and I am still able to run this setup on 4.0 without issue today - all my Whonix and other templates are up to date with the latest patches on both systems.

Then I went so far as to create full backups off of the working 4.0 system of the whonix-gw, sys-whonix, and all 3 working VPN ProxyVM’s from each provider that all work in any string arrangement/order I place them in, all work running through Tor, etc, etc - made backups of all those, restored them all on Qubes 4.1 - and again, I end up with the exact same issues described above. The VPN connects for a moment and then seemingly no longer has a connection.

I have been trying this every single day with different configurations over the past 10 days, lots of reloading Tor, changing circuits, etc, etc. It doesn’t matter: the failures are consistent and they only occur on 4.1.

I even tried swapping my main router for a completely different one to see if the router was somehow the issue - no difference/no change.

I am exhausted and can’t think of anything else to try, and am at a complete loss as to why this setup will not work for me, especially since it works perfectly fine even right now as I type this on my Qubes 4.0 workstation.

I’ve even tried the 4.1 on multiple PC’s to make sure it wasn’t hardware related such as a network hardware problem. The results remain consistent for me.

If anyone has any kind of clue as to what might cause this, maybe something different with the way network routing works in 4.1 or some configuration differences with the default Whonix in 4.1 or what - I have no idea at this point.

Even if someone could simply test the same setup and see if they have similar issues?

Thanks for reading this very long post and any help is greatly appreciated.

Just wanted to post an additional follow up with some further details.

I have 3 separate PCs running Qubes and I continue to have this issue trying to run a VPN Qube behind a sys-whonix one. I’ve tried resetting my gateways, reinstalling all the Whonix templates, etc.

Again, this problem seems to only occur for me on 4.1. I have since removed 4.1 from one of the PCs that this setup was not working on, just to confirm it’s not any kind of hardware issue. I have 4.0 running on 2 computers with the exact above setup, and it works perfectly fine.

I have gone through the configs and the VPN Qube setups with a fine-tooth comb to ensure they are identical on the 4.0 & 4.1 systems. I’ve tried copying backups of the Qubes off the 4.1 system to the 4.0 system and they work through Whonix on the 4.0 system, but when run on 4.1 they do not work.

So I have another log from a different provider than I posted in the previous post as follows:

 .
 .
00:00:33 2021 TUN/TAP device tun0 opened
00:00:33 2021 TUN/TAP TX queue length set to 100
00:00:33 2021 /sbin/ip link set dev tun0 up mtu 1500
00:00:33 2021 /sbin/ip addr add dev tun0 10.7.73.151/24 broadcast 10.7.73.255
00:00:33 2021 /sbin/ip -6 addr add fde6:7a:7d20:349::1095/64 dev tun0
00:00:39 2021 /sbin/ip route add ip.addr.of.vpn/32 via 10.137.0.21
00:00:39 2021 /sbin/ip route add 0.0.0.0/1 via 10.7.73.1
00:00:39 2021 /sbin/ip route add 128.0.0.0/1 via 10.7.73.1
00:00:39 2021 add_route_ipv6(::/3 -> ipv6:addr:of:vpn::1 metric -1) dev tun0
00:00:39 2021 /sbin/ip -6 route add ::/3 dev tun0
00:00:39 2021 add_route_ipv6(2000::/4 -> ipv6:addr:of:vpn::1 metric -1) dev tun0
00:00:39 2021 /sbin/ip -6 route add 2000::/4 dev tun0
00:00:39 2021 add_route_ipv6(3000::/4 -> ipv6:addr:of:vpn::1 metric -1) dev tun0
00:00:39 2021 /sbin/ip -6 route add 3000::/4 dev tun0
00:00:39 2021 add_route_ipv6(fc00::/7 -> ipv6:addr:of:vpn::1 metric -1) dev tun0
00:00:39 2021 /sbin/ip -6 route add fc00::/7 dev tun0
00:00:39 2021 Initialization Sequence Completed
00:01:43 2021 [server-name] Inactivity timeout (--ping-restart), restarting
00:01:43 2021 SIGUSR1[soft,ping-restart] received, process restarting
00:01:43 2021 Restart pause, 5 second(s)
00:01:48 2021 TCP/UDP: Preserving recently used remote address: [AF_INET]ip.addr.of.vpn:1194
00:01:48 2021 Socket Buffers: R=[131072->131072] S=[16384->16384]
00:01:48 2021 Attempting to establish TCP connection with [AF_INET]ip.addr.of.vpn:1194 [nonblock]
00:01:51 2021 TCP: connect to [AF_INET]ip.addr.of.vpn:1194 failed: No route to host
00:01:51 2021 SIGUSR1[connection failed(soft),init_instance] received, process restarting
00:01:51 2021 Restart pause, 5 second(s)

(real ip’s masked with “ip.addr.of.vpn”)

All of the routes seem to be generating correctly.

I have compared this log with the logs from the 4.0 system connecting to the same VPN providers with the same config files and server selections; the 4.0 system works and the 4.1 system does not.

I have ensured that the routes are correctly pointing to the internal IP of the Whonix Qube.

As you can see, the initial connection is properly established:
00:00:39 2021 Initialization Sequence Completed

And then upon first keepalive ping attempt the connection is terminated (again, only on 4.1)

I’m not trying to troubleshoot VPN connections here, they all work fine when not going through Tor/whonix-gw on 4.1. And they also all work fine going through Tor/whonix-gw on 4.0 - they only fail when going through Tor on 4.1 - no other situation.

Anyone have any possible clue to what could be going on here or some suggestions of commands I could run to maybe determine how exactly the route seems to be getting broken once the VPN connects? Would it be something occurring inside the Whonix Qube and not the VPN Qube?

Hi @Anondoe I see nobody has replied yet. It may be because the post is quite long. Sometimes breaking down with markdown-formatted headers helps a lot with the reading.

I have read through all of this post, and I apologise if I have made an error in what I have read in the log files you have provided. Whonix doesn’t support IPv6, but it seems you have routes configured for IPv6 - this will break Whonix. I have previously run successfully (to my surprise) a working WireGuard ProxyVM between an AppVM and sys-whonix, although I scripted the way my WireGuard proxies work. (I am not a developer and there may well be issues with these, but they pass all leak tests I put them through when using appVM → wireguard proxyVM → sys-firewall → sys-net with the kill switch option enabled, and likewise with wg-proxy-vm and sys-firewall swapped in position.)

To try and keep this reply as short as possible: my OpenVPN proxies are not configured using the CLI method but with @tasket’s popular solution. Unfortunately, none of them are set up with config files for port 443 TCP, so I knew they wouldn’t work, tried it anyway and sure enough they didn’t.

If configuring your VPN to only route IPv4 works then, awesome, I hope it does. If not I will (hopefully) remember to revisit this, download the 443 TCP config files, set up an additional OpenVPN ProxyVM and AppVM, disable internal IPv6 routing on both and see what results I get. IPv6 is intentionally disabled on my network adaptor. (I forgot to add: I am using 4.1, with all testing/sec testing enabled on all VMs, templates and dom0 with kernel-latest, purely FYI.)

Firstly thanks for reaching out @deeplow - you’re right it’s a very long post. It seems too much time has elapsed now for me to be allowed to go back and edit the post so I’m not sure if I should create a new, better formatted one and we can delete this one after?

Second, I tried your suggestion of IPv4 only @blatchard - it makes no difference for me. This setup works perfectly fine (with IPv6 enabled) on Qubes 4.0.

^ Suggestion tried by removing:
setenv UV_IPV6 yes - and also set:
proto tcp4-client

Ends with same result.

For anyone else who’s able to help, let me just try to display a bit more visually what is working and what isn’t.

Assuming I have 2 VPN Qubes we will call VPN1 and VPN2.

Working

NET > FW > VPN1 > AppVM
NET > FW > VPN2 > AppVM
NET > FW > VPN1 > VPN2 > AppVM
NET > FW > VPN2 > VPN1 > AppVM
NET > FW > VPN1 > SYS-WHONIX > AppVM
NET > FW > VPN2 > SYS-WHONIX > AppVM

Not Working

NET > FW > SYS-WHONIX > VPN1 > AppVM
NET > FW > SYS-WHONIX > VPN2 > AppVM
NET > SYS-WHONIX > VPN1 > AppVM
NET > SYS-WHONIX > VPN2 > AppVM

All of the above with identical Qube configs and layouts 100% work on 4.0

Essentially any re-ordering or combination that places Sys-Whonix BEFORE the VPN breaks the VPN (which works perfectly fine on its own and combined with other VPNs).

The Sys-Whonix & 4.1 are the only common denominators I’ve been able to find that cause this setup to not work.

Since my previous posts I have also attempted a fresh install of @fepitre’s 2021-08-14 weekly 4.1 alpha build with same results.

Ahhh… this explains a lot more the situation, I think. At least I got a grasp of it immediately.

Generally I’d say no. But since you can’t edit (you still need to spend a bit more time to do that) then I say yes. Start a reply here and then on the little :repeat: button on the top-left of the editor, select :heavy_plus_sign: reply as a linked topic .

Hi, sorry to bump but I’ve solved this here. I do not know why this doesn’t just work, and hopefully someone here can maybe give a better solution?

the same. fell into this with sys-whonix → vpn and R4.1

Hey guys! I’m really glad you brought this post back because I actually have been meaning to add more to it as I have some additional troubleshooting info.

After much frustration I actually did get a workaround setup to work. I’ve been so busy using it that I haven’t had chance to get back here with follow up :wink:

I’m actually going to immediately follow up this reply with a linked new thread based on my theory. See below (in a moment).

What VPN are you using outside of mullvad
another one

Hello once more. I actually moved this thread, which is why I stopped responding. I thought I had created a link from the end of this thread to the new one, but it either didn’t work or I didn’t do it correctly.

New thread is called “Networking Broken in 4.1 Default Templates”. This title still works because now, over 5 months later, all the same issues still persist, and this problem is not limited to VPNs only but affects some other internal networking too. I am about to make an updated post over there now that Qubes 4.1 is officially in “Final” release; hopefully we can get these issues resolved together.

Resume discussion here:

Hi. I can set up a sys-VPN, but I’m having issues making it work right now… Might be Fedora or Debian that is the issue… But are you telling me it’s connection issues in 4.1 then?
Could you link a good guide on how to set up a VPN in 4.1, the newest Qubes?
Thank you

Just put a firewall-vm between your vpn-proxy and sys-whonix and “vpn over tor” will work. I am using a fedora-34 template from a 4.1 install.

Thank you, qubesfan35267! This works! I feel like I had this problem for a year. This is a better fix than the arp command.
Can someone mention this on GitHub? This kind of reminds me of the other fix we had for the Q4.1 alpha, where dom0 couldn’t update over Whonix unless a Fedora base VM was created. Maybe a similar update can work out for this as well?

Sorry, but I cannot mention it on github - do not have an account there.

I haven’t used Whonix in R4.1, but I’ve run into this after recently upgrading to R4.1 while connecting to a VPN I run.

The NetVM of the VPN qube in question isn’t “qubes-aware” and I believe for the NetVM qubes that are, they’re configured to respond positively to all ARP requests for any IP address on their vifs.

That’s likely why it “works” for standard Fedora or Debian NetVMs. Whonix may have locked down this ability, which might explain what we’re both seeing.

Solution: modify the route to the VPN endpoint and add the onlink modifier.

Qubes’ networking is more of a point-to-point setup. Terms like ‘default route’ and ‘netmask’ are misnomers when it comes to inter-VM communication. The eth0 interface in the VPN qube will have a 32-bit netmask. The default route will say to use the NetVM’s IP via eth0. The problem is that when OpenVPN adds a route to the VPN endpoint after reconfiguring the default routes, the kernel thinks the route to the NetVM isn’t reachable based on the interface prefix, and dumps the ARP request onto eth0 itself instead of routing. Adding onlink tells the kernel that yes, the NetVM IP is reachable on eth0.

For OpenVPN, an up script could be used to do:

#   trusted_ip=VPN server endpoint
#   route_net_gateway=Current default gateway
ip route replace "$trusted_ip"  via "$route_net_gateway" dev eth0 onlink
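As a rough sketch of how that line might be wrapped into a complete up script, assuming OpenVPN exports trusted_ip and route_net_gateway to the script as described above. The function name pin_endpoint_route and its optional third argument (which swaps in another command for a dry run) are made up for illustration. OpenVPN would also normally need script-security 2 and an up directive pointing at wherever the script is saved.

```shell
#!/bin/sh
# Sketch of an OpenVPN "up" helper; pin_endpoint_route is a hypothetical name.
# $1 = VPN server endpoint, $2 = NetVM gateway, $3 = command to run (defaults to ip).
pin_endpoint_route() {
    endpoint=$1
    gateway=$2
    ip_cmd=${3:-ip}
    # Refuse to touch the routing table if either value is missing.
    if [ -z "$endpoint" ] || [ -z "$gateway" ]; then
        echo "endpoint or gateway not set; leaving routes untouched" >&2
        return 1
    fi
    # onlink tells the kernel the gateway is directly reachable on eth0.
    "$ip_cmd" route replace "$endpoint" via "$gateway" dev eth0 onlink
}

# In a real up script the call would be:
#   pin_endpoint_route "$trusted_ip" "$route_net_gateway"
```

Passing echo as the third argument prints the command instead of running it, which is a cheap way to see what would be changed before letting it touch the routing table.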

If ip route get <vpn_server_endpoint>, run after the connection is established, does not show the NetVM IP as the next hop, you have hit this issue, even though ip route may indicate that the route to the VPN endpoint should go through the NetVM IP.
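That check can be scripted roughly as follows. This is only a sketch: next_hop_of is a hypothetical helper that pulls the "via" field out of ip route get output, and the addresses in the usage comment are placeholders, not values from anyone's setup.

```shell
#!/bin/sh
# next_hop_of (hypothetical helper): read `ip route get` output on stdin
# and print the address that follows the word "via", if there is one.
next_hop_of() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "via") { print $(i + 1); exit } }'
}

# Usage inside the VPN qube once the tunnel is up (placeholder addresses):
#   hop=$(ip route get 203.0.113.5 | next_hop_of)
#   [ "$hop" = "10.137.0.21" ] || echo "endpoint is not routed via the NetVM"
```

If the helper prints nothing, the kernel is treating the endpoint as directly attached rather than going through the NetVM, which matches the failure described above.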

I haven’t investigated why modification of routes wasn’t necessary under R4.0.

I ran into the same issue as others mentioned and just saw your post. Thanks for the insight. Could you please provide some details regarding the required firewall-vm configuration? I guess some “allow” rules must be set within the firewall-vm?

If my understanding is correct, the flow would look like:
Sys-whonix → Firewall VM → VPN Proxy VM → anon-whonix → Internet

Thanks!

Just set up a new firewall-qube using fedora-template and check “Provides network” in settings (advanced tab)…

The whole setup will look like this:
app-vm → VPN-proxy → Firewall-vm → sys-whonix → Firewall-vm → sys-net

Remember this is a vpn over tor scenario and there are only rare use cases for advanced users. The app-vm will inherit the ip of the vpn-proxy - not the exit node from sys-whonix.
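For anyone who prefers the command line, the same steps might look roughly like this in dom0. A sketch only: the qube names sys-vpn and sys-fw-tor, the label colour, and the DRY_RUN switch are assumptions for illustration; qvm-create and qvm-prefs are the standard Qubes tools.

```shell
#!/bin/sh
# Run in dom0. With DRY_RUN set, commands are printed instead of executed.
run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# insert_firewall VPN_QUBE NEW_FIREWALL_QUBE TEMPLATE
# Creates a plain firewall qube and places it between the VPN qube and sys-whonix.
insert_firewall() {
    vpn=$1
    fw=$2
    template=$3
    run qvm-create --class AppVM --template "$template" --label green "$fw" &&
    run qvm-prefs "$fw" provides_network True &&
    run qvm-prefs "$fw" netvm sys-whonix &&
    run qvm-prefs "$vpn" netvm "$fw"
}

# Example (hypothetical qube names):
#   insert_firewall sys-vpn sys-fw-tor fedora-34
```

Setting DRY_RUN=1 before calling the function prints the four commands without changing anything, so the names can be checked first.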

Same problem since an update on Sunday 20.3. I thought it was my fault because of playing around with some qubes. When I try to connect, my DNS also logs a connection request, but the connection cannot be established after this. Without Tor, btw.