I have been troubleshooting this problem for a solid week, spending many hours and trying many things, and I can’t determine what is wrong. I am hoping someone else can at least run a test on their own Qubes 4.1 with a similar setup to see if they have the same issue.
For at least a year now I have been running a sys-net → sys-firewall → sys-whonix → sys-vpn → AppVM setup without issue on Qubes 4.0.3 and 4.0.4.
My ProxyVM/VPN qubes are set up using the CLI method from the community VPN guide, with the anti-leak script. As stated, they have always worked. I have even run a net > firewall > vpn1 > tor > vpn2 > appvm setup on numerous occasions, which also works perfectly fine.
But as of right now, I’ve been running Qubes 4.1 for over a month and not once have I been able to get either of the above setups to work correctly (or … at all).
I have installed multiple different ISOs of Qubes, including several of the weekly Alpha versions kindly provided by @fepitre. I have attempted this setup on his 06-12, 07-03, and 07-31 releases, and I have been trying it on the official “Beta1” all day today.
Here are the steps I have taken to reproduce this problem multiple times:
After a fresh/initial install of Qubes, create a new AppVM, download the current OpenVPN config files (TCP 443 versions for use through Tor), and test the VPN connection over clearnet (sys-net > sys-firewall > sys-vpn). This is without having added any of the scripts from the community VPN instructions.
The VPN works perfectly fine in this scenario. Then set it up as a “provides network” ProxyVM, create another new AppVM, make the previous ProxyVM its network, and test the connection from there. That works perfectly fine too. So now I know I definitely have a working VPN qube: the VPN works both inside the qube itself and when the qube is used as a ProxyVM for another qube.
Shut down the qubes. Set their networking to sys-whonix instead of sys-firewall. Start up sys-whonix, verify whonix connection with Nym. Verify the Whonix connection again with a DispVM based on fedora-33-dvm (also tested with fedora-34 and fedora-32 DVMs). Whonix gateway and sys-whonix are both working perfectly fine.
I then start up the ProxyVM, which is now using sys-whonix, and from inside the ProxyVM qube I run a test to make sure I have access via Tor.
wget a random website, works fine. Load up Firefox, confirm I am on the Tor network. All good.
Now … I start OpenVPN via the command line using the exact same parameters that had it working perfectly when it was connected to sys-firewall. OpenVPN initially does connect to the VPN server: the TLS handshake completes like normal, my account is authorized, and the connection is established. And that is the end of the fun. After that the connection just times out and can no longer be established.
The OpenVPN client goes into an endless loop trying to re-establish the connection. Here’s the relevant section of the log:
SIGUSR1[soft,connection-reset] received, process restarting
Restart pause, 5 second(s)
TCP/UDP: Preserving recently used remote address: [AF_INET]x.x.x.x:443
Socket Buffers: R=[131072->425984] S=[16384->425984]
Attempting to establish TCP connection with [AF_INET]x.x.x.x:443 [nonblock]
TCP: connect to [AF_INET]x.x.x.x:443 failed: No route to host
SIGUSR1[connection failed(soft),init_instance] received, process restarting
Restart pause, 5 second(s)
RESOLVE: Cannot resolve host address: xxxxx.mullvad.net:443 (Name or service not known)
RESOLVE: Cannot resolve host address: xxxxx.mullvad.net:443 (Temporary failure in name resolution)
Could not determine IPv4/IPv6 protocol
SIGUSR1[soft,init_instance] received, process restarting
(I have intentionally removed some identifying information from the log above: IPs and server names are replaced with x’s and timestamps are removed.)
It reports “No route to host” and is unable to resolve domains …
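For anyone comparing runs, here is a minimal Python sketch that tallies the failure types seen in an OpenVPN log like the excerpt above. The patterns are copied from the quoted messages; the category names (`no_route`, `dns_failure`, `restart`) are labels invented for this sketch, not OpenVPN terminology.

```python
import re
from collections import Counter

# Patterns taken from the messages quoted in the log excerpt.
# The dictionary keys are labels chosen for this sketch only.
PATTERNS = {
    "no_route": re.compile(r"failed: No route to host"),
    "dns_failure": re.compile(r"RESOLVE: Cannot resolve host address"),
    "restart": re.compile(r"SIGUSR1\[.*?\] received, process restarting"),
}


def classify(log_text: str) -> Counter:
    """Count how often each failure pattern occurs in the given log text."""
    counts = Counter()
    for name, pattern in PATTERNS.items():
        counts[name] = len(pattern.findall(log_text))
    return counts
```

Feeding it the saved log text from a 4.0 run and a 4.1 run would show whether the "No route to host" and DNS failures appear only on 4.1, and how many restarts each run goes through.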
Immediately upon disconnecting OpenVPN, all of those things work perfectly fine again in the same qube via the Whonix net. I’m able to wget pages, resolve domains, etc. with no problem, and I can verify that the connection to Whonix/Tor is still working.
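The before/after check described here can be made repeatable with a small probe. The following is a minimal Python sketch, assuming only the standard `socket` module; the function name `probe` and its return strings are invented for illustration. It separates a name-resolution failure from a TCP connect failure, which are the two symptoms showing up in the log.

```python
import socket


def probe(host: str, port: int, timeout: float = 10.0) -> str:
    """Return 'ok', 'dns_failure', or 'connect_failure: <reason>' for host:port."""
    try:
        # Name resolution step; raises socket.gaierror if the name cannot be resolved.
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"
    try:
        # TCP connect step; errors such as "No route to host" surface here as OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return f"connect_failure: {exc}"
```

Running `probe()` against the same host and port once with OpenVPN stopped and once while it is in its restart loop would show which of the two steps breaks first.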
I can then reconnect to the OpenVPN server and get the same result: shortly after connecting, nothing works again. I can’t pass traffic through to another AppVM, can’t make requests inside the ProxyVM, etc.
I have now tested this with 3 major, quality VPN providers (Mullvad being one, as you can see from the logs), with the exact same result for each provider. I made sure to use TCP 443 connections on all of them.
Now, as I said, I’ve been running this setup successfully on Qubes 4.0 for a long time with no issues, and I am still able to run it on 4.0 without issue today. All my Whonix and other templates are up to date with the latest patches on both systems.
Then I went so far as to create full backups from the working 4.0 system of whonix-gw, sys-whonix, and all 3 working VPN ProxyVMs (one per provider), all of which work on 4.0 in any order I chain them and all of which work through Tor. I restored all of those backups on Qubes 4.1 and, again, I end up with the exact same issues described above. The VPN connects for a moment and then seemingly no longer has a connection.
I have been trying this every single day with different configurations over the past 10 days, with lots of reloading Tor, changing circuits, etc. It doesn’t matter: the failures are consistent and they only occur on 4.1.
I even tried swapping my main router for a completely different one to see if the router was somehow the issue. It made no difference.
I am exhausted and can’t think of anything else to try, and I am at a complete loss as to why this setup will not work for me, especially since it works perfectly fine even right now as I type this on my Qubes 4.0 workstation.
I’ve even tried 4.1 on multiple PCs to make sure it wasn’t hardware related, such as a network hardware problem. The results remain consistent for me.
If anyone has any kind of clue as to what might cause this, maybe something different in the way network routing works in 4.1, or some configuration difference in the default Whonix in 4.1, please let me know. I have no idea at this point.
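If routing is the suspect, one way to compare the two systems is to capture the output of `ip route show` in the ProxyVM before and after OpenVPN connects and diff the two captures. Below is a minimal Python sketch of that comparison. The function name `route_diff` is hypothetical, and how the captures get saved is left open.

```python
from typing import List, Tuple


def route_diff(before: str, after: str) -> Tuple[List[str], List[str]]:
    """Return (added, removed) route lines between two `ip route show` captures."""
    # Normalize each capture into a set of non-empty, stripped lines.
    before_set = {line.strip() for line in before.splitlines() if line.strip()}
    after_set = {line.strip() for line in after.splitlines() if line.strip()}
    added = sorted(after_set - before_set)
    removed = sorted(before_set - after_set)
    return added, removed
```

Doing the same capture on a working 4.0 system and on 4.1 would show whether OpenVPN ends up installing different routes on the two releases.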
Even if someone could simply test the same setup and see whether they have similar issues, that would help.
Thanks for reading this very long post and any help is greatly appreciated.