Github Issue #9011 - DNS leakage when only one DNS server is set in a NetVM

Filed on 4th May:

One of my machines has a NetVM with a physical network interface, manually set to have only one DNS server (local). In this scenario, the Qubes dnat-dns chain contains a table of two nftables rules (one for udp and one for tcp, one IP). … Sometimes (when the local DNS server fails to respond in time) the qubes issue a DNS request to 10.139.1.2, which is in fact routed out of the VM because nothing intercepts it.

OK, let’s emulate the scenario. The user has a router at home with IP address 192.168.1.1 (default gateway and DHCP server) and a DHCP pool of 192.168.1.100-200. A single Pi-hole at 192.168.1.2 is set up as the only DNS server in the DHCP config. The router sits behind a draconian ISP which logs DNS queries to all destinations. We can emulate this by manually setting only one DNS server in sys-net.

Diagnosis

Looking at qubes-setup-dnat-to-ns in qubes-core-agent-linux: in the install_firewall_rules function, the culprit is zip(qubesdb_dns, get_dns_resolved()) in the for loop. It only bites when the get_dns_resolved function cannot get the DNS list from DBus and reverts to get_dns_resolv_conf (the DBus path duplicates the DNS servers, which avoids the issue).

Then get_dns_resolved returns the smaller tuple (only one entry, i.e. 192.168.1.2) while qubesdb_dns is the bigger one (i.e. 10.139.1.1 and 10.139.1.2). As a result, no rules for 10.139.1.2 are added to the dnat-dns chain.
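A minimal standalone illustration of the truncation, using the example addresses from the scenario above (not the actual script):

    # zip() stops at the shorter iterable, so the second Qubes virtual
    # DNS address never gets a dnat rule.
    qubesdb_dns = ["10.139.1.1", "10.139.1.2"]   # from QubesDB
    dns_resolved = ["192.168.1.2"]               # only one upstream resolver

    for vm_nameserver, dest in zip(qubesdb_dns, dns_resolved):
        print(f"ip daddr {vm_nameserver} udp dport 53 dnat to {dest}")
    # Output: a rule for 10.139.1.1 only; 10.139.1.2 is silently skipped.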

Subsequently, DNS queries to 10.139.1.2 are routed via the router to the draconian ISP. The user is right in this case.

Solution 1

The user could run qvm-prefs sensitiveappvm dns 10.139.1.1 and remove one of the Qubes DNS addresses from sensitive AppVMs. But this is far from perfect. Maybe the machine is a laptop that is sometimes connected to a perfectly safe ISP where the user needs two DNS servers. So this solution is cumbersome.

Solution 2. Proper fix

Duplicate the servers in get_dns_resolv_conf to be on the safe side.
Instead of

    return nameservers

Doing this:

    return nameservers + nameservers
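For context, a simplified, hypothetical sketch of what that fallback with the proposed change could look like (the real get_dns_resolv_conf in qubes-core-agent-linux differs in detail):

    # Hypothetical, simplified sketch -- not the actual function body.
    def get_dns_resolv_conf():
        nameservers = []
        with open("/etc/resolv.conf") as resolv:
            for line in resolv:
                fields = line.split()
                if len(fields) >= 2 and fields[0] == "nameserver":
                    nameservers.append(fields[1])
        # Proposed change: repeat the list so zip() always has enough
        # entries to cover both Qubes virtual DNS addresses.
        return nameservers + nameservers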

Value to the user: they will not have to use qvm-prefs to remove one DNS address from their VMs.
Caveat: we have to explain this to a core team member and take up their time. Before doing that, I would appreciate it if others who have read this post up to this point could say whether solution 2 makes sense and is worth it or not.


Would you not get the same result by adding the DNS server as NS1 and NS2 in the NetVM?

Yes. Duplicating the same DNS at the router would do it as well, so that is also a solution. But both are manual.


Yes, but it leaves the option open to combine the local DNS with the Qubes OS DNS as a backup.

Could this simply be fixed by adding documentation on how to configure the DNS settings if you want to use a custom local DNS?

I guess it depends on if this is a security issue or not.

Aren’t the queries to Qubes OS DNS routed to local DNS via the dnat-dns chain rules?

Yes

Yes. It is a security and privacy issue. If queries to the Pi-hole resolver fail, they will be forwarded to the draconian ISP, which will see them (unless the user uses sys-whonix, or DoH, DoT, DNSCrypt or similar inside the NetVM).


It is also necessary to understand on which templates DBus is used and on which ones the code reverts to get_dns_resolv_conf. I have to check the commit history or investigate it.


I’d say that qubesdb_dns is making the assumption that there’s a specific number of DNS servers*, so it should be responsible for checking how many are provided by whatever it’s calling out to. Where fewer servers (1) are provided, repeating them to fill its fixed tuple is preferable to silently using a default. So I suggest a variant of your solution 2, where it’s the qubesdb_dns code that changes.

Examining nftables rules or tracing packets to find out what DNS servers are actually being used is low level, so a secondary fix here could be making this information more accessible to users. “Validating what DNS servers I’m using” is quite an important use case for some of the non-technical Qubes user profiles.

* this could also indicate a lack of care or review, so it may be worth looking through the code+PR history.

OK. Another user has confirmed the issue on GitHub as well, and suggested a solution that I find a bit too complex and involved for the moment. We could have a quick patch in the meantime.

Another way to do this would be a patch along these lines, to ensure it keeps working even if the DBus path gets fixed in the future:

@@ -98,7 +98,10 @@ def install_firewall_rules(dns):
         'chain dnat-dns {',
         'type nat hook prerouting priority dstnat; policy accept;',
     ]
-    for vm_nameserver, dest in zip(qubesdb_dns, get_dns_resolved()):
+    dns_resolved = get_dns_resolved()
+    while len(qubesdb_dns) > len(dns_resolved):
+        dns_resolved = dns_resolved + dns_resolved
+    for vm_nameserver, dest in zip(qubesdb_dns, dns_resolved):
         dns_ = str(dest)
         res += [
             f"ip daddr {vm_nameserver} udp dport 53 dnat to {dns_}",

This way, the user will be able to have 2, 3, 4, … x DNS servers for AppVMs, and they will always be properly routed to the available DNS servers within sys-net, in sequence.

The whole qubes-setup-dnat-to-ns code was written from scratch during the transition from iptables to nftables. It was a part of a relatively big PR.


There’s also a strict argument to zip that might be appropriate (would cause a hard failure - need to check if that’s appropriate here).

Instead of dns_resolved + dns_resolved, you might be looking for itertools.cycle:

    for vm_nameserver, dest in zip(qubesdb_dns, cycle(dns_resolved))  # untested
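Expanding on that, a small untested sketch covering both ideas, with made-up example values (cycle to repeat the shorter list; strict=True, available from Python 3.10, to fail loudly instead):

    from itertools import cycle

    qubesdb_dns = ["10.139.1.1", "10.139.1.2"]   # example values
    dns_resolved = ["192.168.1.2"]               # example values

    # Option A: cycle() repeats the shorter list, so every Qubes DNS address
    # gets a rule. Note that cycle([]) yields nothing at all, so the
    # no-upstream-DNS case still needs separate handling.
    for vm_nameserver, dest in zip(qubesdb_dns, cycle(dns_resolved)):
        print(vm_nameserver, "->", dest)

    # Option B (Python 3.10+): fail hard when the lengths differ instead of
    # silently truncating.
    # zip(qubesdb_dns, dns_resolved, strict=True)  # raises ValueError here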

Looks like another good reason for code auditing - this is the sort of defect we could expect a non-specialist to catch.

I agree this shouldn’t wait for the bigger solution, and also with the suggestion to move other parts of the code away from the magic number 2.

I can’t reproduce this. I have a single DNS host in /etc/resolv.conf in sys-net and I see:

chain dnat-dns {
	type nat hook prerouting priority dstnat; policy accept;
	ip daddr 10.139.1.1 udp dport 53 dnat to 192.168.0.1
	ip daddr 10.139.1.1 tcp dport 53 dnat to 192.168.0.1
	ip daddr 10.139.1.2 udp dport 53 dnat to 192.168.0.1
	ip daddr 10.139.1.2 tcp dport 53 dnat to 192.168.0.1
}

See also:

https://groups.google.com/d/msgid/qubes-devel/20240613192637.2bea0675%40localhost

What do you use for the sys-net template? The get_dns_resolved function tries to get the DNS servers from DBus first, which duplicates the servers and avoids the issue (I do not know why DBus does it). It only reverts to the get_dns_resolv_conf function if the DBus approach fails. I mentioned this in the previous posts.
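Roughly, the fallback order has this shape (a simplified sketch, not the literal script code; the DBus helper name is made up):

    # Simplified sketch of the fallback order, not the literal function body.
    def get_dns_resolved():
        try:
            # Hypothetical helper: ask systemd-resolved over DBus. Where this
            # works, the returned list contains each server more than once,
            # so zip() has enough entries.
            return query_resolved_over_dbus()
        except Exception:
            # DBus unavailable or failed: fall back to parsing
            # /etc/resolv.conf, which lists each server only once.
            return get_dns_resolv_conf()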

Does anyone know what _g is in the script? Line 74.

The DBus systemd-resolved response has two sets of replies for each IP. _g is 0 for one and 3 for the other. Is it some kind of TCP/UDP indicator?

p.s.:

I found it in the freedesktop specifications. _g is the ifindex (interface index). Zero means the computed resolvers (what you finally get in your /etc/resolv.conf), and each interface can have its own DNS (e.g. if set up statically or via DHCP). That is why we get duplicates via DBus.
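For anyone who wants to check this themselves, a rough sketch of reading that property with dbus-python (assumed to be installed in the template; this is an illustration, not the script’s own code):

    import socket
    import dbus  # assumes the dbus-python package is available

    bus = dbus.SystemBus()
    resolve1 = bus.get_object('org.freedesktop.resolve1',
                              '/org/freedesktop/resolve1')
    props = dbus.Interface(resolve1, 'org.freedesktop.DBus.Properties')

    # The DNS property is an array of (ifindex, address family, raw address).
    for ifindex, family, addr in props.Get('org.freedesktop.resolve1.Manager',
                                           'DNS'):
        fam = socket.AF_INET if int(family) == socket.AF_INET else socket.AF_INET6
        ip = socket.inet_ntop(fam, bytes(addr))
        # ifindex 0 = the computed/global resolvers; ifindex > 0 = the same
        # servers listed again per interface, hence the duplicates over DBus.
        print(ifindex, ip)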

The only remaining struggle before popping up on GitHub is to find out under what circumstances DBus fails and the program reverts to the get_dns_resolv_conf function.

p.s. 2:

Maybe it is SELinux that prohibits the DBus connection for the two users on GitHub. I am using a minimal template which, I believe, does not come with SELinux installed.

BTW, what is this resolve1_proxy doing? It is never used. I commented it out and it still works.

Found it. The DBus solution works well on a Fedora (40) based template but fails on a Debian based template (tested Debian 13). I guess you could reproduce it if you switch your sys-net to a Debian-based template.


A special case for the above is necessary. If the user has connectivity but for some reason no ordinary DNS is set (maybe the router offers only an IPv6 DNS, or the user wants to enforce DoH, DoT, …), the above solution will go into an endless loop. In such a case, a special nftables rule is necessary to drop all DNS requests to the addresses in ‘/qubes-netvm-primary-dns’ and ‘/qubes-netvm-secondary-dns’.
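Concretely, the earlier patch sketch would need a guard of roughly this shape (untested, illustrative only; example values stand in for what the script reads from QubesDB and get_dns_resolved):

    # Illustrative guard against the empty-list case; example values only.
    qubesdb_dns = ["10.139.1.1", "10.139.1.2"]
    dns_resolved = []    # e.g. the router offered only an IPv6 DNS

    rules = []
    if not dns_resolved:
        # No usable upstream IPv4 DNS: drop (or dnat to a local resolver)
        # instead of letting these queries leave the NetVM untouched.
        for vm_nameserver in qubesdb_dns:
            rules += [
                f"ip daddr {vm_nameserver} udp dport 53 drop",
                f"ip daddr {vm_nameserver} tcp dport 53 drop",
            ]
    else:
        while len(qubesdb_dns) > len(dns_resolved):
            dns_resolved = dns_resolved + dns_resolved
        for vm_nameserver, dest in zip(qubesdb_dns, dns_resolved):
            rules += [
                f"ip daddr {vm_nameserver} udp dport 53 dnat to {dest}",
                f"ip daddr {vm_nameserver} tcp dport 53 dnat to {dest}",
            ]
    print("\n".join(rules))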

What do you use for sys-net template?

Fedora.

I think solution #2 would result in duplicate rules, i.e. extra instructions in the netfilter kernel VM.

Perhaps we can simply add a final drop, so there are no leaks:

chain dnat-dns {
	type nat hook prerouting priority dstnat
	policy accept

	dnat to ip daddr . meta l4proto . th dport map {
		# ... see my suggestion in qubes-devel
	}

	meta l4proto . th dport vmap {
		tcp . 53 : drop,
		udp . 53 : drop
	}
}

BTW

perfectly safe ISP

contradicts the philosophy of distrusting the infrastructure. It also seems to me security theater to rely on sys-net to take care of securing any unencrypted traffic (like DNS) too. In that sense, I am questioning the validity of the issue as a whole.

Unfortunately, my Python skills are extremely limited, so my attempt:

I am the opposite. My nftables skills are extremely limited.

That would be sys-firewall territory. A user might legitimately want to use the host or dig commands in an AppVM to troubleshoot or query some specific DNS server. We do not want to block everything. Even blocking queries to 10.0.0.0/8, 192.168.0.0/16 or 172.16.0.0/12 is too much. The only possibility is to block DNS requests to 10.139.1.0/24 that are not matched by the previous rules.
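To illustrate (untested; the rule strings would be appended to the ruleset that install_firewall_rules generates, after the specific dnat rules):

    # Illustrative catch-all: anything aimed at the Qubes virtual DNS range
    # that was not rewritten by an earlier dnat rule gets dropped instead of
    # being routed upstream.
    res = []
    # ... the specific "dnat to" rules for known servers go here first ...
    res += [
        "ip daddr 10.139.1.0/24 udp dport 53 drop",
        "ip daddr 10.139.1.0/24 tcp dport 53 drop",
    ]
    print("\n".join(res))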

I meant a perfectly fine environment, such as a home/office setup where the user has an enterprise-grade DNS setup with proper rewriting of all unencrypted DNS requests to a trusted non-logging upstream via a secure protocol (e.g. DoH, DoT) at the gateway.

That would be sys-firewall territory. A user might legitimately want to use the host or dig commands in an AppVM to troubleshoot or query some specific DNS server. We do not want to block everything. Even blocking queries to 10.0.0.0/8, 192.168.0.0/16 or 172.16.0.0/12 is too much. The only possibility is to block DNS requests to 10.139.1.0/24 that are not matched by the previous rules.

How do you distinguish a leak from a legitimate leak? Without restrictions, there is no way to prevent any. Yes, sys-firewall is the proper territory. sys-net is unreliable, and one can use it to “leak” deliberately if one wants.

I meant a perfectly fine environment, such as a home/office setup where the user has an enterprise-grade DNS setup with proper rewriting of all unencrypted DNS requests to a trusted non-logging upstream via a secure protocol (e.g. DoH, DoT) at the gateway.

How do you know the LAN DNS does not leak? There are very few fully FOSS routers, so it is a distrusted territory too.

IMO, the proper overall solution is sys-dns running DNSCrypt. I wrote a guide about that some time ago. I will update it when I have time.

That would be awesome. However, for the time being, I have the simple short-term goal of fixing the existing bug without introducing any additional bugs.

We can work on DoH, DoT or DNSCrypt implementation separately. I have my own ideas about writing a full-featured DNS for Qubes that would optionally and securely resolve VMNAME.qubes-os.home to the visible_ip property of the VM within the NetVM. This would allow easier management and orchestration of VMs connected to the same NetVM, while also using an encrypted non-logging DNS for upstream queries, which could be DoH, DoT or DNSCrypt.

I communicated with DemiMarie on GitHub and she has confirmed that the resolve1_proxy variable could be safely deleted from the source without an issue. The only remaining decision is what to do with the DNS requests to qubes-netvm-primary-dns & qubes-netvm-secondary-dns if no upstream IPv4 DNS is defined/available.

The logical approach would be just dropping them. Or dnat them to 127.0.0.1 (in case we have DNSCrypt or something similar on it). Since you are more proficient with nftables @qubist, what would the final table look like in either case? I have also been looking at your communication on the mailing list and trying to understand the difference between the three suggested rule-sets. I need to study nftables in depth. I would appreciate it if you could recommend a proper, up-to-date resource.


fixing the existing bug without introducing any additional bugs.

Sure.

We can work on DoH, DoT or DNSCrypt implementation separately.

Yes. I prefer DNSCrypt because of its “anonymized” mode, which makes the non-logging you mention a non-issue. No need to trust.

The only remaining decision is what to do with the DNS requests to qubes-netvm-primary-dns & qubes-netvm-secondary-dns if no upstream IPv4 DNS is defined/available.

I think it might be undefined but available implicitly through DHCP. I have not researched this though.

The logical approach would be just dropping them. Or dnat them to 127.0.0.1 (in case we have DNSCrypt or something similar on it). Since you are more proficient with nftables @qubist, what would the final table look like in either case?

Disclaimer: I am not a networking expert.

For dropping, see my reply from 17 Jun 2024 18:49:40 -0000. For the other option (dnat to localhost), replace ‘drop’ with ‘redirect’ (see) in the vmap.

However, if the goal is to discard these packets, it seems more correct to drop them. If DNSCrypt proxy is running, that would be in a separate qube, so the redirect should happen there, not in sys-firewall (or sys-net). sys-firewall would then forward DNS requests to sys-dns through a custom rule.

I have also been looking at your communication on the mailing list and trying to understand the difference between the three suggested rule-sets. I need to study nftables in depth. I would appreciate it if you could recommend a proper, up-to-date resource.

https://wiki.nftables.org

Unfortunately, some things are not explained very well.
