Analyze dropped packets

I am analysing why my WireGuard connection does not work. I guess it has something to do with MTU.

ping 1.1.1.1 works, but ping -s 1500 1.1.1.1 does not. I was able to narrow it down to a dropped packet in the sys-net VM.

My setup is sys-vpn → sys-firewall (vif38) → sys-net (ens6) → router (also has a VPN) → VPN server. The VPN server that sys-vpn connects to has the IP 89.46.223.58. The tcpdump output below is from sys-net.

The issue is that sys-net does not pass the packet with ID 57174, received from the VPN server, on to sys-firewall.

tcpdump from sys-net
11:02:34.299577 vif38.0 In  IP (tos 0x0, ttl 63, id 10078, offset 0, flags [none], proto UDP (17), length 1392)
    <sys-firewall ip>.33597 > 89.46.223.58.51820: UDP, length 1364
11:02:34.299622 ens6  Out IP (tos 0x0, ttl 62, id 10078, offset 0, flags [none], proto UDP (17), length 1392)
    <sys-net ip>.33597 > 89.46.223.58.51820: UDP, length 1364
11:02:34.299627 vif38.0 In  IP (tos 0x0, ttl 63, id 10079, offset 0, flags [none], proto UDP (17), length 284)
    <sys-firewall ip>.33597 > 89.46.223.58.51820: UDP, length 256
11:02:34.300004 ens6  Out IP (tos 0x0, ttl 62, id 10079, offset 0, flags [none], proto UDP (17), length 284)
    <sys-net ip>.33597 > 89.46.223.58.51820: UDP, length 256
11:02:34.402363 ens6  In  IP (tos 0x0, ttl 50, id 57174, offset 0, flags [+], proto UDP (17), length 1364)
    89.46.223.58.51820 > <sys-net ip>.33597: UDP, length 1408
11:02:34.402415 ens6  In  IP (tos 0x0, ttl 50, id 57174, offset 1344, flags [none], proto UDP (17), length 92)
    89.46.223.58 > <sys-net ip>: ip-proto-17
11:02:34.402478 ens6  In  IP (tos 0x0, ttl 50, id 57175, offset 0, flags [none], proto UDP (17), length 252)
    89.46.223.58.51820 > <sys-net ip>.33597: UDP, length 224
11:02:34.402501 vif38.0 Out IP (tos 0x0, ttl 49, id 57175, offset 0, flags [none], proto UDP (17), length 252)
    89.46.223.58.51820 > <sys-firewall ip>.33597: UDP, length 224

nft input chain rules
        chain input {
                type filter hook input priority filter; policy drop;
                jump custom-input
                ct state invalid counter packets 11 bytes 15796 drop
                iifgroup 2 udp dport 68 counter packets 0 bytes 0 drop
                ct state established,related accept
                iifgroup 2 meta l4proto icmp accept
                iif "lo" accept
                iifgroup 2 counter packets 0 bytes 0 reject with icmp host-prohibited
                counter packets 10 bytes 840
        }

Any idea what’s the problem here?

Since you are going through two VPN tunnels, you should try lowering the MTU in sys-vpn to accommodate the extra encapsulation happening on the router.
If the router is also using WireGuard, try setting your sys-vpn MTU 60 bytes below the value set on your router.
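
For reference, the 60 is WireGuard's per-packet overhead when the tunnel runs over IPv4 (it is 80 over IPv6). A rough Python sketch of the arithmetic follows; the 1500-byte uplink MTU and the function names are assumptions for illustration, not values from this thread.

```python
# WireGuard per-packet overhead in bytes.
IPV4_HEADER = 20
IPV6_HEADER = 40
UDP_HEADER = 8
WG_DATA_HEADER = 16  # message type (4) + receiver index (4) + counter (8)
WG_AUTH_TAG = 16     # Poly1305 authentication tag


def wg_overhead(outer_ipv6: bool = False) -> int:
    """Bytes added by one WireGuard encapsulation."""
    ip_header = IPV6_HEADER if outer_ipv6 else IPV4_HEADER
    return ip_header + UDP_HEADER + WG_DATA_HEADER + WG_AUTH_TAG


def nested_mtu(link_mtu: int, tunnels: int, outer_ipv6: bool = False) -> int:
    """Largest inner MTU that fits through `tunnels` nested WireGuard tunnels."""
    return link_mtu - tunnels * wg_overhead(outer_ipv6)


if __name__ == "__main__":
    link_mtu = 1500  # assumed uplink MTU
    print("router tunnel MTU: ", nested_mtu(link_mtu, 1))
    print("sys-vpn tunnel MTU:", nested_mtu(link_mtu, 2))
```

If the uplink MTU is smaller (PPPoE, for example), start from that value instead of 1500.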

Changing the MTU does not fix the issue. I went as low as setting the sys-vpn MTU to 1280 and the WireGuard interface in sys-vpn to 1220. The issue seems to be that the packet is dropped in sys-net by the nft rule ct state invalid drop.

On further analysis, it looks like when the UDP packet is fragmented (like ID 57174 in my original post), conntrack rejects it because of a bad checksum. I was able to enable conntrack logging and see something like sys-net kernel: nf_ct_proto_17: bad checksum IN=ens6....
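
As a sanity check on the capture, the two fragments of ID 57174 can be added up to see whether they form a complete datagram. This is only a sketch: the lengths and offsets are copied from the tcpdump output in the first post, and it assumes 20-byte IP headers without options.

```python
IP_HEADER = 20   # assumption: no IP options
UDP_HEADER = 8

# Fragments of IP ID 57174, as printed by tcpdump (offsets are in bytes).
fragments = [
    {"offset": 0, "total_length": 1364, "more_fragments": True},
    {"offset": 1344, "total_length": 92, "more_fragments": False},
]


def reassembled_payload_len(frags, ip_header=IP_HEADER):
    """Return the IP payload length after reassembly; raise on a gap or overlap."""
    expected_offset = 0
    for frag in sorted(frags, key=lambda f: f["offset"]):
        if frag["offset"] != expected_offset:
            raise ValueError(f"gap or overlap at offset {frag['offset']}")
        expected_offset += frag["total_length"] - ip_header
    return expected_offset


if __name__ == "__main__":
    ip_payload = reassembled_payload_len(fragments)
    print("IP payload after reassembly:", ip_payload)
    print("UDP payload:", ip_payload - UDP_HEADER)
    print("reassembled datagram size:", ip_payload + IP_HEADER)
```

The UDP payload comes out at 1408 bytes, matching the "UDP, length 1408" that tcpdump reports for the first fragment, so both fragments arrived and the datagram is complete. The reassembled datagram is 1436 bytes.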

Now I am trying to figure out why the UDP packet has a bad checksum.

I am convinced that I should disable UDP checksum validation, at least for fragmented packets. It looks like the checksum is calculated using the source and destination addresses and is recalculated at every hop. For example, sys-net recalculates the UDP checksum of a packet it received on ens6 before passing it on to sys-firewall.
In my case, the packet got fragmented before the router's WireGuard encryption. So the checksum was calculated using the source and destination addresses before fragmentation and was not updated until the packet was reassembled in sys-net.
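
To make the checksum reasoning concrete: as far as I understand it, the UDP checksum covers a pseudo-header (source address, destination address, protocol, UDP length) plus the UDP header and payload. It therefore has to change whenever NAT rewrites an address or port, while IP fragmentation itself leaves it untouched, and it can only be verified on the reassembled datagram. Below is a minimal Python sketch of the IPv4 calculation; the 192.0.2.x addresses and the payload are placeholders, not values from my capture.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudo-header: src, dst, zero byte, protocol 17, UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))


def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> int:
    """Checksum over pseudo-header + UDP header (checksum field 0) + payload."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    data = pseudo_header(src_ip, dst_ip, length) + header + payload
    checksum = ~ones_complement_sum(data) & 0xFFFF
    return checksum or 0xFFFF  # 0 on the wire means "no checksum"


def verify(src_ip, dst_ip, src_port, dst_port, payload: bytes, checksum: int) -> bool:
    """A receiver sums everything including the checksum and expects 0xFFFF."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, checksum)
    data = pseudo_header(src_ip, dst_ip, length) + header + payload
    return ones_complement_sum(data) == 0xFFFF


if __name__ == "__main__":
    payload = b"example payload"  # placeholder data
    before_nat = udp_checksum("192.0.2.1", "89.46.223.58", 33597, 51820, payload)
    after_nat = udp_checksum("192.0.2.2", "89.46.223.58", 33597, 51820, payload)
    print(hex(before_nat), hex(after_nat))
```

Changing only the source address changes the result, which is why a NAT hop has to update the checksum, and why a stale one fails validation.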

Now I am trying to figure out whether I have to disable checksum validation for all UDP packets, or whether I can do it for a more specific subset such as WireGuard packets, fragmented UDP packets, or both.

Edit: My assumption that the UDP packet was fragmented before the router's WireGuard VPN might be wrong. It is quite possible that the fragmentation happened in the router. In the latter case, the checksum should be correct. But I'm still inclined towards disabling UDP checksum verification.

Edit 2: Disabling checksum validation did not fix the issue. The packet is no longer dropped by the ct state invalid nft rule, but it is still not sent downstream to sys-firewall. I am still looking for answers on why the packet is not sent to sys-firewall.
