Friday, 3 October 2014

How to work around missing multihoming support in UDP server applications using conntrack

It is pretty common to design UDP server applications around a single socket bound to 0.0.0.0:<someport> that receives datagrams and then replies with sendmsg/sendto. Unfortunately, this design cannot work on servers that have multiple interfaces, as the outgoing datagram just follows the default route, whatever interface the request arrived on. But there is a way to exploit conntrack, policy-based routing and veth interfaces to have your packets routed correctly. This solution was suggested by jkroon on Freenode (but I've modified some details).

The idea is to have conntrack remember which interface the first packet came in from, and force an automatic source address translation on the reply packets. Do this:
  • enable IP forwarding and disable reverse path filtering by default:
     echo 1 > /proc/sys/net/ipv4/ip_forward
     echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter  
    
  • create a veth pair, assign private addresses to it, and disable rp_filter on both ends:
     ip link add serverintf type veth peer name distributeintf
     ip addr add 10.10.10.1/30 dev serverintf
     ip addr add 10.10.10.2/30 dev distributeintf
     ip link set serverintf up
     ip link set distributeintf up
     echo 0 > /proc/sys/net/ipv4/conf/serverintf/rp_filter  
     echo 0 > /proc/sys/net/ipv4/conf/distributeintf/rp_filter  
    
  • force your server application to bind to 10.10.10.1 (if that is not supported, there are LD_PRELOAD tricks, or even network namespaces), and make packets sent from that address stick to serverintf even if they have a global destination:
     ip rule add from 10.10.10.1 lookup 1010101
     ip route add default dev serverintf via 10.10.10.2 table 1010101  
    
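  • use DNAT to route connections to the server, skip connection tracking for packets that loop back in through distributeintf, and add a policy route for each uplink: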
     iptables -t nat -A PREROUTING -p udp -m multiport --dports $serverports -j DNAT --to-destination 10.10.10.1
     iptables -t raw -A PREROUTING -i distributeintf -j CT --notrack
     # repeat the next three commands for each interface eth$i with IP $ip and gateway $gateway
     echo 2 > /proc/sys/net/ipv4/conf/eth$i/rp_filter
     ip rule add from $ip lookup $((i + 1))
     ip route add default dev eth$i via $gateway table $((i + 1))  
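
The "for each interface" step above can be written as a loop. What follows is only a minimal dry-run sketch: it prints the three per-interface commands rather than running them, so the result can be reviewed first. The function names and the address/gateway values (taken from the RFC 5737 documentation ranges) are made up for illustration; substitute your own.

```shell
#!/bin/sh
# Dry-run sketch: print the per-uplink commands instead of executing them.
# The addresses below are placeholder documentation addresses, not real ones.

# emit_uplink_rules INDEX IP GATEWAY
# Prints the three commands listed above for interface eth$INDEX.
emit_uplink_rules() {
    i=$1 ip=$2 gateway=$3
    table=$((i + 1))   # same table numbering as in the commands above
    echo "echo 2 > /proc/sys/net/ipv4/conf/eth$i/rp_filter"
    echo "ip rule add from $ip lookup $table"
    echo "ip route add default dev eth$i via $gateway table $table"
}

# One "index ip gateway" triple per line (hypothetical values).
uplinks='0 192.0.2.10 192.0.2.1
1 198.51.100.10 198.51.100.1'

# Feed every triple to emit_uplink_rules.
emit_all_uplink_rules() {
    echo "$uplinks" | while read -r i ip gateway; do
        emit_uplink_rules "$i" "$ip" "$gateway"
    done
}

emit_all_uplink_rules
```

Once the printed commands look right, the output can be piped into a root shell. Keeping generation and execution separate makes it harder to install a wrong rule by accident.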
    
The core of the trick here is to call conntrack into action with the DNAT target, so it remembers the connection tuple for us, then have the server send packets from 10.10.10.1 and not 0.0.0.0, so the default gateway/interface is not picked. The round trip over the veth interfaces is necessary because the second route lookup in the output path cannot change the interface chosen in the first lookup, which happens right after the application emits the packet.

For clarification, this is the path that packets travel.
  • incoming: a packet comes from the client to interface eth$i with destination ip$i, DNAT translates ip$i to 10.10.10.1 and the packet is received. A NAT entry is created on the first packet, recording that packets going from 10.10.10.1 to the client need to have their source translated to ip$i
  • outgoing: the server sends the response packet from 10.10.10.1 on serverintf to the client, and conntrack changes the source address to ip$i after POSTROUTING. The packet is looped to distributeintf, and forwarded to the correct eth$i thanks to policy routing. From now on, it's a safe journey.

Remarks

When it comes to advanced routing, there are some default sysctl settings that can stand in the way. If something doesn't work, I suggest you debug what's going on by enabling martian logging (write 1 to /proc/sys/net/ipv4/conf/*/log_martians).
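
A shell redirection cannot target a glob that matches several files, so martian logging has to be switched on one file at a time. Here is a minimal sketch of two helpers for that kind of debugging. The function names and the CONF_DIR override are my own additions, there only so the helpers can be tried on a scratch directory. The second helper relies on the kernel using the larger of conf/all/rp_filter and the per-interface value.

```shell
#!/bin/sh
# Sketch of two debugging helpers; names and CONF_DIR are illustrative only.
# CONF_DIR can be pointed at a scratch directory for a dry run.
CONF_DIR=${CONF_DIR:-/proc/sys/net/ipv4/conf}

# Write 1 to every log_martians file under CONF_DIR (needs root on /proc).
enable_martian_logging() {
    for f in "$CONF_DIR"/*/log_martians; do
        [ -e "$f" ] || continue      # the glob matched nothing
        echo 1 > "$f"
    done
}

# effective_rp_filter IFACE
# Prints the rp_filter value actually applied to IFACE: the larger of
# conf/all/rp_filter and conf/IFACE/rp_filter.
effective_rp_filter() {
    all=$(cat "$CONF_DIR/all/rp_filter")
    own=$(cat "$CONF_DIR/$1/rp_filter")
    if [ "$all" -gt "$own" ]; then echo "$all"; else echo "$own"; fi
}

# Typical use, as root:
#   enable_martian_logging
#   effective_rp_filter serverintf
```

The martian messages show up in the kernel log (dmesg). If effective_rp_filter prints 1 for an interface on the packet's path, strict reverse path filtering is a likely suspect.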
If you don't have IP addresses assigned to your interfaces (for example, when they are tunnels to which untouched packets are forwarded), you may need to adjust the logic that bumps a packet from distributeintf to the right interface. YMMV.

If you understand what's going on here, you probably wonder why I use veth interfaces instead of lo: the reason is that a packet traveling through the PREROUTING chain on lo cannot be forwarded to other interfaces. It sticks to lo, and it is received locally no matter what.
