Per-port UDP service routing is now available

Most apps on Fly.io serve traffic using a connection-oriented protocol like HTTP or raw TCP. We do support UDP services, but they have never been as polished as HTTP or TCP services, nor do they have feature parity with them. For example, there’s no real load-balancing for UDP services: at any given time only one instance will be picked to serve all UDP traffic for a given app[1].

One peculiar limitation that I wasn’t even aware of for a long time after joining Fly.io is that UDP routing is based on IPs, not (IP, port) tuples. This means that if you have one app whose machines expose different sets of UDP ports – for example, using process groups – the ports will not be handled properly: all of them will in fact be routed to just one of your machines. Process groups are a powerful feature, and recently we’ve seen multiple users try to run different UDP services in different process groups, only to end up confused about why we seem not to route their packets correctly. It is not their fault! Nowhere in our documentation did we indicate that this is unsupported.
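To make the limitation concrete, here is a toy model of the two lookup schemes. It is only a sketch: the machine names, the address, the port numbers, and the table layout are all hypothetical and say nothing about how the real routing state is stored.

```python
# Toy model only: every name, address, and port here is hypothetical.
APP_IP = "203.0.113.10"  # address from a documentation-only range

# Two machines in different process groups, each exposing one UDP port.
MACHINES = {
    "dns-machine": {53},
    "game-machine": {7777},
}


def build_ip_table(machines):
    """IP-only routing: one entry per app IP, so one machine gets everything."""
    chosen = sorted(machines)[0]  # stand-in for "whichever instance was picked"
    return {APP_IP: chosen}


def build_ip_port_table(machines):
    """(IP, port) routing: one entry per exposed port."""
    return {
        (APP_IP, port): name
        for name, ports in machines.items()
        for port in ports
    }


IP_TABLE = build_ip_table(MACHINES)
IP_PORT_TABLE = build_ip_port_table(MACHINES)


def route_by_ip(dst_ip, dst_port):
    # The destination port is ignored entirely.
    return IP_TABLE[dst_ip]


def route_by_ip_port(dst_ip, dst_port):
    return IP_PORT_TABLE[(dst_ip, dst_port)]


# Port 7777 lands on the wrong machine when only the IP is consulted.
print(route_by_ip(APP_IP, 7777))       # dns-machine
print(route_by_ip_port(APP_IP, 7777))  # game-machine
```

With the IP-only table, the packet for port 7777 is delivered to a machine that never exposed that port, which is exactly the confusing behavior described above.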

The only reason this wasn’t supported the way it is for TCP is that UDP on Fly.io is served by a different program: for TCP, everything is handled by the Fly Proxy, which is a big Rust program; UDP, however, is served by a simple eBPF program running in the kernel network stack. To limit the complexity of that program, many useful features were left out, and not many users needed UDP services in the past anyway.

As we grew, that stopped being true. We’ve seen legitimate use cases where having multiple UDP ports on different machines would be really useful, and the lack of support has been a blocker for some. The eBPF program has also long outgrown its simple past self: we’ve added support for “multihop” forwarding to make it more resilient to global state desynchronization events, and the “simple” eBPF program is now also responsible for keeping 6PN addresses stable when your machine is migrated. There’s really no reason to leave out the ability to route different UDP ports to different machines anymore. We have been feeding the program the information it needs to make port-based routing decisions all this time; it just has not been using it.

So, we went ahead and implemented this feature. To ensure a smooth transition, we decided to enable per-port routing if and only if your app defines different ports for different machines. Otherwise, all behavior remains the same as before – that is, all UDP traffic will be routed to one of your machines regardless of whether it defines that port. In the process, we also had to upgrade flow-tracking in the eBPF program to use the full 4-tuple (source IP, source port, destination IP, destination port) instead of only (source IP, destination IP) as we did before. Tracking flows by 2-tuples alone was probably also the cause of some edge-case bugs, which should hopefully be fixed as well.
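The opt-in rule and the flow-tracking change can be sketched as follows. This is an illustration under assumptions, not the actual eBPF logic: `per_port_routing_enabled`, `Packet`, and the two key functions are hypothetical names, and "different ports for different machines" is interpreted here as "the machines do not all expose the same set of UDP ports".

```python
from typing import Dict, NamedTuple, Set


def per_port_routing_enabled(machine_ports: Dict[str, Set[int]]) -> bool:
    """Assumed rule: enable only when machines expose differing port sets."""
    distinct_port_sets = {frozenset(ports) for ports in machine_ports.values()}
    return len(distinct_port_sets) > 1


class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def flow_key_2tuple(pkt: Packet):
    """Old-style key: ports are ignored."""
    return (pkt.src_ip, pkt.dst_ip)


def flow_key_4tuple(pkt: Packet):
    """New-style key: ports distinguish flows between the same two hosts."""
    return (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port)


# Same client, same app IP, two different services (addresses are examples).
dns_query = Packet("198.51.100.7", 40000, "203.0.113.10", 53)
game_packet = Packet("198.51.100.7", 40001, "203.0.113.10", 7777)

# Identical port sets keep the old behavior; differing sets opt in.
print(per_port_routing_enabled({"a": {53}, "b": {53}}))    # False
print(per_port_routing_enabled({"a": {53}, "b": {7777}}))  # True

# The 2-tuple key collapses both flows into one; the 4-tuple keeps them apart.
print(flow_key_2tuple(dns_query) == flow_key_2tuple(game_packet))  # True
print(flow_key_4tuple(dns_query) == flow_key_4tuple(game_packet))  # False
```

The collision in the 2-tuple case shows why per-port routing needed the wider key: a flow entry created for one service would otherwise also capture traffic meant for the other.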

Other UDP service limitations still remain. For example, you still can’t define a UDP service whose internal and external ports do not match (or rather, you can, but we ignore the internal port specification and forward external ports as-is). We may improve UDP services further in the future to remove these limitations – stay tuned!


  1. Technically, this is not fully correct; each “edge” server picks its own instance from its own point of view. But usually this converges to one specific instance per region, at the very least.
