Great feature, by the way
So it seems that when using the canary release strategy, the canary-VM (a short-lived machine that stops as soon as its health checks pass) is also discovered and included in the DNS response for the web process:
dig +short AAAA web.process.<region>.<app-name>.internal
[machine-xxxxx-ipv6]
[machine-xxxxx-ipv6]
[canary-vm-ipv6]
Considering there is a small delay in propagation (which is okay, and not the issue here), listing the canary-VM causes problems: by the time another app tries to use it, it is guaranteed not to exist anymore, since these machines are short-lived by definition.
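As a client-side mitigation while the record is stale, an app can try each returned address in turn and skip the ones that no longer accept connections. A minimal sketch (the function name and the idea of passing a pre-resolved address list are my own, not anything Fly.io provides):

```python
import socket

def first_reachable(addresses, port, timeout=1.0):
    """Try each resolved address in turn and return the first one that
    accepts a TCP connection, skipping stale entries such as a
    canary-VM that has already stopped."""
    for addr in addresses:
        try:
            with socket.create_connection((addr, port), timeout=timeout):
                return addr
        except OSError:
            continue  # stale or unreachable; try the next address
    raise ConnectionError(f"no reachable instance among {addresses!r}")
```

This only papers over the symptom, of course; the stale canary address still costs a connection timeout before the fallback kicks in.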
I was not expecting the canary-VM to be listed in the web process DNS response, as these machines are temporary and part of the Fly.io release process rather than actual instances of the app. By the time they show up in the DNS query they are already in the process of stopping, so it's essentially an invalid address.
Of course, the current behaviour would make sense if, instead of stopping the newly created canary-VM, the old machines running the previous version were stopped/swapped. However, this is not the case: the canary-VM is only used to ensure that the health checks pass before proceeding with the rolling release, and is then stopped.
So maybe they shouldn’t be included in the group-aware internal DNS responses.
Not related, but there are bug reports here:
- Process group-aware internal DNS: route between processes with ease! - #5 by containerops
- Querying Instance(s) from Specific Process Group in fly.toml and .internal DNS - #2 by containerops
web.process.<app>.internal only returns one machine:
dig +short AAAA <app>.internal
[ip1]
[ip2]
dig +short AAAA <group-name>.process.<app>.internal
[ip2]
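To make the discrepancy concrete, a tiny sketch that diffs the two answers above (the address lists stand in for the dig output; nothing here calls Fly.io):

```python
def missing_from_group(app_records, group_records):
    """Return addresses present in the app-wide DNS answer but absent
    from the group-aware answer; a non-empty result reproduces the bug."""
    return sorted(set(app_records) - set(group_records))

# With the answers from the dig queries above:
missing_from_group(["ip1", "ip2"], ["ip2"])  # → ["ip1"]
```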
Including the region, <group-name>.process.<region>.<app-name>.internal, is a workaround:
dig +short AAAA <app>.internal
[ip1]
[ip2]
dig +short AAAA <group-name>.process.<region>.<app-name>.internal
[ip1]
[ip2]
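If the app spans several regions, the workaround means querying the region-qualified name once per region and merging the answers. A sketch under the assumption that the deployed regions are known up front; resolve is a hypothetical stand-in for an AAAA lookup:

```python
def group_addresses(resolve, group, app, regions):
    """Work around the group-aware lookup bug by querying the
    region-qualified name for every known region and merging results."""
    addrs = set()
    for region in regions:
        name = f"{group}.process.{region}.{app}.internal"
        addrs.update(resolve(name))
    return sorted(addrs)
```

For example, with a stub resolver returning one machine per region, group_addresses(resolver, "web", "myapp", ["ams", "fra"]) yields the union of both answers.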