dig PRIMARY_REGION.FLY_APP_NAME.internal AAAA order

When deploying, the previous instance often hasn’t been removed yet by the time the new instance has booted, resulting in multiple instances with the same internal domain. As a result, digging the internal domain of a region may return an array of IP addresses - both the old and the new.

In this scenario, what is the expected order of the dig results? Which comes first, the old or the new? Can one rely on the result order being consistent?

One can mitigate this issue by catching a failed connection and re-querying / re-initializing, but this leads to a longer “outage” until the connection can be re-established.
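To illustrate, the re-query/retry I have in mind looks roughly like this in the start script (the resolver address fdaa::3, the port, and the nc probe are assumptions for the sketch, not the real connection logic):

```sh
# Resolve the primary's current address via the internal DNS.
# PRIMARY_REGION and FLY_APP_NAME come from the environment; the resolver
# address (fdaa::3) and the port (6379) are assumptions for this sketch.
resolve_primary() {
  dig +short aaaa "${PRIMARY_REGION}.${FLY_APP_NAME}.internal" @fdaa::3 | head -n 1
}

PRIMARY_ADDR="$(resolve_primary)"
until [ -n "$PRIMARY_ADDR" ] && nc -z -w 2 "$PRIMARY_ADDR" 6379; do
  echo "primary unreachable at ${PRIMARY_ADDR:-<unresolved>}, re-querying..."
  sleep 2
  PRIMARY_ADDR="$(resolve_primary)"
done
echo "primary is reachable at $PRIMARY_ADDR, starting replica"
```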

Is there a method of querying the IP of the most recently created instance?

Not sure about the dig ordering (I would assume you could not rely on the order alone), but I wonder if this issue could be avoided using a different deploy strategy :thinking: … such as immediate? The default strategy is canary, which I believe creates the new VM before removing the prior one, resulting in both briefly being present, as you’ve found. Worth a try while Fly responds about the dig.
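If you want to try it, it should just be a flag on the deploy (if I remember flyctl correctly):

```sh
# Replace running VMs right away instead of overlapping old and new.
# The --strategy flag also accepts canary, rolling, etc.
fly deploy --strategy immediate
```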

The older IPs come first. You can rely on the order for now, but we’ve introduced a tiny little query DSL that we can extend and support over time.

If you query nearest.of.<app>.internal it sorts them the way you’d think. I think we could implement an oldest.of.<app>.internal as well.
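From inside an instance, a lookup against that DSL would look roughly like this (assuming the internal DNS server answers at fdaa::3):

```sh
# Ask the internal DNS (assumed here to be at fdaa::3) for the app's
# AAAA records, sorted nearest-first by the query DSL above.
dig +short aaaa nearest.of.FLY_APP_NAME.internal @fdaa::3
```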

Thanks for the suggestion, @greg. I didn’t even think about adjusting the deploy strategy. An immediate deploy strategy would resolve the duplicate VM issue (although the “security/stability” provided by the canary/rolling strategy is nice). However, I don’t think it would solve the core issue I am trying to address, which is:

I really need the Primary to deploy first, before the replicas.

I am trying to establish replication amongst nodes. I designate a single instance, in a single region, as the Primary server, to which the replicas connect. Requests to a replica are channeled to the Primary before being broadcast to all replicas. But this model relies on the replicas being able to get the address of the new Primary.

In a perfect world, that Primary region instance would always deploy first, before the rest of the regions roll over.

Otherwise, if the Primary does not boot before the replicas, the replicas will fail to deploy because the Primary address will be invalid - either while booting, or worse, after initialization while the app is running. At least during boot the deploy will fail and, if the fly configuration allows, the nodes will retry deployment, correcting the situation in time. But that is not ideal.

I believe this issue is demonstrated by the redis-geo-cache example.

Is there a way to specify which Region deploys first?

I guess another solution would be to deploy two different apps - one for the primary and one for the replicas. BUT, if they are two different apps, are they in the same IP namespace? Can one app dig the IPv6 address of another app?

Doh, I just thought of a WAY simpler solution - add a sleep command to the sh file that configures and runs the app, if the app is a replica. It ain’t pretty, but delaying the replicas by a few seconds should allow the Primary to finish initializing before the replicas, no matter what order they boot in. Additionally, the Primary would have finished deploying by then, so the previous VM would be destroyed and only one IP address would remain for the Primary.
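Roughly this in the start script (FLY_REGION is set by the platform; PRIMARY_REGION is assumed to be set via the app’s [env] config; the delay length is arbitrary):

```sh
#!/bin/sh
# If this instance is not in the primary region, give the primary a head
# start before this replica resolves and connects to it.
if [ "$FLY_REGION" != "$PRIMARY_REGION" ]; then
  echo "replica in $FLY_REGION: waiting for the primary in $PRIMARY_REGION..."
  sleep 15
fi

exec "$@"  # hand off to the app's usual start command
```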

That’s probably the solution right there. I’ll give that a go. Thanks, everyone. Sorry for the long-winded train of thought. Leaving this comment to close the loop.
