Thanks for the additional information.
So, in theory, what I’m looking for is a floating DNS name that can be moved to the appropriate host on failover. However, this doesn’t fully solve the problem, because LiteFS also has to decide that it is now the primary. (There are a few old-school ways of doing this; VeritasFS for Oracle comes to mind, where it actually killed all I/O on the primary.)
We have two possibilities right now for advertise_url, but neither will work with a static lease:
<alloc_id>.vm.<appname>.internal
<region>.<appname>.internal
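To make the problem concrete, here is a minimal Go sketch that builds both candidate advertise URLs from Fly's runtime environment variables (FLY_ALLOC_ID, FLY_APP_NAME, FLY_REGION). The http:// scheme and port 20202 (LiteFS's default HTTP port) are assumptions; adjust them to match your litefs.yml. The per-machine form changes whenever the primary moves to a different machine, and the regional form resolves to every instance in that region, so neither one names a single, fixed primary.

```go
package main

import (
	"fmt"
	"os"
)

// litefsPort is assumed to be LiteFS's default HTTP port; change it if your
// config listens elsewhere.
const litefsPort = 20202

// allocAdvertiseURL builds the per-machine form: <alloc_id>.vm.<appname>.internal.
// It is unique per machine, so it changes when the primary moves.
func allocAdvertiseURL(allocID, appName string) string {
	return fmt.Sprintf("http://%s.vm.%s.internal:%d", allocID, appName, litefsPort)
}

// regionAdvertiseURL builds the per-region form: <region>.<appname>.internal.
// It resolves to all instances in the region, not to one primary.
func regionAdvertiseURL(region, appName string) string {
	return fmt.Sprintf("http://%s.%s.internal:%d", region, appName, litefsPort)
}

func main() {
	app := os.Getenv("FLY_APP_NAME")
	fmt.Println(allocAdvertiseURL(os.Getenv("FLY_ALLOC_ID"), app))
	fmt.Println(regionAdvertiseURL(os.Getenv("FLY_REGION"), app))
}
```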
I know it’s rather silly, but even if LiteFS just queried a static URL on failover to find the primary’s name / advertise URL, that could work OK with some decent monitoring.
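As a rough illustration of that "query a static URL" idea, here is a hypothetical Go sketch; nothing in it is a LiteFS feature or API. It assumes some registry endpoint returns the current primary's advertise URL as one line of plain text, and each node compares that answer with its own advertise URL. The environment variable names PRIMARY_REGISTRY_URL and SELF_ADVERTISE_URL are made up for the example.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// parsePrimary extracts the primary's advertise URL from the registry body.
// Assumed format: a single line of plain text.
func parsePrimary(body []byte) string {
	return strings.TrimSpace(string(body))
}

// isPrimary reports whether this node should act as primary. A blank answer
// never promotes this node.
func isPrimary(primary, self string) bool {
	return primary != "" && primary == self
}

// fetchPrimary performs one GET against the hypothetical static URL.
func fetchPrimary(ctx context.Context, client *http.Client, url string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	// Cap the read; the body should only ever be one short URL.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1024))
	if err != nil {
		return "", err
	}
	return parsePrimary(body), nil
}

func main() {
	// Both variables are hypothetical, not LiteFS settings.
	staticURL := os.Getenv("PRIMARY_REGISTRY_URL")
	self := os.Getenv("SELF_ADVERTISE_URL")
	if staticURL == "" {
		fmt.Println("PRIMARY_REGISTRY_URL not set; nothing to check")
		return
	}
	client := &http.Client{Timeout: 2 * time.Second}
	primary, err := fetchPrimary(context.Background(), client, staticURL)
	if err != nil {
		// Fail safe: if the registry is unreachable, stay a replica.
		fmt.Println("registry unreachable, staying replica:", err)
		return
	}
	fmt.Println("primary:", primary, "is self:", isPrimary(primary, self))
}
```

Note that this only tells a node whether it should promote itself; it does nothing to stop a stale primary from accepting writes, which is the fencing problem mentioned above and why monitoring would still be needed.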
Anyway, it looks like for failover we’re going to need a small Consul cluster for v2, at least for now.
Did some more testing. When the LiteFS process is killed on the machine, it fails over immediately. When the machine is stopped with "fly machines stop", it hangs for a while. Perhaps a TCP socket is lingering?