Firstly, great work on Fly! I’ve seen the docs and the community grow steadily over the past few months and it’s very exciting to me as a user!
I am starting work on an app that requires multiple VM instances to join a cluster in a particular region. The cluster will mostly be managed by Raft, with its state kept in memory. For this to work, I have to:
Ensure the appropriate ports are open for the VM instances to talk to one another
Have a way of identifying the cluster leader. I don’t imagine FLY_ALLOC_ID will be particularly useful here?
Is this feasible on Fly given the current architecture? Any thoughts appreciated.
You can open ports up. There is a FLY_PUBLIC_IP environment variable with the VM's public IPv6 address. You can open public ports with an experimental config setting:
[experimental]
allowed_public_ports = [8080]
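For what it's worth, here is a minimal Python sketch of how an instance might turn that variable into an address to advertise to its peers. FLY_PUBLIC_IP and port 8080 come from the post above; the function name `advertise_address` is made up for illustration, and the bracketed `[ipv6]:port` form is an assumption about what your clustering library expects.

```python
import ipaddress
import os


def advertise_address(port: int = 8080, env=os.environ) -> str:
    """Build the host:port string this VM would advertise to its peers.

    Assumes FLY_PUBLIC_IP holds the VM's public address, as described above.
    `env` is injectable so the function can be exercised without the real
    environment.
    """
    raw = env.get("FLY_PUBLIC_IP")
    if not raw:
        raise RuntimeError("FLY_PUBLIC_IP is not set")
    # ip_address() validates the value and normalizes its textual form.
    ip = ipaddress.ip_address(raw)
    # IPv6 literals need brackets when combined with a port.
    host = f"[{ip}]" if ip.version == 6 else str(ip)
    return f"{host}:{port}"
```

The port passed in has to be one of the ports listed in allowed_public_ports, otherwise peers will not be able to reach it.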
Determining cluster leader is harder. I can imagine two ways of doing this:
Most Raft apps I’ve seen can be configured with a leader seed. You could run an app with min/max counts of 1 that’s the leader at boot time. Then run a second app that uses the first app’s public load balanced IP to “see” the leader.
You can probably do something with DNS. We’ve experimented with apps that write their IP to a DNSimple record. If you have a way to make outside API calls when a leader is promoted, you could maintain a leader.example.com AAAA record with the current leader’s public IP address.
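A rough sketch of the DNS option, in Python. It only covers the bookkeeping: the actual call to your DNS provider is left as a `publish` callback, which is a hypothetical placeholder and not any real DNSimple client API. `LeaderRecord` and `on_leader_change` are also made-up names; you would call the latter from whatever leader-promotion hook your Raft library offers.

```python
import ipaddress
from typing import Callable, Optional


class LeaderRecord:
    """Keeps a DNS record pointed at the current leader's address.

    `publish(name, record_type, value)` is a hypothetical callback that
    performs the outside API call to the DNS provider.
    """

    def __init__(self, publish: Callable[[str, str, str], None],
                 name: str = "leader.example.com") -> None:
        self._publish = publish
        self._name = name
        self._current: Optional[str] = None

    def on_leader_change(self, leader_ip: str) -> bool:
        """Publish the new leader address; return True if a call was made."""
        ip = ipaddress.ip_address(leader_ip)
        record_type = "AAAA" if ip.version == 6 else "A"
        value = str(ip)  # normalized form, so equal addresses compare equal
        if value == self._current:
            return False  # same leader as before, skip the API call
        self._publish(self._name, record_type, value)
        self._current = value
        return True
```

Skipping the call when the leader has not changed keeps API traffic down if the promotion hook fires repeatedly. Joiners would still see a stale address until the record's TTL expires, so a short TTL is probably worth it.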
This kind of use case is something we want to make work well, and we have some ideas on how to make it easier, so I’m curious what you can figure out.
@kurt This is cool… it should be possible to run a NATS server cluster on Fly, then. That’s also a pretty useful messaging system setup. Need to check if they can support peers over IPv6.
If we want to do this without a leader seed, there does need to be a reflective API of some sort, though. If an app were to hit its own Fly domain name, there needs to be a guarantee that the request won’t come back to the same instance. If it did, that instance would believe it was alone in the world even if it’s not.
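To illustrate the "talking to yourself" problem, here is a small Python sketch of the check a joiner could run on whatever discovery returns. Where the `discovered` list comes from is deliberately left open, since no such API exists in this thread; `other_peers` is a made-up name. The point is just that addresses have to be compared in normalized form, because the same IPv6 address can be written several ways.

```python
import ipaddress
from typing import Iterable, List


def other_peers(discovered: Iterable[str], own_ip: str) -> List[str]:
    """Return discovered addresses minus this instance, de-duplicated.

    An empty result means discovery only handed us back ourselves, which
    is NOT proof that the instance is alone in the cluster.
    """
    me = ipaddress.ip_address(own_ip)
    seen = set()
    peers = []
    for raw in discovered:
        ip = ipaddress.ip_address(raw)  # normalizes differing spellings
        if ip == me or ip in seen:
            continue
        seen.add(ip)
        peers.append(str(ip))
    return peers
```

If the result is empty, the safer reaction is probably to retry discovery a few times before bootstrapping a new cluster.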
Might also work if there was a special Fly URL or IPv6 redirect address that would always point at the oldest living instance. This would provide an automatic bastion that all joiners could work with.
That IPv6 to the oldest instance is a really interesting idea. We’ve thought a lot about adding IPs with special routing behavior, that could be super useful.
We’re thinking through service discovery. It seems like most of these types of tools want to be seeded with as many previous members as possible, right? Or at least deliberately seeded? Otherwise, if you start like 10 instances all at once, you could end up with split brain?
I can’t speak for all tools, but the ones I’ve seen can be seeded with one member (more accurately, memberlist libraries are available that work this way). The leader election is a secondary and dynamic process that happens inside the member list, but the full member list is available with or via any active member.
That’s why I’m suggesting discovering oldest living member - this will return the correct self member in the starting edge case, and will consistently return a living member in all subsequent cases. Split brain can happen if two members start at the same time and are reported separately to the third and fourth, of course, but if there’s some amount of locking in the scaler I’d expect this to be rare.
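As a sketch of the selection rule I have in mind, in Python. The `Member` shape (address, start time, liveness flag) is an assumption about what a platform-side registry might track; none of it is an existing Fly API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Member:
    address: str
    started_at: float  # start time, e.g. Unix seconds
    alive: bool = True


def oldest_living(members: Iterable[Member]) -> Optional[Member]:
    """Pick the living member with the earliest start time.

    Ties on start time are broken by address so that every caller looking
    at the same list gets the same answer.
    """
    living = [m for m in members if m.alive]
    if not living:
        return None
    return min(living, key=lambda m: (m.started_at, m.address))
```

With a single instance this returns the instance itself, which is the correct answer in the starting edge case. The tie-break only makes the answer consistent for callers that see the same list; it does not remove the race where two simultaneous starters are reported separately.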
Best would be a metadata API that returned a list of the other containers’ IPv6 addresses. Ideally there would be an internal port they could communicate on, but folks would likely encrypt anyway, so external is fine for now (but you need to make that bandwidth cheaper).
Thanks for the details. I’m glad there is a way to do this on Fly! As for leader identification, I’m leaning towards DNS CNAME for now as it reduces any dependency on the Raft implementation itself. Will give this a shot and report back.
That sounds like a great idea! I can’t count the number of times the metadata API has saved my ass.