I am running a lightly customized NATS super cluster. Plain NATS pub/sub works great, but JetStream persistence has a problem when instances go down and come back up. The current naming scheme is based on FLY_ALLOC_ID, so a restarted instance comes back with a different name, while JetStream keeps looking for the prior server names to resync.
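For context, the relevant knob is the server's `server_name`, which JetStream uses to identify peers. A minimal illustrative fragment (the env var name and store path are placeholders, not my actual config):

```
# nats-server.conf (illustrative fragment)
# server_name must stay stable across restarts, or JetStream will
# go looking for the old peer name when the node rejoins.
server_name: $NATS_SERVER_NAME
jetstream {
  store_dir: /data/jetstream
}
```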
I haven’t used Terraform, and it seems like overkill when what I really want, at this stage, is a registry to pull names from.
For now I think my simplest manual option is to run one node per region, group nearby regions into clusters, and set the cluster configuration by hand.
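A hand-set cluster configuration along those lines might look like this sketch (the app name, regions, and cluster name are placeholders; `<region>.<appname>.internal` is Fly's region-scoped internal DNS name):

```
# Illustrative fragment: one node in ord, clustered with nearby regions.
cluster {
  name: "nats-east"
  port: 6222
  routes = [
    nats-route://ord.myapp.internal:6222
    nats-route://ewr.myapp.internal:6222
    nats-route://yul.myapp.internal:6222
  ]
}
```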
Or perhaps something simple like Ben Johnson’s scale-to-zero Machines example: Ben Johnson: Scale to Zero with Fly.io Machines
A TXT nslookup on `vms.<appname>.internal` should get you a CSV of alloc-ids assigned to `<appname>` (docs). It may not always be current, but it should eventually converge to the actual state of the world.
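As a hedged sketch, assuming you're inside the Fly private network: the TXT record comes back as one comma-separated string, so you mostly just need to split it. The app name `myapp` and the sample payload below are placeholders.

```shell
# Split the CSV TXT payload into one alloc-id per line.
split_alloc_ids() {
  printf '%s\n' "$1" | tr ',' '\n'
}

# On a Fly VM you could fetch the record with dig:
#   txt="$(dig +short txt vms.myapp.internal | tr -d '"')"
# For illustration, parse a sample payload instead:
split_alloc_ids "683d2b9f,77fa21c0,0e5f3a1d"
```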
Btw, one can assign volumes to VMs to make the `alloc-ids` stick across restarts / deploys, but it is a rather expensive way to do so. Also: Machine VMs tend to be fixed on a single IPv6 (not sure about `alloc-id`, though). More here: Fly-Instance-Id header alternative for websockets - #2 by ignoramous
Thanks much! I’ll take a look there.
@ignoramous is there any documentation about the sticky `alloc-ids`? That would work for me, as these nodes already have volumes assigned. It would make things much easier, and might make it easier for their system as well.
Using volumes to make `alloc-ids` stick was called anchor scaling (search for it in the forums, as the docs are gone; I believe it was dropped in favour of Machines, which are assigned sticky `alloc-ids` of a sort, though I can't say whether that is incidental or an actual feature).
Thanks for that @ignoramous. I see Is it possible to make scaling more deterministic? but will ping support on current and future options.
@ignoramous Anchor scaling is still the way to go.
@kurt It turns out NATS JetStream requires a consistent `server_name`. So at boot I check for (or create) a `server_name` file on the volume and use its contents for that property.
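A minimal sketch of that check-or-set step, assuming the volume is mounted at a path like `/data` (hypothetical) and a fresh name is derived from `FLY_REGION` plus a random suffix:

```shell
get_server_name() {
  file="$1/server_name"
  if [ ! -f "$file" ]; then
    # First boot on this volume: generate and persist a stable name.
    suffix="$(head -c4 /dev/urandom | od -An -tx1 | tr -d ' \n')"
    echo "nats-${FLY_REGION:-dev}-${suffix}" > "$file"
  fi
  cat "$file"
}

# Then start the server with it, e.g.:
#   nats-server --name "$(get_server_name /data)" -c nats-server.conf
```

Because the file lives on the volume, the same name comes back after every restart or redeploy, which is all JetStream needs to resync.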
What I don’t see yet is a true rolling deploy with the persisted volumes. With 9 nodes across 3 regions, the deploy began by stopping 5 of them, including all 3 in one region. I’ll start a new issue for that one.