Scaling not prioritizing primary region

I’m running a multi-region Postgres app with multi-region GraphQL servers. Currently we run one VM per region for each. However, sometimes when a new build goes out, no new server lands in the primary region, which breaks our app because no instance can write to the DB. Is there a way to guarantee at least one server in our primary region on each new deploy? I don’t want to use max-per-region=1, because we will need to scale to multiple servers during high-traffic periods, unless max-per-region=1 guarantees at least one server per region before scaling.

Side note: we just moved our entire production servers and DB from AWS to Fly, and we’re all super happy about it 😃

Not sure, but it may be worth checking your app’s regions (fly regions list) against the minimum number of VMs set in your scale settings, to make sure they split evenly (not, say, 2 VMs across 3 regions).

The docs say the VMs should be balanced across the regions, so you’d need to set the count to the right multiple of the number of regions. That should ensure there is a VM in each region, including the primary one:

“If there are three regions in the pool and the count is set to six, there will be two app instances in each region.”

https://fly.io/docs/reference/scaling/#count-scaling
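A rough sketch of the arithmetic behind that quote, assuming a simple round-robin placement. This is not Fly’s (or Nomad’s) actual scheduler, just an illustration; `spread_evenly` and `splits_evenly` are hypothetical helpers, and the region codes are examples, not taken from this thread.

```python
from __future__ import annotations


def spread_evenly(count: int, regions: list[str]) -> dict[str, int]:
    """Round-robin `count` VMs over `regions`.

    A simplified model for illustration only; it is not Fly's (or Nomad's)
    actual placement logic.
    """
    placement = {region: 0 for region in regions}
    for i in range(count):
        placement[regions[i % len(regions)]] += 1
    return placement


def splits_evenly(count: int, regions: list[str]) -> bool:
    """True when every region in the pool would get the same number of VMs."""
    return count % len(regions) == 0


if __name__ == "__main__":
    pool = ["iad", "ord", "lhr"]  # example region codes, not from the thread
    print(spread_evenly(6, pool))  # {'iad': 2, 'ord': 2, 'lhr': 2}
    print(spread_evenly(2, pool))  # {'iad': 1, 'ord': 1, 'lhr': 0}
    print(splits_evenly(2, pool))  # False
```

With a count that isn’t a multiple of the region count, some region ends up with zero VMs, and nothing in this model says it won’t be the primary.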

If you are using autoscaling, this would be the relevant part of the docs. Again, you’d need the right initial minimum, e.g. a min of 3 if you had 3 regions, to ensure 1 in each. For example:

“Standard: Instances of the application, up to the minimum count, are evenly distributed among the regions in the pool. They are not relocated in response to traffic.”

https://fly.io/docs/reference/scaling/#autoscaling

Yup, I followed both of those examples. My region pool has only 4 regions and I have a minimum of 4 VMs. However, in my last deploy it added 2 VMs in each of 2 regions and didn’t add a VM to either of the other regions, causing my app to crash because my primary region didn’t get a VM allocation.
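One way to catch this before it takes the app down is a post-deploy check that the new release has at least one VM in the primary region. A minimal sketch, assuming you already have the new VMs as (VM ID, region) pairs, however you collect them; that parsing step isn’t shown, and the helper names, VM IDs, and region codes below are all hypothetical.

```python
from __future__ import annotations

from collections import Counter


def region_counts(allocations: list[tuple[str, str]]) -> Counter[str]:
    """Count VMs per region from (vm_id, region) pairs."""
    return Counter(region for _vm_id, region in allocations)


def missing_regions(allocations: list[tuple[str, str]], pool: list[str]) -> list[str]:
    """Regions in the pool that received no VM at all."""
    counts = region_counts(allocations)
    return [region for region in pool if counts[region] == 0]


def primary_is_covered(allocations: list[tuple[str, str]], primary: str) -> bool:
    """True if at least one VM landed in the primary (writer) region."""
    return region_counts(allocations)[primary] > 0


if __name__ == "__main__":
    # Hypothetical VM IDs and region codes shaped like the deploy described above:
    # four VMs, two in each of two regions, none in the primary.
    pool = ["iad", "ord", "lhr", "syd"]
    primary = "iad"
    allocations = [
        ("vm-1", "ord"), ("vm-2", "ord"),
        ("vm-3", "lhr"), ("vm-4", "lhr"),
    ]
    print(missing_regions(allocations, pool))        # ['iad', 'syd']
    print(primary_is_covered(allocations, primary))  # False
```

In CI you could turn a `False` from `primary_is_covered` into a non-zero exit. That only detects the bad placement; it doesn’t prevent it.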

Here are my region pool and autoscale plan:
[Screenshots: region pool and autoscale plan, 2022-04-14]

And here are my current running servers:

Strange … Yep, that certainly looks wrong given the “evenly distributed” part of the docs. When I have three regions in the pool and run fly scale count 3, I get 1 VM in each region, as expected. Not sure what I’ve done differently to achieve that.

You could try fly scale show out of interest. I’m not sure how that count interacts with the autoscale count behind the scenes, but does it also show a min of 4? Even if it does, if the VMs get distributed 2x2 rather than 1x4, that still wouldn’t help.

One for Fly, alas.

@kurt any ideas?

Unfortunately, Nomad makes it difficult to do things the way you’d want (which is also the way we want). You can’t autoscale AND spread things evenly around regions. You can manually run fly scale count X --max-per-region=1 to make sure VMs are evenly distributed, but it won’t work with autoscaling.
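A toy model of what a per-region cap does to capacity. With a cap of 1, the total count can never exceed the number of regions, so there is no room for a second VM in a busy region. `place_with_cap` is a hypothetical helper that mimics a `--max-per-region` style limit; it is not Fly’s scheduler, and the region codes are examples.

```python
from __future__ import annotations


def place_with_cap(count: int, regions: list[str], max_per_region: int) -> dict[str, int]:
    """Round-robin placement that refuses to exceed a per-region cap.

    Toy model of a `--max-per-region` style limit; not Fly's scheduler.
    """
    capacity = len(regions) * max_per_region
    if count > capacity:
        raise ValueError(f"cannot place {count} VMs: the cap allows at most {capacity}")
    placement = {region: 0 for region in regions}
    for i in range(count):
        # Round-robin keeps every region at or below ceil(count / len(regions)),
        # which is <= max_per_region whenever count <= capacity.
        placement[regions[i % len(regions)]] += 1
    return placement


if __name__ == "__main__":
    pool = ["iad", "ord", "lhr", "syd"]  # example region codes
    print(place_with_cap(4, pool, max_per_region=1))  # {'iad': 1, 'ord': 1, 'lhr': 1, 'syd': 1}
    try:
        place_with_cap(5, pool, max_per_region=1)  # a 5th VM for a busy region
    except ValueError as err:
        print(err)  # cannot place 5 VMs: the cap allows at most 4
```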

The problem with this is that we can no longer deploy, since the regions are already at their max under a ‘bluegreen’ deploy strategy. A per-region minimum would also be really helpful. We currently have a lot of traffic in just one region, but we have no way to scale services in just that region while guaranteeing a server in our primary region. I personally feel this is a pretty serious limitation for multi-region DBs and servers on Fly in general, because with autoscaling there is no guarantee you’ll have a server that can talk to a writer.
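A rough model of why a blue-green deploy collides with max-per-region=1. It assumes, as the post implies, that a blue-green rollout keeps the current VMs running until the new ones are up, so each region briefly needs room for both sets. `bluegreen_blocked_regions` is a hypothetical helper and the region codes are examples.

```python
from __future__ import annotations


def bluegreen_blocked_regions(
    current: dict[str, int], new: dict[str, int], max_per_region: int
) -> list[str]:
    """Regions where old and new VMs, running side by side, would exceed the cap.

    Assumes a blue-green rollout keeps the current VMs up until the new ones
    are healthy, so each region briefly needs room for both sets.
    """
    regions = set(current) | set(new)
    return sorted(
        region
        for region in regions
        if current.get(region, 0) + new.get(region, 0) > max_per_region
    )


if __name__ == "__main__":
    one_each = {"iad": 1, "ord": 1, "lhr": 1, "syd": 1}  # example region codes
    print(bluegreen_blocked_regions(one_each, one_each, max_per_region=1))
    # ['iad', 'lhr', 'ord', 'syd']  (every region is full, so the deploy can't start)
    print(bluegreen_blocked_regions(one_each, one_each, max_per_region=2))  # []
```

Under this model, a cap of 1 blocks every region for a blue-green deploy, while a cap of 2 leaves room for the overlap but no longer forces one VM per region.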