Scaling not prioritizing primary region

Charlie_Blackstock · April 14, 2022, 3:28pm

I’m running a multi-region Postgres app with multi-region GraphQL servers. We’ve been running 1 VM per region of each. However, sometimes when new builds go out, we don’t get any new servers in the primary region, which breaks our app since nothing can write to the DB. Is there a way to guarantee at least 1 server is in our primary region on each new deploy? I don’t want to set max-per-region=1 because we will need to scale to multiple servers during high-traffic periods, unless max-per-region=1 guarantees at least 1 per region before scaling.

Side note: we just moved our entire production servers and db from AWS to FLY and we’re all super happy about it

greg · April 14, 2022, 3:58pm

Not sure but it may be worth checking your app’s regions (fly regions list) against the minimum number of vms you have set in your scale settings to get an even split (not like, 2 vms : 3 regions).

However, the docs say the VMs should be balanced across the regions, so you’d need to set a count that is a multiple of the number of regions. That should ensure there is a VM in each region, including the primary one:

“If there are three regions in the pool and the count is set to six, there will be two app instances in each region.”

https://fly.io/docs/reference/scaling/#count-scaling
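
To make the arithmetic in that quote concrete, here is a minimal Python sketch. The function name and region codes are hypothetical, and it only models the even split the docs describe; it is not how the scheduler actually places VMs. Which regions receive the leftover VMs when the count is not a multiple is an assumption made purely for illustration.

```python
def even_split(count, regions):
    """Spread `count` VMs as evenly as possible over `regions` (toy model)."""
    base, extra = divmod(count, len(regions))
    # Assumption: leftover VMs go to the first regions in the list.
    return {
        region: base + (1 if index < extra else 0)
        for index, region in enumerate(regions)
    }


# Hypothetical region codes, used only for the example.
pool = ["ord", "lhr", "syd"]
print(even_split(6, pool))  # two per region, matching the docs quote
print(even_split(2, pool))  # one region ends up with no VM
```

So a count that is a multiple of the pool size is what would give every region, including the primary, at least one VM under this model.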

If you are using auto-scaling then this would be the relevant part of the docs. So again, you’d need the right initial min multiple, e.g. a min of 3 if you had 3 regions, to ensure 1 in each. For example:

“Standard : Instances of the application, up to the minimum count, are evenly distributed among the regions in the pool. They are not relocated in response to traffic.”

https://fly.io/docs/reference/scaling/#autoscaling

Charlie_Blackstock · April 14, 2022, 5:07pm

Yup, I followed both of those examples. My region pool only has 4 regions, and I have a minimum of 4 VMs. However, in my last deploy it added 2 VMs in 2 regions and didn’t add a VM in either of the other regions, causing my app to crash because my primary region didn’t have a VM allocated.
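
A post-deploy sanity check along these lines could flag that situation. This is only a sketch: the region codes and the shape of the input (a plain list with one region per running VM) are made up for illustration, and it does not call any Fly API.

```python
from collections import Counter


def missing_regions(vm_regions, pool):
    """Return the regions in `pool` that have no running VM."""
    counts = Counter(vm_regions)
    return [region for region in pool if counts[region] == 0]


# Hypothetical data: 4 VMs landed in only 2 of the 4 pool regions.
pool = ["ord", "lhr", "syd", "sjc"]
running = ["ord", "ord", "lhr", "lhr"]
primary = "syd"

empty = missing_regions(running, pool)
if primary in empty:
    print(f"primary region {primary} has no VM; writes will fail")
```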

Charlie_Blackstock · April 14, 2022, 5:09pm

here is my region pool and autoscale plan.

[Screenshot: region pool and autoscale plan]

Charlie_Blackstock · April 14, 2022, 5:12pm

and here are my current running servers

greg · April 14, 2022, 6:09pm

Strange … Yep that certainly looks wrong based on the evenly distributed part of the docs. Like, when I have three regions in the pool, and run fly scale count 3, I get 1 vm in each region, as expected. Not sure what I’ve done to achieve that.

You could try fly scale show. I’m not sure how that count interacts with the autoscale count behind the scenes, but out of interest, does that also have a min of 4? Even if it does, though, if they get distributed 2x2 rather than 1x4, that still wouldn’t help.

One for Fly, alas.

Charlie_Blackstock · April 14, 2022, 9:31pm

@kurt any ideas?

kurt · April 14, 2022, 11:54pm

Unfortunately, Nomad makes it difficult to do things the way you’d want (which is also the way we want). You can’t autoscale AND spread things evenly around regions. You can manually fly scale count X --max-per-region=1 to make sure VMs are evenly distributed, but it won’t work with autoscaling.

Charlie_Blackstock · April 15, 2022, 1:57pm

The problem with this is that we can no longer deploy, since the regions are already at their max with a ‘bluegreen’ deploy strategy. Perhaps a per-region minimum would also be really helpful? We currently have a lot of traffic in just one region, but we have no way to scale services in just that region while guaranteeing we have a server in our primary region. I personally feel this is a pretty serious limitation of Fly multi-region DBs and servers in general if you can’t use auto-scaling, because there is no guarantee you’ll have a server that is able to talk to a writer.
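
To illustrate the deploy conflict, here is a toy model in Python. It assumes a blue-green deploy briefly needs the old and the new VM side by side in the same region, which is my reading of the behaviour rather than something confirmed; the function name and region codes are hypothetical.

```python
def bluegreen_fits(running, max_per_region):
    """True if every region can hold its old VMs plus an equal number of new ones."""
    # Assumption: old and new VMs coexist in a region during the deploy.
    return all(2 * count <= max_per_region for count in running.values())


running = {"ord": 1, "lhr": 1, "syd": 1, "sjc": 1}
print(bluegreen_fits(running, max_per_region=1))  # no headroom for the new VMs
print(bluegreen_fits(running, max_per_region=2))  # room for old + new
```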
