Request: Autoscale without killing all VMs to do so

This might already be possible, but we are using the Balanced scaling strategy and find it strange that when Fly decides to reallocate resources based on traffic, every VM needs to be re-deployed.

Is there a way to avoid having every VM in an app reboot and kill the Redis cache?

The way I see this working would be to only remove the least used VM and move it to the newly desired region without touching the other VMs in the pool.

This is actually a regression in Nomad (which we use to schedule these jobs). We can’t change balancing without restarting all the VMs. We throttled these changes down to once per hour to keep the disruption minimal, and have some workarounds planned for later.

Right now, the best bet if you don’t want to churn VMs like this is to turn balancing off, pick regions ahead of time, and let it scale.

Makes sense, I would love this to be a feature in the near future, because I also think limiting it to once per hour could mean the “edge” balancing reacts too late and loses some of its advantage.

So the recommended configuration would be to move to the Standard scaling strategy with a pool of regions where we always want to be available?

Our traffic is mostly from the US, so what would you suggest as the region pool and min/max for the Standard scaling settings?

In the US I’d probably do lax/ord/ewr with a min_count of 3. That should get you really low latency from most places.
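Roughly, that looks something like this (just a sketch; the exact flyctl subcommands can differ between versions, and `my-us-app` is a placeholder app name):

```
# Pin the region pool to the US regions above
flyctl regions set lax ord ewr -a my-us-app

# Switch from Balanced to Standard autoscaling with a floor of 3 VMs
# (some flyctl versions expose this as `flyctl autoscale standard` instead)
flyctl scale standard min=3 -a my-us-app
```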

The restarts aren’t much of a problem, usually, so it’s fine to leave auto balancing enabled too!

Sounds good, I will give that a try!

Thanks so much!

@kurt - How does this look? We are still on Balanced with Min=5, Max=25

Thanks!

I am still seeing hourly scale events with this setup, so we are still losing cache on an hourly basis.

What is the best way to prevent this?

PS: If you look at the screenshot above, you will notice there are 2 VMs in lax and no VM in iad.

Thanks,

Dan

That app is still “Balanced”. Did you run flyctl scale standard min=X?

If you’re trying to retain cache, you’re probably better off using Redis: https://fly.io/docs/reference/redis/

We have a hidden setting that will limit each region to one VM; if you switch to Standard I can enable that for you. It will occasionally cause deploy problems if your app needs more VMs than there are regions available, but it works really well if you define backup regions.
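If you do go that route, backup regions are defined with the regions command, something like this (again just a sketch; the backup region choices and app name are examples):

```
# Give the scheduler somewhere to fall back to if a primary region is
# unavailable or the app needs more VMs than there are primary regions
flyctl regions backup sea iad -a my-us-app

# Check the resulting region pool and backups
flyctl regions list -a my-us-app
```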

Gotcha, I thought you meant I could leave it on Balanced if we made the number of regions in the pool the same as the MIN.

I have now updated the scaling strategy to Standard.

We actually are using Redis, and it is one of the reasons we wanted to prevent the VMs from dropping so often, since we thought the Redis DB was local to each VM. Is this true for Database 0?

I just noticed there is a way to share the Redis DB across all the VMs by connecting to Database 1, which makes me think the Redis DB is not local to each VM and that there is one Redis DB per app?

Either way, it sounds like we need to start using Database 1 so that cache is updated for all VMs everywhere.

Thanks again!

The Redis cache is per region. It’ll survive between app instances. Database 1 is designed to push changes to other regions, mostly for things like purging keys though.
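So in practice you’d keep normal cache reads/writes on database 0 and only write to database 1 for things you want pushed to every region. A quick redis-cli sketch; note that FLY_REDIS_CACHE_URL is an assumption here, so check the Redis docs linked above for the exact connection details:

```
# Database 0: the regional cache; stays local to the region and
# survives app instances restarting there
redis-cli -u "$FLY_REDIS_CACHE_URL" -n 0 SET page:home "cached html"
redis-cli -u "$FLY_REDIS_CACHE_URL" -n 0 GET page:home

# Database 1: writes here are pushed to the other regions too,
# mostly useful for things like purging keys everywhere
redis-cli -u "$FLY_REDIS_CACHE_URL" -n 1 DEL page:home
```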
