Machines constantly scaling up and down after enabling auto_stop_machines

containerops · July 10, 2023, 4:14pm

Unfortunately, I don’t have a lot of time right now to describe the issue, but it’s easy to reproduce. I can add more details later.

Configuration

[[services]]
  protocol = "tcp"
  internal_port = 8080
  auto_start_machines = true
  auto_stop_machines = true
  min_machines_running = 2

Formation

I deployed the app to three regions:

iad (primary) – 1 machine
lhr – 1 machine
gru – 1 machine

Issue

I’ve been running a load test for the last two weeks and noticed an issue with auto_stop_machines. The application is consistently receiving requests (~1000 requests per minute), yet machines keep getting stopped and started.

Constant restarts cause high tail latency.

Look what happens as soon as I set auto_stop_machines=false:

senyo · July 11, 2023, 4:00pm

Checking on our end: in your configuration, the primary region is gru, not iad. min_machines_running keeps the specified number of machines running only in your primary region, not globally. So machines in lhr and iad will be stopped if the proxy sees there’s no traffic to those machines at the time the downscaler runs. And since you only have 1 machine running in gru, it hasn’t been stopped at all.

However, the stopping and starting is an issue regardless. I’m looking into it at the moment. Am I right in saying the issue here is that machines are being stopped in the first place? That you’d expect that due to the consistent traffic, they’ll remain running?
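To make the point about min_machines_running concrete, here is a rough sketch of how the fly.toml from the first post might look with the primary region set explicitly. It assumes the top-level primary_region key is what decides where the floor applies; the value "iad" and the floor of 1 are illustrative choices, not taken from the original app. With one machine per region, a floor of 2 could not be met in the primary region anyway.

```toml
# Assumption: primary_region is the app-level key that decides which
# region min_machines_running applies to. "iad" is illustrative.
primary_region = "iad"

[[services]]
  protocol = "tcp"
  internal_port = 8080
  auto_start_machines = true
  auto_stop_machines = true
  # Per the reply above, this floor is kept only in the primary region,
  # not globally. Machines in other regions can still be stopped.
  min_machines_running = 1
```

Under this reading, the lhr and gru machines would still be eligible for stopping whenever the proxy sees no traffic to them.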

containerops · July 11, 2023, 7:47pm

Thank you for looking into it.

Oh, I see. That is new to me.

Yeah, this is the issue. The machines are being stopped only to be started again in the next second.

containerops · July 11, 2023, 7:49pm

Same issue maybe?

containerops · July 11, 2023, 10:46pm

Yeah, I just checked the docs and it seems the machine should only stop if it has no traffic.

The current behaviour causes high tail latency / cold start issues as machines are restarted every minute or so.

auto_stop_machines: Whether to automatically stop an application’s machines when there’s excess capacity, per region. If there’s only one machine in a region, then the machine is stopped if it has no traffic. The Fly Proxy runs a process to automatically stop machines every few minutes. The default is true.

Fly Launch configuration (fly.toml) · Fly Docs

system · July 18, 2023, 10:46pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

ben-io · August 15, 2023, 9:06pm

@containerops we’ve fixed a bug where we were stopping machines when we shouldn’t have. Can you try enabling autostop to see if you still experience high tail latency?

hypeed · September 18, 2023, 3:59pm

Chiming in here… We’re still seeing this issue as of Sep 18. Doesn’t make sense that 148ed has traffic and then shuts itself down, only to restart itself one second later.

This is resulting in terrible HTTP response times:

Topic	Replies	Views	Activity
Machines being scaled to 0 even with `min_machines_running = 1`	5	839	July 20, 2023
Autoscaling auto_stop_machines not working appsv2	10	850	October 16, 2023
min_machines_running = 1 still scales the machine down to zero (Questions / Help; tags: machines, autoscaling)	14	153	January 25, 2025
Automatically starting/stopping Apps v2 instances (Fresh Produce)	50	8435	November 24, 2024
App keeps downscaling even though I've configured it not to. (Questions / Help)	3	670	November 11, 2023
