Autoscale stopped my 1 machine and never started it again

https://feedyour.email is a fly app that had (until today) this config:

  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 0

Today at 2:41pm PT, Fly autoscaled the app to zero machines, and did not restart the machine for incoming requests. I eventually manually ran fly machine start and manually starting the machine worked.

Why isn’t the Fly proxy starting my stopped machine when requests are coming in for it?

3 Likes

I’m having this issue as well. Just started today.

Same error here

Same here. Also just started.

Same error! Why did this happen?

Same on region GRU

And on ORD.

Hi all, we’re deploying a fix now and things should be back working shortly.

1 Like

OK, we believe all stopped machines are seen by our proxy again and will be properly started when requested.

A summary of the events:

  • code was merged earlier in the morning (Central Time) around how we remove certain service data from our system when a machine stops
  • we deployed the changes later in the day which resulted in our proxy not seeing these stopped machines anymore
  • we received a few reports and started to reconcile the missing services while a fix was reviewed and deployed
  • we have a lot of integration test checks in place to verify behavior of stopped machines but didn’t have one to cover the data needed for our proxy to still see the machine to be able to start it back, additional test assertions are now in place to prevent the behavior from be affected again

Sorry for the issues you all may have experienced.

2 Likes

This is fixed here now, however just to let you know that due to the short stopped status I seem to now have a 2 cent charge due to the “stopped machines rootfs” changes upcoming. While not yet an issue as AFAICT this won’t be billed this month either way, I can imagine it might be an issue if this recurs.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.