Today at 2:41pm PT, Fly autoscaled the app to zero machines and did not restart a machine when requests came in. I eventually ran fly machine start manually, and that worked.
Why isn’t the Fly proxy starting my stopped machine when requests are coming in for it?
OK, we believe all stopped machines are seen by our proxy again and will be properly started when requested.
A summary of the events:
code was merged earlier in the morning (Central Time) around how we remove certain service data from our system when a machine stops
we deployed the changes later in the day, which resulted in our proxy no longer seeing these stopped machines
we received a few reports and started to reconcile the missing services while a fix was reviewed and deployed
we have a lot of integration tests in place to verify the behavior of stopped machines, but none covered the data our proxy needs to still see a stopped machine and start it back up; additional test assertions are now in place to prevent this behavior from being affected again
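To make the last point concrete, here is a minimal sketch of the kind of assertion described above. It is not Fly's actual code: the ServiceRegistry class, the on_machine_stop hook, the buggy flag, and the machine ID are all hypothetical names invented for illustration. It only models the idea that stop-time cleanup must leave behind whatever the proxy needs to see the machine.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRegistry:
    """Hypothetical stand-in for the service data the proxy reads."""
    services: dict = field(default_factory=dict)  # machine_id -> state

    def register(self, machine_id: str) -> None:
        self.services[machine_id] = "started"

    def on_machine_stop(self, machine_id: str, buggy: bool = False) -> None:
        if buggy:
            # Models the regression: cleanup drops the entry entirely.
            self.services.pop(machine_id, None)
        else:
            # Models the intended behavior: keep the entry, mark it stopped.
            self.services[machine_id] = "stopped"


def proxy_can_start(registry: ServiceRegistry, machine_id: str) -> bool:
    """The proxy can only wake a machine it still sees as stopped."""
    return registry.services.get(machine_id) == "stopped"


def check_stopped_machine_visible(buggy: bool) -> bool:
    """Stop a machine, then report whether the proxy could still start it."""
    registry = ServiceRegistry()
    registry.register("machine-1")  # hypothetical machine ID
    registry.on_machine_stop("machine-1", buggy=buggy)
    return proxy_can_start(registry, "machine-1")
```

With the intended cleanup the check passes, and with the simulated regression it fails, which is the gap the new assertions are meant to catch.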
Sorry for the issues you all may have experienced.
This is fixed here now. However, just to let you know: because of the short time the machine spent in the stopped state, I now seem to have a 2 cent charge from the upcoming “stopped machines rootfs” changes. It isn't an issue yet, since AFAICT this won’t be billed this month either way, but I can imagine it becoming one if this recurs.