autostop machine - virtual machine exited abruptly

Hi there! I received an email notification stating that my apps will be upgraded to the new apps v2 machines next week. To ensure a smooth transition, I decided to upgrade one of my apps this week and verify that everything is working properly.

Unfortunately, I am currently facing an issue with the “auto_stop_machines” feature, which is supposed to downscale my machines when traffic is lower. Upon reviewing the logs, I noticed that Fly attempts to downscale one of my apps and sends the signal, but immediately logs “Virtual machine exited abruptly.” As a result, my graceful shutdown scripts are not being executed. The machine seems to restart immediately after the abrupt exit; I’m assuming Fly treats it as crashed. This cycle repeats, keeping my machine count at 2 instead of scaling down to 1 as intended, and preventing me from benefiting from autoscaling.

I have kill_signal set to “SIGINT” and kill_timeout set to “5” in my config file. I’ve also experimented with removing the min_machines_running parameter and with changing the app’s regions, but Fly still fails to downscale the machines.

flyctl machine stop <id> also abruptly exits the machine.

I’m unsure about the cause of this issue. Previously, everything was functioning correctly with Nomad, and my graceful shutdown scripts were executed without any issues. Local testing also works fine. Could there be an issue on Fly’s end? Or is it possible that there is a misconfiguration on my part?

Taking a look at your app logs, it seems like your app isn’t gracefully shutting down within the timeout?

We first send a SIGINT, then a SIGTERM 5 seconds later, and 5 seconds after that we get the “Virtual machine exited abruptly” log. I’d experiment with making kill_timeout longer, say 20 seconds, and see if that helps at all.
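
To check whether the app really reacts to the signals described above, a handler along these lines can help. This is only a minimal sketch, not the poster's actual shutdown script: the names `GracefulShutdown`, `run`, `cleanup` and `budget_seconds` are hypothetical, and the 4-second budget is simply an assumed value chosen to stay under a 5-second kill_timeout.

```python
import signal
import threading
import time


class GracefulShutdown:
    """Records the first stop signal so the main loop can exit cleanly."""

    def __init__(self, signals=(signal.SIGINT, signal.SIGTERM)):
        self.requested = threading.Event()
        self.received = None  # number of the first signal seen, if any
        self.signals = signals

    def install(self):
        # Register the same handler for every signal we want to react to.
        for sig in self.signals:
            signal.signal(sig, self.handle)

    def handle(self, signum, frame):
        # Remember only the first signal; later ones just re-set the flag.
        if self.received is None:
            self.received = signum
        self.requested.set()


def run(shutdown, cleanup, budget_seconds=4.0, poll_seconds=0.1):
    """Wait for a stop signal, run cleanup, and report whether it fit the budget."""
    while not shutdown.requested.wait(poll_seconds):
        pass  # placeholder for the app's real work
    started = time.monotonic()
    cleanup()
    elapsed = time.monotonic() - started
    # budget_seconds is an assumed value, kept below the configured kill_timeout.
    return elapsed <= budget_seconds


if __name__ == "__main__":
    shutdown = GracefulShutdown()
    shutdown.install()
    ok = run(shutdown, cleanup=lambda: print("closing connections"))
    print("signal received:", shutdown.received, "within budget:", ok)
```

Running this locally and pressing Ctrl+C (which sends SIGINT) should print the signal number and whether the cleanup finished in time. If the same code never prints anything on the machine, the signal is probably not reaching the process at all, which is a different problem from a cleanup that is too slow.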

Hello! I adjusted the KILL_TIMEOUT value to 20 using the following notation:

KILL_TIMEOUT = 20

However, it didn’t seem to take effect. So, I tried writing it as a string, like this:

KILL_TIMEOUT = "20s"

But the updated value still doesn’t seem to apply. From what I can see in the logs, the machine only waits for 5 seconds (which I assume is the default?).

Per the docs, it should look like this in your fly.toml:

app = "your-app-name-here"
kill_timeout = 20

Sorry, I wrote it in uppercase here, but it’s actually written in lowercase in my config. I initially had it as a number, as per the docs, and it didn’t work: the app would abruptly terminate after 5 seconds instead of the 20 I had defined, which is why I tried writing it as “20s”. I have just changed it back to a number in case any investigation is needed, and it still abruptly terminates 5 seconds after the signal is sent.
