[info] INFO Sending signal SIGINT to main child process w/ PID 323
…30 seconds
[info] INFO Sending signal SIGTERM to main child process w/ PID 323
[warn] Virtual machine exited abruptly
(I am trapping SIGTERM in my execution script, and I suspect the handler did not have enough time to finish before the machine exited.)
The behaviour I would have expected is:
no SIGINT
SIGTERM
30 seconds
SIGKILL
Am I missing something?
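For reference, the expected sequence (SIGTERM, a grace period, then SIGKILL) can be sketched in shell. This only illustrates the pattern and is not how Fly.io's init implements shutdown; stop_with_grace is a hypothetical helper name, and the 30-second default simply mirrors the gap seen in the logs.

```bash
# Hypothetical helper: stop_with_grace PID [TIMEOUT_SECONDS]
# Sends SIGTERM, polls once per second, and escalates to SIGKILL
# only if the process is still alive when the timeout elapses.
stop_with_grace() {
  local pid="$1" timeout="${2:-30}" waited=0

  # If the process is already gone there is nothing to stop.
  kill -TERM "$pid" 2>/dev/null || return 0

  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      kill -KILL "$pid" 2>/dev/null
      return 1   # had to force-kill
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0       # exited within the grace period
}
```

A return code of 0 means the process exited on its own after SIGTERM; 1 means the grace period ran out and SIGKILL was sent.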
Documentation says:
kill_signal option
When shutting down a Fly Machine, by default, Fly.io sends a SIGINT signal to the running
process. Typically this triggers a hard shutdown option on most applications. The kill_signal
option allows you to change what signal is sent so that you can trigger a softer, less disruptive
shutdown. Options are SIGINT (default), SIGTERM, SIGQUIT, SIGUSR1, SIGUSR2, SIGKILL, or SIGSTOP. For example, to set the kill signal to SIGTERM, you would add:
kill_signal = "SIGTERM"
We are using fly machines restart to restart the machine.
Hm… I can reproduce this only with fly m restart—not with fly m stop.
(Maybe it’s this only-on-restarts aspect that is the new piece of information, relative to your post in July? I think it’s best if these are structured as a continuous flow of conversation, with everyone’s contributions magnifying the others’, rather than arriving scattershot.)
FROM debian:bookworm-slim
COPY --chmod=755 thirty /usr/local/bin/
CMD ["thirty"]
#!/bin/bash -eup
echo thirty
function l() { echo 30: "$1" 1>&2; sleep 0.1; }
trap 'l sigint' SIGINT
trap 'l sigterm; exit 0' SIGTERM
trap -p
while true; do sleep 0.1; done
And then, with fly m restart, the logs read…
22:59:59Z app[28*] ewr [info] INFO Sending signal SIGINT to main child process w/ PID 321
22:59:59Z app[28*] ewr [info]30: sigint
23:00:04Z app[28*] ewr [info] INFO Sending signal SIGTERM to main child process w/ PID 321
23:00:04Z app[28*] ewr [info]30: sigterm
23:00:05Z app[28*] ewr [info] INFO Main child exited normally with code: 0
Whereas fly m stop goes straight to SIGTERM…
23:01:36Z app[28*] ewr [info] INFO Sending signal SIGTERM to main child process w/ PID 323
23:01:36Z app[28*] ewr [info]30: sigterm
23:01:37Z app[28*] ewr [info] INFO Main child exited normally with code: 0
But these cases should really be the same. It’s hard to think of a reason why stop and restart would have different shutdown mechanisms…
Aside: The odd-looking sleep 0.1 in the l function avoids a distracting stderr to vsock zero copy err: Broken pipe, which I don’t think is related. (Others have reported it, as well.)
Yeah, sorry, I couldn’t figure out how to resurrect the earlier issue, since it died after 7 days.
For our use case we don’t particularly need fly m stop. We need fly m restart to use SIGTERM, and I think it’s expected that the two would behave the same. In our case, using SIGINT precludes an important cleanup state, and our operational deploys get messed up when we reconfigure our fly nodes.
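Until restart and stop send the same signal, one possible stopgap is to route both SIGINT and SIGTERM to a single cleanup handler that runs at most once. This is a minimal sketch, not the poster's actual script: the cleanup function and the cleaned_up flag are hypothetical names, and exiting on SIGINT is an assumption that may not suit every application.

```bash
# Hypothetical guard flag: 1 once cleanup has already run.
cleaned_up=0

cleanup() {
  # Run the body only once, even if SIGINT and SIGTERM both arrive.
  if [ "$cleaned_up" -eq 1 ]; then
    return 0
  fi
  cleaned_up=1
  echo "cleanup triggered by $1" 1>&2
  # Real cleanup steps would go here (assumption).
}

# Whichever signal arrives first performs the cleanup and exits.
trap 'cleanup SIGINT; exit 0' INT
trap 'cleanup SIGTERM; exit 0' TERM

# The main workload would follow here, for example a loop like the
# one in the "thirty" reproduction script.
```

With this in place, the SIGINT sent first by fly m restart would trigger the same cleanup that SIGTERM does.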
Can you provide an App Name or Machine ID? I checked the logic involved with fly m restart and it does the following to determine what signal to send to the machine:
default to SIGINT (which you are seeing)
check if machine config has a stop signal configured (which you are not seeing but should)
fly machine restart doesn’t modify the configuration of existing machines. If you had previously deployed via fly deploy, then changed the fly.toml to have a specific kill_signal, you would need to fly deploy again in order to replace the existing machines with a new version containing the updated stop signal.
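The two-step selection described above can be sketched as follows. This is an illustration of the described logic only, not Fly.io's actual code; pick_stop_signal is a hypothetical name, and its argument stands in for whatever stop signal is stored in the machine's current config.

```bash
# Hypothetical sketch: choose the signal used for a restart.
# $1 is the stop signal recorded in the machine's existing config,
# or empty if the machine was created before kill_signal was set.
pick_stop_signal() {
  local configured="${1:-}"
  if [ -n "$configured" ]; then
    echo "$configured"   # step 2: the machine config overrides the default
  else
    echo "SIGINT"        # step 1: default when nothing is configured
  fi
}

pick_stop_signal ""         # machine created before the fly.toml change
pick_stop_signal "SIGTERM"  # machine replaced by a fresh deploy
```

Under this reading, a machine that still carries its old config keeps receiving SIGINT on restart, however fly.toml reads locally, until a new deploy replaces it.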
Yep, sorry, I meant fly machines deploy. I’ve triggered a redeploy. This is one of our UAT environment boxes, and we don’t have any activity scheduled for today, so if you’d like to proactively test restarting the instance, please feel free to.