kill_signal set in fly.toml is not applied

Hi! I’ve set

kill_signal = "SIGTERM"
kill_timeout = 30

in fly.toml, but in my logs I’m seeing:

[info] INFO Sending signal SIGINT to main child process w/ PID 323
…30 seconds
[info] INFO Sending signal SIGTERM to main child process w/ PID 323
[warn] Virtual machine exited abruptly

(I am trapping SIGTERM in my execution script and I guess that the actions there have not had enough time to land)

The behaviour I would have expected is

no SIGINT
SIGTERM
30 seconds
SIGKILL

Am I missing something?

Documentation says:

kill_signal option

When shutting down a Fly Machine, by default, Fly.io sends a SIGINT signal to the running
process. Typically this triggers a hard shutdown option on most applications. The kill_signal
option allows you to change what signal is sent so that you can trigger a softer, less disruptive
shutdown. Options are SIGINT (default), SIGTERM, SIGQUIT, SIGUSR1, SIGUSR2,
SIGKILL, or SIGSTOP. For example, to set the kill signal to SIGTERM, you would add:

kill_signal = "SIGTERM"

We are using fly machines restart to restart the machine.

Hm… I can reproduce this only with fly m restart—not with fly m stop.
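
For reference, these are the two commands being compared (the machine ID is a placeholder):

fly m restart <machine-id>   # observed: SIGINT first, then SIGTERM after the timeout
fly m stop <machine-id>      # observed: goes straight to the configured SIGTERM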

(Maybe it’s this only-on-restarts aspect that is the new piece of information, relative to your post in July? I think it’s best if these are structured as a continuous flow of conversation, with everyone’s contributions magnifying the others’, rather than arriving scattershot.)

Minimal repro. fly.toml:

app = "thirty"
primary_region = "ewr"
kill_signal = "SIGTERM"

[[restart]]
  policy = "no"

Dockerfile:

FROM debian:bookworm-slim

COPY --chmod=755 thirty /usr/local/bin/

CMD ["thirty"]

thirty (the container entrypoint script):

#!/bin/bash -eup

echo thirty

# log which signal arrived to stderr (the short sleep works around the vsock "Broken pipe" noise mentioned in the aside below)
function l() { echo 30: "$1" 1>&2;  sleep 0.1; }

# only the SIGTERM handler exits; SIGINT is logged and otherwise ignored
trap 'l sigint'           SIGINT
trap 'l sigterm;  exit 0' SIGTERM

# print the installed traps at startup
trap -p

while true; do sleep 0.1; done

And then, with fly m restart, the logs read…

22:59:59Z app[28*] ewr [info] INFO Sending signal SIGINT to main child process w/ PID 321
22:59:59Z app[28*] ewr [info]30: sigint
23:00:04Z app[28*] ewr [info] INFO Sending signal SIGTERM to main child process w/ PID 321
23:00:04Z app[28*] ewr [info]30: sigterm
23:00:05Z app[28*] ewr [info] INFO Main child exited normally with code: 0

Whereas fly m stop goes straight to SIGTERM

23:01:36Z app[28*] ewr [info] INFO Sending signal SIGTERM to main child process w/ PID 323
23:01:36Z app[28*] ewr [info]30: sigterm
23:01:37Z app[28*] ewr [info] INFO Main child exited normally with code: 0

But these cases should really be the same. It’s hard to think of a reason why stop and restart would have different shutdown mechanisms…


Aside: The odd-looking sleep 0.1 in the l function avoids a distracting "stderr to vsock zero copy err: Broken pipe" message, which I don’t think is related. (Others have reported it as well.)

Added tags: duplicated, machines

Yeah, sorry, I couldn’t figure out how to resurrect the earlier topic since it died after 7 days.

For our use case we don’t particularly need fly m stop; we need fly m restart to use SIGTERM, and I’d expect the two to behave the same. In our case, using SIGINT precludes an important cleanup step, and our operational deploys get messed up when we reconfigure our Fly nodes.

Can you provide an app name or machine ID? I checked the logic involved with fly m restart, and it does the following to determine which signal to send to the machine:

  • defaults to SIGINT (which you are seeing)
  • checks whether the machine config has a stop signal configured (which you are not seeing, but should; one way to inspect this yourself is sketched below)
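
(A rough sketch of that self-check via the Machines API; FLY_API_TOKEN, APP, and MACHINE_ID are placeholders, and the grep just surfaces whichever stop-related key the response uses:)

# FLY_API_TOKEN can come from `fly auth token`; APP and MACHINE_ID are placeholders
curl -s -H "Authorization: Bearer ${FLY_API_TOKEN}" \
  "https://api.machines.dev/v1/apps/${APP}/machines/${MACHINE_ID}" \
  | jq '.config' | grep -i -A2 '"stop'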

olivaw-sandworm-2 / 908017eeb1e478

Thanks!

The machine has the following stop config:

      "stop": {
        "timeout": "30s",
        "signal": "SIGNAL_SIGINT"
      }

which explains the behavior you are seeing.

Given you referenced a fly.toml, I assume these machines were created via fly deploy?

Yes. This is the head of our fly.toml:

app = 'olivaw-sandworm-2'
primary_region = 'sjc'
kill_timeout = '30s'
kill_signal = "SIGTERM"

[build]
  ...

Oh, sorry, we redeployed with the old configuration. I will fly machines restart with the new configuration.

fly machine restart doesn’t modify the configuration of existing machines. If you had previously deployed via fly deploy, then changed the fly.toml to have a specific kill_signal, you would need to fly deploy again in order to replace the existing machines with a new version containing the updated stop signal.
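
Put as a sequence, that would look roughly like this (using the app and machine from this thread; the --display-config flag used for verification is an assumption about current flyctl):

# 1. set kill_signal / kill_timeout in fly.toml, then redeploy so new machines pick it up
fly deploy -a olivaw-sandworm-2
# 2. check that the machine's stop config was updated (flag name assumed)
fly machine status 908017eeb1e478 -a olivaw-sandworm-2 --display-config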

Yep, sorry, I meant fly deploy. I’ve triggered a redeploy. This is one of our UAT environment boxes, and we don’t have any activity scheduled for today, so if you’d like to proactively test restarting the instance, please feel free to.

I checked the machine and it now has the correct configuration for stop:

      "stop": {
        "timeout": "30s",
        "signal": "SIGNAL_SIGTERM"
      }
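
(As a final check, one way to confirm end-to-end that the restart path now sends SIGTERM, reusing the commands and expected log line from earlier in the thread:)

fly logs -a olivaw-sandworm-2 &          # stream logs in the background
fly m restart 908017eeb1e478 -a olivaw-sandworm-2
# expect: "Sending signal SIGTERM to main child process ..." with no preceding SIGINT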