Hello, we’re having trouble with machine termination during bluegreen deploys. We’re running an Elixir application and have configured the following in fly.toml:
app = "selective"
primary_region = "sea"
kill_signal = "SIGTERM"
kill_timeout = 300

[deploy]
  release_command = "/app/bin/migrate"
  strategy = "bluegreen"

[env]
  PHX_HOST = "app.selective.ci"
  PORT = "8080"
  RELEASE_COOKIE = "..."

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true

  [http_service.concurrency]
    type = "connections"
    hard_limit = 1000
    soft_limit = 1000

[checks]
  [checks.selective_health_check]
    grace_period = "10s"
    interval = "10s"
    method = "get"
    path = "/health-check"
    port = 8080
    timeout = "2s"
    type = "http"
However, the kill settings aren’t behaving as we expect. We want SIGTERM to be issued, and then Fly should wait 5 minutes (kill_timeout = 300) before killing the VM. Here’s a snippet from our log during a deploy:
2023-09-30T16:31:46Z app[e2865749b75478] sea [info] INFO Sending signal SIGTERM to main child process w/ PID 303
2023-09-30T16:31:46Z app[e2865749b75478] sea [info]16:31:46.412 [notice] SIGTERM received - shutting down
2023-09-30T16:31:46Z app[e2865749b75478] sea [info] INFO Sending signal SIGTERM to main child process w/ PID 303
2023-09-30T16:31:46Z app[e2865749b75478] sea [info]16:31:46.512 [notice] SIGTERM received - shutting down
... other application log entries ...
2023-09-30T16:31:51Z app[e2865749b75478] sea [warn]Virtual machine exited abruptly
Note that SIGTERM is sent twice to the same machine within the same second (the application log shows the two signals about 100 ms apart), and 5 seconds later the virtual machine exits abruptly instead of waiting out the 300-second kill_timeout. Our application holds long-running WebSocket connections, and we need to give them time to finish what they’re doing and reconnect elsewhere (usually 10 to 15 seconds).
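To illustrate why the extra time matters, here is a minimal sketch of the kind of graceful shutdown we need room for. It is not our actual code: the module name `DrainSketch` and the `drain_ms` option are hypothetical, and the sleep only stands in for telling WebSocket clients to reconnect and waiting for in-flight work. It assumes the default BEAM behaviour where SIGTERM triggers an orderly application stop, so supervisors shut their children down one by one.

```elixir
defmodule DrainSketch do
  # Hypothetical module for illustration only.
  # `shutdown: 30_000` goes into the generated child spec, so the supervisor
  # waits up to 30 seconds for terminate/2 before force-killing this process.
  use GenServer, shutdown: 30_000

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(opts) do
    # Trapping exits is what makes the supervisor's shutdown request
    # reach terminate/2 instead of killing the process immediately.
    Process.flag(:trap_exit, true)
    # drain_ms is a hypothetical option; 15 seconds matches our usual drain time.
    {:ok, %{drain_ms: Keyword.get(opts, :drain_ms, 15_000)}}
  end

  @impl true
  def terminate(_reason, %{drain_ms: drain_ms}) do
    # Placeholder for the real work: ask connected clients to reconnect
    # elsewhere and wait for in-flight work to finish.
    Process.sleep(drain_ms)
    :ok
  end
end
```

A process like this would be listed in the application's supervision tree, for example as `{DrainSketch, drain_ms: 15_000}`. None of it helps if the VM is killed 5 seconds after SIGTERM, which is why we need kill_timeout to be honored.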
Thanks in advance for your help!