One of the worker machines gets stopped once in a while

I have two worker machines running on fly.io, each on a performance-cpu-1x@4096MB VM. I don’t understand why one of them always stops after a certain amount of time.
It doesn’t look like it ran out of memory, so can anyone tell me why it is doing that?

2023-07-31T09:40:18.516 app[7811301c9077d8] ord [info] INFO Sending signal SIGINT to main child process w/ PID 256
2023-07-31T09:40:18.516 app[7811301c9077d8] ord [info] worker: Hitting Ctrl+C again will terminate all running tasks!
2023-07-31T09:40:18.517 app[7811301c9077d8] ord [info] worker: Warm shutdown (MainProcess)
2023-07-31T09:40:21.866 app[7811301c9077d8] ord [info] INFO Main child exited normally with code: 1
2023-07-31T09:40:21.866 app[7811301c9077d8] ord [info] INFO Starting clean up.
2023-07-31T09:40:21.867 app[7811301c9077d8] ord [info] WARN hallpass exited, pid: 257, status: signal: 15 (SIGTERM)
2023-07-31T09:40:21.872 app[7811301c9077d8] ord [info] 2023/07/31 09:40:21 listening on [fdaa:2:19c8:a7b:192:ccf2:5f16:2]:22 (DNS: [fdaa::3]:53)
2023-07-31T09:40:22.868 app[7811301c9077d8] ord [info] [ 2396.169135] reboot: Restarting system

Would you mind sharing your fly.toml with any sensitive/identifying information stripped?

Thanks.

Sure.

app = "backend"
primary_region = "ord"
kill_signal = "SIGINT"
kill_timeout = "5s"

[experimental]
  auto_rollback = true

[build]
  dockerfile = "Dockerfile"
  ignorefile = ".dockerignore"

[processes]
  web = "gunicorn --bind :8000 --workers 2 backend.wsgi"
  worker = "python -m celery -A backend worker -l info --concurrency=1"

[[services]]
  protocol = "tcp"
  internal_port = 8000
  processes = ["web"]

  [[services.ports]]
    port = 80
    handlers = ["http"]
    force_https = true

  [[services.ports]]
    port = 443
    handlers = ["tls", "http"]

  [services.concurrency]
    type = "connections"
    hard_limit = 25
    soft_limit = 20

  [[services.tcp_checks]]
    interval = "15s"
    timeout = "2s"
    grace_period = "1s"
    restart_limit = 0

Hey! Any updates on this?

Thanks

I believe that, since your VMs are on the performance tier, Fly stops one of them when it is idle to avoid overuse and extra billing. To keep your VMs from stopping, I would suggest adding auto_stop_machines and auto_start_machines to the [[services]] section of your fly.toml and setting both to false.

Fly Docs: Automatically Stop and Start Machines

[[services]]
  internal_port = 8080
  protocol = "tcp"
  auto_stop_machines = false
  auto_start_machines = false
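
In case it helps, here is a rough sketch of how those two settings might look when merged into the [[services]] block you already posted, instead of being added as a separate block. The protocol, internal_port, and processes values are copied from your fly.toml; only the two auto_* lines are new. One caveat: this block is scoped to processes = ["web"], and as far as I understand these settings only cover machines the service applies to, so treat it as an assumption to verify that this changes anything for the worker machines.

```toml
[[services]]
  protocol = "tcp"
  internal_port = 8000          # copied from the posted fly.toml
  processes = ["web"]           # this service only covers the web process group

  # New lines: ask Fly not to stop or start these machines automatically.
  # Assumption: whether this affects the "worker" group is unverified.
  auto_stop_machines = false
  auto_start_machines = false
```

The nested [[services.ports]], [services.concurrency], and [[services.tcp_checks]] tables would stay as they are below these lines.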
