One of the worker machines gets stopped once in a while

progremir · August 1, 2023, 5:41am

I have two worker machines running on fly.io and each of them runs on a performance-cpu-1x@4096MB VM. I don’t understand why one of them always stops after a certain amount of time.
It doesn’t look like it ran out of memory, so can anyone tell me why it is doing that?

2023-07-31T09:40:18.516 app[7811301c9077d8] ord [info] INFO Sending signal SIGINT to main child process w/ PID 256
2023-07-31T09:40:18.516 app[7811301c9077d8] ord [info] worker: Hitting Ctrl+C again will terminate all running tasks!
2023-07-31T09:40:18.517 app[7811301c9077d8] ord [info] worker: Warm shutdown (MainProcess)
2023-07-31T09:40:21.866 app[7811301c9077d8] ord [info] INFO Main child exited normally with code: 1
2023-07-31T09:40:21.866 app[7811301c9077d8] ord [info] INFO Starting clean up.
2023-07-31T09:40:21.867 app[7811301c9077d8] ord [info] WARN hallpass exited, pid: 257, status: signal: 15 (SIGTERM)
2023-07-31T09:40:21.872 app[7811301c9077d8] ord [info] 2023/07/31 09:40:21 listening on [fdaa:2:19c8:a7b:192:ccf2:5f16:2]:22 (DNS: [fdaa::3]:53)
2023-07-31T09:40:22.868 app[7811301c9077d8] ord [info] [ 2396.169135] reboot: Restarting system

nolan-fly · August 1, 2023, 2:41pm

Would you mind sharing your fly.toml with any sensitive/identifying information stripped?

Thanks.

progremir · August 2, 2023, 6:31am

Sure.

app = "backend"
primary_region = "ord"
kill_signal = "SIGINT"
kill_timeout = "5s"

[experimental]
  auto_rollback = true

[build]
  dockerfile = "Dockerfile"
  ignorefile = ".dockerignore"

[processes]
  web = "gunicorn --bind :8000 --workers 2 backend.wsgi"
  worker = "python -m celery -A backend worker -l info --concurrency=1"

[[services]]
  protocol = "tcp"
  internal_port = 8000
  processes = ["web"]

  [[services.ports]]
    port = 80
    handlers = ["http"]
    force_https = true

  [[services.ports]]
    port = 443
    handlers = ["tls", "http"]

  [services.concurrency]
    type = "connections"
    hard_limit = 25
    soft_limit = 20

  [[services.tcp_checks]]
    interval = "15s"
    timeout = "2s"
    grace_period = "1s"
    restart_limit = 0

progremir · August 9, 2023, 6:29am

Hey! Any updates on this?

Thanks

Christian_B · August 9, 2023, 7:07pm

I believe that, since your VMs are on performance, Fly scales one of the VMs down when it is idle to prevent excess usage and billing. To prevent your VMs from stopping, I would suggest adding auto_stop_machines and auto_start_machines to your fly.toml and setting both values to false.

Fly Docs: Automatically Stop and Start Machines

[[services]]
  internal_port = 8080
  protocol = "tcp"
  auto_stop_machines = false
  auto_start_machines = false
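
Applied to the fly.toml posted earlier in this thread, the change might look like the sketch below. It assumes the existing service on internal_port 8000 is the one to edit. That service lists only the web process group, so it is also an assumption, not something confirmed in this thread, that the setting affects the worker machines.

```toml
[[services]]
  protocol = "tcp"
  internal_port = 8000
  processes = ["web"]
  # Suggested in this thread: do not stop idle machines in this service
  auto_stop_machines = false
  # Suggested in this thread: do not start stopped machines automatically
  auto_start_machines = false
```

The rest of the [[services]] block (ports, concurrency, tcp_checks) would stay as it was.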

system · August 16, 2023, 7:08pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
