scale count 15 but eventually no instances running (503 error)

lautaropaske · December 16, 2022, 1:32am

Hi, it seems that there is an implicit restart policy triggering restart of my app’s instances very frequently. I currently have the following scale status:

VM Resources for sync-server
        VM Size: dedicated-cpu-1x
      VM Memory: 2 GB
          Count: 15
 Max Per Region: Not set

This is the services section of my fly-production.toml:

[[services]]
  http_checks = []
  internal_port = 8080
  processes = ["app"]
  protocol = "tcp"
  script_checks = []

  [services.concurrency]
    hard_limit = 200
    soft_limit = 170
    type = "connections"

  [[services.ports]]
    force_https = true
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.tcp_checks]]
    grace_period = "5s"
    interval = "30s"
    restart_limit = 0
    timeout = "10s"

VM memory is not an issue (below 10% at all times). The app is very slow (we are currently working on an update), so I guess some sort of HTTP/TCP check might be triggering the restarts.

What I see happening is that Fly restarts the instance (not intended), and when it reaches a certain number of restarts (some stopped at 4, others at 9), it kills the instance and provides a fresh one. This happens for all instances, all the time, and eventually Fly stops providing fresh instances, until I get the 503 error: “no instances to route to”.

kurt · December 16, 2022, 3:34pm

If you’re talking about the restarts that show in fly status, those are not always triggered by us. That counter means the app process exited and we started it back up.

The only time we do trigger restarts is if health checks fail repeatedly. You can disable that by adding restart_limit = 0 to the health check in services.

When there are multiple restarts in a specific interval, we replace the whole VM.

Is it possible the 10s timeout on the tcp check is too low?

If you run fly vm status <id> on one of those instances, you should be able to tell if the restart was because the process exited, or because the VM wasn’t healthy.
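
For reference, a more lenient version of the tcp check from the config above might look like the sketch below. The 20s timeout and 30s grace period are illustrative assumptions, not recommended values; pick them based on how long the app actually takes to accept a connection.

```toml
  [[services.tcp_checks]]
    # Assumed value: give a slow-starting app more time before the first check.
    grace_period = "30s"
    interval = "30s"
    # 0 means failing checks never trigger a restart.
    restart_limit = 0
    # Assumed value: doubled from the original 10s.
    timeout = "20s"
```

The block is a drop-in replacement for the existing [[services.tcp_checks]] table. Only grace_period and timeout differ from the original.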

lautaropaske · December 16, 2022, 5:03pm

Hi Kurt, thanks for your answer. Yes, my app’s instances were restarting due to an unhandled exception caused by a specific timeout we set on our database which the app connects to. We’ll fix that.
