Machine auto stop/start flapping

This seems to come up quite a bit in other issues (e.g. "Autoscaling auto_stop_machines not working"). In most but not all cases, the proxy stops a machine and then immediately starts it back up again, even when we’re way under the soft limit on all machines. It does seem to correct itself, but it takes hours and many restarts to reach the expected number of machines. This is what it looks like:

(There are 7 machines in the list, and the 5 on the right are the ones flapping; there should be 3 running.)

Relevant config bits:

  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 3

    type = "requests"
    hard_limit = 4000
    soft_limit = 3000

And the non-app machine logs:

2023-10-16T10:10:55Z app[xxx] fra [info] INFO Sending signal SIGINT to main child process w/ PID 306
2023-10-16T10:11:05Z proxy[xxx] fra [error]could not complete HTTP request to instance: connection error: Connection reset by peer (os error 104)
2023-10-16T10:11:05Z proxy[xxx] fra [error]instance refused connection. is your app listening on make sure it is not only listening on (hint: look at your startup logs, servers often print the address they are listening on)
2023-10-16T10:11:05Z health[xxx] fra [error]Health check on port 8000 has failed. Your app is not responding properly. Services exposed on ports [80, 443] will have intermittent failures until the health check passes.
2023-10-16T10:11:35Z app[xxx] fra [info] INFO Main child exited normally with code: 0
2023-10-16T10:11:35Z app[xxx] fra [info] INFO Starting clean up.
2023-10-16T10:11:35Z app[xxx] fra [info] WARN hallpass exited, pid: 307, status: signal: 15 (SIGTERM)
2023-10-16T10:11:35Z app[xxx] fra [info]2023/10/16 10:11:35 listening on [xxx]:22 (DNS: [fdaa::3]:53)
2023-10-16T10:11:36Z app[xxx] fra [info][  152.495883] reboot: Restarting system
2023-10-16T10:11:48Z proxy[xxx] fra [info]Starting machine
2023-10-16T10:11:48Z app[xxx] fra [info][    0.056200] PCI: Fatal: No config space access function found
2023-10-16T10:11:49Z app[xxx] fra [info] INFO Starting init (commit: 15238e9)...
2023-10-16T10:11:49Z app[xxx] fra [info] INFO [fly api proxy] listening at /.fly/api
2023-10-16T10:11:49Z app[xxx] fra [info]2023/10/16 10:11:49 listening on [xxx]:22 (DNS: [fdaa::3]:53)
2023-10-16T10:11:49Z proxy[xxx] fra [info]machine started in 459.553239ms
2023-10-16T10:11:49Z proxy[xxx] fra [info]machine became reachable in 20.337335ms
2023-10-16T10:11:51Z health[xxx] fra [info]Health check on port 8000 is now passing.
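
To put a number on the flapping, a saved copy of these logs can be tallied per machine. This is only a rough sketch: it assumes the line layout seen in the excerpt above (timestamp, source[machine], region, [level], message), treats an app-side "Sending signal SIGINT" line as a stop and a proxy-side "Starting machine" line as a start, and the names `LINE_RE` and `count_flaps` are made up for illustration. A SIGINT can also come from a deploy or a manual stop, so the counts are an upper bound on proxy-initiated stops.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
#   <timestamp> <source>[<machine>] <region> [<level>]<message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<source>\w+)\[(?P<machine>[^\]]*)\]\s+"
    r"(?P<region>\S+)\s+\[(?P<level>\w+)\]\s*(?P<msg>.*)$"
)


def count_flaps(lines):
    """Return {machine_id: (stop_count, start_count)} for the given log lines."""
    stops = Counter()
    starts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like a log line
        machine, source, msg = m["machine"], m["source"], m["msg"]
        # Assumption: a SIGINT sent to the main process marks a stop.
        if source == "app" and "Sending signal SIGINT" in msg:
            stops[machine] += 1
        # The proxy logs this message when it starts the machine again.
        elif source == "proxy" and msg.startswith("Starting machine"):
            starts[machine] += 1
    return {mid: (stops[mid], starts[mid]) for mid in set(stops) | set(starts)}


if __name__ == "__main__":
    import sys

    # Usage: python count_flaps.py < saved_logs.txt
    for machine, (n_stop, n_start) in sorted(count_flaps(sys.stdin).items()):
        print(f"{machine}: {n_stop} stops, {n_start} starts")
```

A machine whose stop and start counts keep climbing together while traffic stays under the soft limit is one of the flapping ones.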

Confirming that the number of machines has now correctly stabilized at 3 after many start/stop rounds.
