Today I noticed a machine went into a failed state:
fly m list -a udns | grep "failed"
1781944a5642e8 udns-ams2 failed ams udns:xyz fdaa:0:35f3:a7b:a356:401b:858c:2 2023-02-09T23:26:22Z 2023-10-18T07:26:12Z v2
I see a bunch of “timed out while connecting to your instance” errors in the logs, but I also see these for machines that are not in a “failed” state.
# is in started state
2023-10-19T12:35:34Z proxy[0e286074a97686] syd [error]timed out while connecting to your instance. this indicates a problem with your app (hint: look at your logs and metrics)
# is in failed state
2023-10-19T12:35:34Z proxy[1781944a5642e8] ams [error]timed out while connecting to your instance. this indicates a problem with your app (hint: look at your logs and metrics)
Some questions:
- When can a machine (set to auto-start / auto-stop) enter a “failed” state?
- Does it require manual intervention to recover from such a state? If so, what commands could revive it (fly m remove + fly m clone)? fly m stop + fly m start does not work; it errors out with:
  Error: could not start machine 1781944a5642e8: failed to start VM 1781944a5642e8: aborted: could not reserve resource for machine: insufficient memory available to fulfill request
- Is the traffic meant for this “failed” machine dropped with errors, or is it retried against another machine in the same or a nearer region?