Today I noticed a machine went into a failed state:
    fly m list -a udns | grep "failed"
    1781944a5642e8 udns-ams2 failed ams udns:xyz fdaa:0:35f3:a7b:a356:401b:858c:2 2023-02-09T23:26:22Z 2023-10-18T07:26:12Z v2
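A slightly stricter filter than the plain grep, as a minimal sketch: grep matches "failed" anywhere in the row (including a machine name), whereas this checks only the state column. It assumes rows are whitespace-separated with the ID first and the state third, as in the row above; `failed_machines` is a hypothetical helper name, not part of flyctl.

```shell
# Print the IDs of machines whose state column is exactly "failed".
# Assumption: each row is "ID NAME STATE ..." separated by whitespace,
# matching the sample row above. Header or separator lines are ignored
# because their third field is not the literal word "failed".
failed_machines() {
  awk '$3 == "failed" { print $1 }'
}

# Usage (app name taken from this post):
#   fly m list -a udns | failed_machines
```

If the column order differs in your flyctl version, adjust the field numbers accordingly.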
I see a bunch of “timed out while connecting to your instance” errors in the logs, but the same errors also show up for machines that are not in a “failed” state.
    # is in started state
    2023-10-19T12:35:34Z proxy[0e286074a97686] syd [error]timed out while connecting to your instance. this indicates a problem with your app (hint: look at your logs and metrics)
    # is in failed state
    2023-10-19T12:35:34Z proxy[1781944a5642e8] ams [error]timed out while connecting to your instance. this indicates a problem with your app (hint: look at your logs and metrics)
- When can a machine (set to auto-start / auto-stop) enter the “failed” state?
- Does it require manual intervention to recover from such a state? If so, what commands could revive it (`fly m remove` + `fly m clone`)? `fly m stop` + `fly m start` does not work; it errors out with:
    Error: could not start machine 1781944a5642e8: failed to start VM 1781944a5642e8: aborted: could not reserve resource for machine: insufficient memory available to fulfill request
- Is the traffic meant for this “failed” machine being dropped with errors, or is it retried against another machine in the same or a nearer region?