Once my remote builder suspends itself, it no longer seems to wake up, and deploys then fail like this:
❯ fly deploy --remote-only
==> Verifying app config
--> Verified app config
==> Building image
WARN Remote builder did not start in time. Check remote builder logs with `flyctl logs -a fly-builder-nameless-sky-4197`
Error failed to fetch an image or build from source: error connecting to docker: remote builder app unavailable
Relevant logs from the builder:
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.300451261Z" level=info msg="Deadline reached without docker build"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.300563672Z" level=info msg="shutting down"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.301873959Z" level=info msg="gracefully stopped\n"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.301947557Z" level=debug msg="disk space used: 3.51%"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.301982081Z" level=info msg="Waiting for dockerd to exit"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.302391950Z" level=info msg="Processing signal 'interrupt'"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.302666996Z" level=debug msg="daemon configured with a 15 seconds minimum shutdown timeout"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.302695159Z" level=debug msg="start clean shutdown of all containers with a 15 seconds timeout..."
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.302967569Z" level=debug msg="found 0 orphan layers"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.303393749Z" level=debug msg="Unix socket /var/run/docker/libnetwork/d99d563e9087.sock doesn't exist. cannot accept client connections"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.303442570Z" level=debug msg="Cleaning up old mountid : start."
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.303565070Z" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.303677801Z" level=debug msg="Cleaning up old mountid : done."
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.303786555Z" level=debug msg="unmounting daemon root" mountpoint=/data/docker
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.304137633Z" level=debug msg="Clean shutdown succeeded"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.304170695Z" level=info msg="Daemon shutdown complete"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.304188739Z" level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.304225148Z" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.304498099Z" level=debug msg="sd notification" error="<nil>" notified=false state="STOPPING=1"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.478358314Z" level=debug msg="checking docker activity"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.478773994Z" level=debug msg="Calling GET /v1.41/containers/json?filters=%7B%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"
2022-09-27T11:46:38Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:38.305281454Z" level=warning msg="grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial unix:///var/run/docker/containerd/containerd.sock: timeout\". Reconnecting..." module=grpc
2022-09-27T11:46:41Z runner[6e82576a6d7787] ams [info]machine exited with exit code 0, not restarting
So far, the only fix that works is to completely remove the app and run fly deploy --remote-only again. This is the second time today that I have had to do this.
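For reference, the workaround looks roughly like the sketch below. It assumes that the app being removed is the builder app named in the warning above (fly-builder-nameless-sky-4197). The BUILDER_APP and RUN variables and the run helper are hypothetical names used only for illustration. By default the script only prints the commands; nothing is executed unless RUN=1 is set.

```shell
#!/bin/sh
# Workaround sketch: destroy the stuck builder app, then redeploy.
# ASSUMPTION: "the app" is the builder app from the flyctl warning above.
BUILDER_APP="${BUILDER_APP:-fly-builder-nameless-sky-4197}"

# Print each command; execute it only when RUN=1 is set (dry run otherwise).
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

reset_builder_and_deploy() {
  # --yes skips the interactive confirmation prompt of `fly apps destroy`.
  run fly apps destroy "$BUILDER_APP" --yes &&
    run fly deploy --remote-only
}

reset_builder_and_deploy
```

Running it with RUN=1 destroys the builder app for real, so it is worth checking the printed commands first.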