Remote builder not waking up

When my remote builder suspends itself, it does not seem to wake up anymore, causing the following to happen:

❯ fly deploy --remote-only
==> Verifying app config
--> Verified app config
==> Building image
WARN Remote builder did not start in time. Check remote builder logs with `flyctl logs -a fly-builder-nameless-sky-4197`
Error failed to fetch an image or build from source: error connecting to docker: remote builder app unavailable

Relevant logs from the builder:

2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.300451261Z" level=info msg="Deadline reached without docker build"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.300563672Z" level=info msg="shutting down"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.301873959Z" level=info msg="gracefully stopped\n"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.301947557Z" level=debug msg="disk space used: 3.51%"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.301982081Z" level=info msg="Waiting for dockerd to exit"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.302391950Z" level=info msg="Processing signal 'interrupt'"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.302666996Z" level=debug msg="daemon configured with a 15 seconds minimum shutdown timeout"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.302695159Z" level=debug msg="start clean shutdown of all containers with a 15 seconds timeout..."
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.302967569Z" level=debug msg="found 0 orphan layers"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.303393749Z" level=debug msg="Unix socket /var/run/docker/libnetwork/d99d563e9087.sock doesn't exist. cannot accept client connections"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.303442570Z" level=debug msg="Cleaning up old mountid : start."
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.303565070Z" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.303677801Z" level=debug msg="Cleaning up old mountid : done."
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.303786555Z" level=debug msg="unmounting daemon root" mountpoint=/data/docker
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.304137633Z" level=debug msg="Clean shutdown succeeded"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.304170695Z" level=info msg="Daemon shutdown complete"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.304188739Z" level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.304225148Z" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.304498099Z" level=debug msg="sd notification" error="<nil>" notified=false state="STOPPING=1"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.478358314Z" level=debug msg="checking docker activity"
2022-09-27T11:46:37Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:37.478773994Z" level=debug msg="Calling GET /v1.41/containers/json?filters=%7B%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"
2022-09-27T11:46:38Z app[6e82576a6d7787] ams [info]time="2022-09-27T11:46:38.305281454Z" level=warning msg="grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/containerd.sock  <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial unix:///var/run/docker/containerd/containerd.sock: timeout\". Reconnecting..." module=grpc
2022-09-27T11:46:41Z runner[6e82576a6d7787] ams [info]machine exited with exit code 0, not restarting

So far, the only fix that works is to completely remove the builder app and run fly deploy --remote-only again. This is the second time today I've had to do this.

There have been multiple reports of builder-related failures today:

  1. Fly remote builder keeps going offline after a deploy
  2. Terraform trying to create an IP: You hit a Fly API error with request ID: 01GDYYP9A1DCCDMSY6ZBPTDE0X-lax
  3. flyctl deploy: You hit a Fly API error with request ID...

It's likely that Fly engineers will resolve it once they figure out what's up.


Thanks! My issue seems to be the same as #1; strange that I didn't find it myself before posting. Anyway, since my report includes the actual builder logs, should I keep this open?

Looking at the logs, I think the issue is a bit different.

I'm investigating. It's fine to keep this open, no problem there!

Can you give this another shot? We think that we’ve found and fixed the cause of recent issues with remote builders, so please let us know if you’re still running into this problem!

I actually just destroyed a build because it failed, but I'm not sure whether that was before or after your message, so I'll keep an eye out in case it happens again!


I just had a deploy time out again, so I guess it's not fixed after all 🙁. What sucks is that I'm dealing with some issues on my production server that I need to fix ASAP, so having deploys time out really hurts. Is there a workaround I can try in the meantime? Is it safe to run fly deploy without --remote-only on GitHub Actions?

I built on GitHub Actions (Ubuntu) for a long time (it was the default until recently), and it worked nicely. I especially liked having easy access to detailed build logs right in the GitHub Actions dashboard with the flyctl deploy --verbose --local-only switches.

I’ve had problems with the builder apps a couple of times. The fastest/easiest solution is to delete the builder app - either on the command line or on your org dashboard - and then run flyctl deploy again. It will create a new builder app. So far that’s worked for me every time.
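A minimal sketch of that workaround as a shell function, assuming flyctl is installed and logged in and you run it from your app's directory. The function name `recreate_builder` and the `FLYCTL` override (set it to `echo` to print the commands instead of running them) are hypothetical conveniences, not part of flyctl. `fly-builder-nameless-sky-4197` is the builder name from the logs above; yours will differ, and `flyctl apps list` should show it. The `--yes` flag skips the confirmation prompt in recent flyctl versions; check `flyctl apps destroy --help` if yours rejects it.

```shell
#!/usr/bin/env bash
# Sketch: delete a stuck remote builder app, then redeploy so flyctl creates a new one.
# Assumes flyctl is installed and authenticated.
set -euo pipefail

# Hypothetical override: FLYCTL=echo prints the commands instead of running them.
FLYCTL="${FLYCTL:-flyctl}"

# recreate_builder is a hypothetical helper name, not a flyctl command.
recreate_builder() {
  local builder_app="$1"   # e.g. fly-builder-nameless-sky-4197
  # Destroy the builder app; --yes answers the confirmation prompt.
  "$FLYCTL" apps destroy "$builder_app" --yes
  # A remote-only deploy provisions a fresh builder app.
  "$FLYCTL" deploy --remote-only
}
```

Usage: `recreate_builder fly-builder-nameless-sky-4197`. Deleting the builder only throws away its build cache, so the first build afterwards may be slower.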

What Fly is doing on the backend to deploy an app has to be incredibly complex, and they push updates pretty frequently. I'm not surprised that a builder app crashes now and then.


A better option may be to run flyctl deploy --local-only <other-args> instead, especially since this is literally free on GitHub Actions (at least for open-source projects).
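A rough sketch of that approach for a CI step, assuming the runner has Docker (GitHub's hosted Ubuntu runners ship with it), flyctl is on the PATH, and `FLY_API_TOKEN` is set as a secret. It builds on the runner when the Docker daemon answers and falls back to the remote builder otherwise. `deploy_app` and the `FLYCTL`/`DOCKER` overrides are hypothetical names for illustration.

```shell
#!/usr/bin/env bash
# Sketch of a CI deploy step: build on the runner when Docker is available,
# otherwise fall back to the remote builder.
# Assumes flyctl is on PATH and FLY_API_TOKEN is set in the CI environment.
set -euo pipefail

# Hypothetical overrides for testing: FLYCTL=echo, DOCKER=true/false.
FLYCTL="${FLYCTL:-flyctl}"
DOCKER="${DOCKER:-docker}"

# deploy_app is a hypothetical helper; extra arguments are passed to flyctl deploy.
deploy_app() {
  if "$DOCKER" info >/dev/null 2>&1; then
    # Local daemon responded: build on this machine, with verbose logs in the CI output.
    "$FLYCTL" deploy --local-only --verbose "$@"
  else
    # No usable Docker daemon: use the remote builder instead.
    "$FLYCTL" deploy --remote-only "$@"
  fi
}
```

In a workflow, a `run:` step could source this script and call `deploy_app`. The fallback keeps the step working on runners without Docker, at the cost of depending on the remote builder there.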