Fly-Builder can't connect to Docker

I'm trying to use a remote builder by running flyctl deploy in my local project. The builder spins up but doesn't seem to be able to connect to Docker. The app name for my builder is fly-builder-sparkling-morning-7037.

This could be related to the thread "Unable to deploy to any region - deployment forever loop".

Logs:

2023-01-17T15:44:22.878 app[e2869ddc746686] ord [info] time="2023-01-17T15:44:22.877361083Z" level=debug msg="checking docker activity"

2023-01-17T15:44:22.878 app[e2869ddc746686] ord [info] time="2023-01-17T15:44:22.877676554Z" level=debug msg="Calling GET /v1.41/containers/json?filters=%7B%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"
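For anyone reading along, the filters value in that GET line is just URL-encoded JSON. Here is a minimal sketch that decodes it, assuming only the four escapes that appear in this log line need handling. decode_filter is a hypothetical helper name, and this is not a general URL decoder.

```shell
# Decode only the four percent-escapes that occur in the logged filter:
# %7B -> {   %7D -> }   %22 -> "   %3A -> :
decode_filter() {
  printf '%s\n' "$1" | sed -e 's/%7B/{/g' -e 's/%7D/}/g' -e 's/%22/"/g' -e 's/%3A/:/g'
}

decode_filter '%7B%22status%22%3A%7B%22running%22%3Atrue%7D%7D'
# prints {"status":{"running":true}}
```

Decoded, the request reads as a filter on running containers, which matches the "checking docker activity" message logged just before it.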

The machine eventually just dies - here’s what those logs look like from my last attempt (different app).

2023-01-17T14:32:33.408 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.407687276Z" level=debug msg="checking docker activity"

2023-01-17T14:32:33.408 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.407945941Z" level=debug msg="Calling GET /v1.41/containers/json?filters=%7B%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"

2023-01-17T14:32:33.837 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.837274376Z" level=info msg="Deadline reached without docker build"

2023-01-17T14:32:33.837 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.837346982Z" level=info msg="shutting down"

2023-01-17T14:32:33.837 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.837510098Z" level=info msg="gracefully stopped\n"

2023-01-17T14:32:33.837 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.837542889Z" level=info msg="disk space used: 0.11%"

2023-01-17T14:32:33.837 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.837556575Z" level=info msg="Waiting for dockerd to exit"

2023-01-17T14:32:33.837 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.837727566Z" level=info msg="Processing signal 'interrupt'"

2023-01-17T14:32:33.838 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.838031967Z" level=debug msg="daemon configured with a 15 seconds minimum shutdown timeout"

2023-01-17T14:32:33.838 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.838058496Z" level=debug msg="start clean shutdown of all containers with a 15 seconds timeout..."

2023-01-17T14:32:33.838 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.838153595Z" level=debug msg="found 0 orphan layers"

2023-01-17T14:32:33.838 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.838672558Z" level=debug msg="Unix socket /var/run/docker/libnetwork/f4f8fce416f6.sock doesn't exist. cannot accept client connections"

2023-01-17T14:32:33.838 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.838730597Z" level=debug msg="Cleaning up old mountid : start."

2023-01-17T14:32:33.839 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.838849851Z" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby

2023-01-17T14:32:33.839 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.838919772Z" level=debug msg="Cleaning up old mountid : done."

2023-01-17T14:32:33.839 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.839012766Z" level=debug msg="unmounting daemon root" mountpoint=/data/docker

2023-01-17T14:32:33.839 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.839143321Z" level=debug msg="Clean shutdown succeeded"

2023-01-17T14:32:33.839 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.839164150Z" level=info msg="Daemon shutdown complete"

2023-01-17T14:32:33.839 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.839229773Z" level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd

2023-01-17T14:32:33.839 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.839332075Z" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby

2023-01-17T14:32:33.839 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.839392027Z" level=debug msg="received signal" signal=terminated

2023-01-17T14:32:33.839 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:33.839524897Z" level=debug msg="sd notification" error="<nil>" notified=false state="STOPPING=1"

2023-01-17T14:32:34.410 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:34.410013401Z" level=debug msg="checking docker activity"

2023-01-17T14:32:34.410 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:34.410276314Z" level=debug msg="Calling GET /v1.41/containers/json?filters=%7B%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"

2023-01-17T14:32:34.840 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:34.840110999Z" level=warning msg="grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial unix:///var/run/docker/containerd/containerd.sock: timeout\". Reconnecting..." module=grpc

2023-01-17T14:32:34.842 app[1781e33a504e89] ord [info] time="2023-01-17T14:32:34.841996415Z" level=info msg="dockerd has exited"

2023-01-17T14:32:37.419 runner[1781e33a504e89] ord [info] machine exited with exit code 0, not restarting

It looks like dockerd couldn't connect to a Unix domain socket (/var/run/docker/containerd/containerd.sock), presumably where containerd is supposed to be listening.
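
If it helps with triaging output like this, here is a minimal sketch that pulls only the warning, error, and fatal entries out of a saved builder log. It assumes the logs were saved to a file first (builder.log below is a hypothetical name) and that each entry carries the level=... field seen in the logs above. filter_problem_lines is just an illustrative helper name.

```shell
# Print only warning, error, or fatal entries from a builder log.
# Reads the log from standard input; "|| true" keeps the exit status
# at zero when nothing matches, so it is safe under "set -e".
filter_problem_lines() {
  grep -E 'level=(warning|error|fatal)' || true
}
```

Usage would be something like filter_problem_lines < builder.log. On the logs posted above, the only line this should keep is the grpc warning about containerd.sock.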

Either build locally with flyctl deploy --local-only <other-args>, or destroy the current builder. flyctl should create a new one the next time you run flyctl deploy:

flyctl apps destroy -y fly-builder-sparkling-morning-7037

Thanks. Yeah, I tried that. I'm running into a weird segfault when I try to build locally with fly deploy --local-only. The Docker image builds fine when I use docker build ...

I'd still like to get the remote builders working. I just tried again and they are still hanging.

Usually this means flyctl can’t talk to your builder properly. Try running fly doctor -a <app> and see if it recommends anything.

Sometimes the fix is to run fly wg reset <org>. This can also occasionally happen when other VPNs are running on the host. We've also occasionally seen similar issues with Little Snitch.

I was able to resolve the issue.

I had WireGuard tunnels open to another organization. I went through and removed each tunnel with fly wg remove <org> <tunnel>. After removing all of them, I re-ran fly doctor -a <app> and the command just hung. I went into Activity Monitor and force-killed the Fly agent, then started it again with fly agent start.

After cleaning up all the tunnels and restarting the Fly agent, I was able to run fly doctor -a <app> successfully. Then everything worked again.

EDIT: This happened again this evening, except this time there were no WireGuard tunnels set up at all (see the removal above). It turns out the Fly agent was hanging. I had to force quit it via Activity Monitor and restart it, and things are fine now.
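
For reference, the restart-and-check sequence could be wrapped up roughly like this. It is a sketch under some assumptions: it uses fly agent stop in place of the Activity Monitor force-quit described above (if the agent is truly hung, a force-quit may still be needed), and run and restart_agent_and_check are hypothetical helper names. By default it only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
# Dry-run wrapper: print each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Restart the local Fly agent, then re-check connectivity for one app.
# $1 is the app name passed to "fly doctor -a".
restart_agent_and_check() {
  app="$1"
  run fly agent stop            # assumption: stands in for the manual force-quit
  run fly agent start
  run fly doctor -a "$app"
}
```

Calling restart_agent_and_check <app> prints the three commands in order, so you can review them before running for real with DRY_RUN=0.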