`fly deploy` stuck on "Waiting for remote builder"

Hello!
Is your builder still reporting a large amount of usage?

If so, can you open https://fly-metrics.net/, select your builder app, and then expand the Volumes graph to see the usage reported there?

Thanks! I’m super curious about that growing usage. I may have some extra steps to take if that’s still going on and if it’s actually growing that large. In theory the volumes are capped at 50 GB.

I seem to be having the same problem now too:

flyctl deploy
==> Verifying app config
→ Verified app config
==> Building image
Waiting for remote builder fly-builder-small-tree-3392… :earth_asia:

The CLI stays stuck there, while the monitoring page just repeats this over and over:

2022-10-01T17:10:59.479 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:10:59.479270915Z" level=debug msg="checking docker activity"
2022-10-01T17:10:59.479 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:10:59.479535351Z" level=debug msg="Calling GET /v1.41/containers/json?filters=%7B%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"
2022-10-01T17:11:00.481 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:11:00.480801483Z" level=debug msg="checking docker activity"
2022-10-01T17:11:00.481 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:11:00.481277836Z" level=debug msg="Calling GET /v1.41/containers/json?filters=%7B%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"
2022-10-01T17:11:01.484 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:11:01.483769067Z" level=debug msg="checking docker activity"
2022-10-01T17:11:01.484 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:11:01.484057508Z" level=debug msg="Calling GET /v1.41/containers/json?filters=%7B%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"
2022-10-01T17:11:02.485 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:11:02.485343647Z" level=debug msg="checking docker activity"
2022-10-01T17:11:02.486 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:11:02.485711067Z" level=debug msg="Calling GET /v1.41/containers/json?filters=%7B%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"
2022-10-01T17:11:03.487 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:11:03.487356510Z" level=debug msg="checking docker activity"
2022-10-01T17:11:03.487 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:11:03.487661533Z" level=debug msg="Calling GET /v1.41/containers/json?filters=%7B%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"
2022-10-01T17:11:04.489 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:11:04.489411152Z" level=debug msg="checking docker activity"
2022-10-01T17:11:04.490 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:11:04.489773021Z" level=debug msg="Calling GET /v1.41/containers/json?filters=%7B%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"
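
The filters value in those GET lines is percent-encoded, which makes it hard to see what the builder is polling for. A minimal bash sketch for decoding it follows; the urldecode helper is a hypothetical name introduced here, not part of flyctl or Docker. Decoded, the filter asks Docker for containers whose status is running, so the builder appears to be checking once per second whether any build is active, presumably to decide whether it is idle.

```shell
#!/usr/bin/env bash
# Decode a percent-encoded string (bash only: relies on ${var//pat/repl}
# and on printf '%b' understanding \xHH escapes).
# urldecode is a hypothetical helper name, not a flyctl or Docker command.
urldecode() {
  local s="${1//+/ }"            # in query strings, '+' stands for a space
  printf '%b\n' "${s//%/\\x}"    # turn every %HH into \xHH, then expand it
}

# The filters value copied from the builder log lines above.
urldecode '%7B%22status%22%3A%7B%22running%22%3Atrue%7D%7D'
```

The same helper can be pointed at any other encoded query string copied from the builder logs.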

Also stuck for me. I can see in the Fly dashboard that the builder is in the “Suspended” state.

Same here today, with the same logs that @kharri1073 was getting.

Earlier in the day I was having a somewhat similar problem when deploying a trivial Node app with a buildpack (it stalled after “downloaded newer image…” with the same logs). Switching to a Dockerfile allowed me to deploy at the time, but that no longer works either.

Nothing interesting with LOG_LEVEL=debug. No errors reported when cancelling the build with Ctrl-C. I can destroy the builder but it doesn’t change matters. fly-metrics dashboard for the builder shows it idle – no IO, CPU < 1%.

Edit: On my side this has been using flyctl 0.0.405 & 0.0.406.

I’m seeing identical behavior to what @kharri1073 and @robjwells are describing as well. And as @hpx7 mentions, my builder is also consistently reverting to a “Suspended” state after it is created.

It was working ~12 hours ago, but now for some reason it hangs on Waiting for remote builder...

I’ve tried the following:

  • Reinstalled the CLI (fly v0.0.406 darwin/arm64 Commit: ba78bd6f BuildDate: 2022-10-07T23:28:22Z)
  • Destroyed and re-created my apps
  • Destroyed the builder

But no luck :confused:

For anybody with an app in yyz, there is an open issue, which is likely why those remote builders are having problems and deploys are failing:

Deploys with remote builders are working for me again (with flyctl 0.0.409).

What is interesting is that on my first attempt today I got this error message, which I didn’t before:

Error error connecting to docker: failed building options: agent: failed to start

The agent failed to start with the following error log:

2022/10/11 13:21:21.054597 srv another instance of the agent is already running

I killed the existing agent process (with kill -9) and restarted the agent with flyctl agent restart and now things appear to be fine. Maybe it was a problem with the agent all along?
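
For anyone who wants to script that fix, here is a minimal sketch under stated assumptions. It assumes the stale agent shows up in ps -ef with "flyctl agent" or "fly agent" in its command line, which you should confirm on your own machine first. agent_pids and restart_agent are hypothetical helper names; flyctl agent restart is the command quoted above. The sketch sends a plain kill first, so kill -9 stays a last resort.

```shell
#!/usr/bin/env bash
# Print the PIDs of flyctl agent processes found in `ps -ef` output on stdin.
# Assumption: the agent's command line contains "fly agent" or "flyctl agent".
agent_pids() {
  # Column 2 of `ps -ef` output is the PID.
  awk '$0 ~ /(fly|flyctl) agent/ { print $2 }'
}

# Stop any running agent, then ask flyctl to start a fresh one.
# Defined here but not invoked, so sourcing this file changes nothing.
restart_agent() {
  local pids
  pids="$(ps -ef | agent_pids)"
  if [ -n "$pids" ]; then
    # Unquoted on purpose: one PID per word. Use kill -9 only if this fails.
    kill $pids
  fi
  flyctl agent restart
}
```

Running ps -ef | agent_pids on its own first shows which processes would be killed before you call restart_agent.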

I’m having this issue right now. Is there a known issue currently?

Hey,

I have many of the symptoms described here: fly deploy is stuck and the builder is in a suspended state, but with a twist.

When accessing the “Machines” tab on the builder, the “Scale” tab appears. Desperate to find a solution, I tried to scale the builder to a dedicated CPU.
Since then the builder logs “Pulling container image” every 2s.
Deleting the builder and creating a new one (without rescaling it) doesn’t solve the problem.

Hope I didn’t break anything :laughing:

Hey,

Deleting your fly-builder-xxxx app when it’s stuck usually does the trick since the next fly deploy will create a new one for you.

Unfortunately in that case it doesn’t.

I’m having the same problem.

Can you paste the output of LOG_LEVEL=debug fly deploy here?

Last query and mutation before it hangs:

DEBUG --> POST https://api.fly.io/graphql

{
  "query": "query ($appName: String!) { appbasic:app(name: $appName) { id name platformVersion organization { id slug } } }",
  "variables": {
    "appName": "example-app"
  }
}

DEBUG {}
DEBUG <-- 200 https://api.fly.io/graphql (306.56ms)

{
  "data": {
    "appbasic": {
      "id": "example-app",
      "name": "example-app",
      "platformVersion": null,
      "organization": {
        "id": "example-org-id",
        "slug": "example-org-slug"
      }
    }
  }
}

DEBUG --> POST https://api.fly.io/graphql

{
  "query": "mutation($input: ValidateWireGuardPeersInput!) { validateWireGuardPeers(input: $input) { invalidPeerIps } }",
  "variables": {
    "input": {
      "peerIps": []
    }
  }
}

DEBUG {}
DEBUG <-- 200 https://api.fly.io/graphql (129.94ms)

{
  "data": {
    "validateWireGuardPeers": {
      "invalidPeerIps": []
    }
  }
}

EDIT

@rugwiro

After resetting WireGuard with fly wireguard reset, the last mutation has a payload of two IPv6 addresses in peerIps. The response is still the same (no invalid peer IPs), but now, after the last mutation, I get a loop of:

DEBUG Remote builder unavailable, retrying in xms (err: Get "http://[<ipv6>]:2375/_ping": context deadline exceeded)
DEBUG Remote builder unavailable, retrying in xms (err: Get "http://[<ipv6>]:2375/_ping": context deadline exceeded)
DEBUG Remote builder unavailable, retrying in xms (err: Get "http://[<ipv6>]:2375/_ping": context deadline exceeded)
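
The URL in that retry loop is the Docker Engine API's /_ping health endpoint on the builder's private IPv6 address, which answers OK when the daemon is reachable. The sketch below repeats the check by hand with curl. It is only meaningful under one assumption: your machine has its own active WireGuard tunnel into the organization's private network, since flyctl normally reaches the builder through a tunnel managed by its agent. ping_url and ping_builder are hypothetical helper names, and the address in the usage comment is a placeholder.

```shell
#!/usr/bin/env bash
# Build the Docker /_ping URL for a builder's IPv6 address.
# Port 2375 and the /_ping path are taken from the debug output above.
ping_url() {
  printf 'http://[%s]:2375/_ping\n' "$1"
}

# Probe the builder with a 5-second deadline.
# --globoff stops curl from treating the [ ] around the address as a glob.
ping_builder() {
  curl --globoff --silent --show-error --max-time 5 "$(ping_url "$1")"
}

# Usage (not run here; replace the placeholder with your builder's address):
#   ping_builder 'fdaa:0:1::3'
```

A timeout here, with the tunnel known to be up, would point at the builder rather than at the local agent.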

What’s fly doctor showing you?

Same issue here. Deleting the builder app, or even the main app, doesn’t help.
The only workaround is to run flyctl deploy --local-only

Had the same issue.
Somewhere in the thread I read that killing the flyctl agent should do the trick.

It worked :slight_smile:

ps -ef | grep flyctl
Kill the process of the flyctl agent and retry

OMG, thank you, I’ve struggled with this so often. Great simple fix!

Killing the process, destroying the machines, and restarting everything didn’t work for me. What did work is the following (I’m not sure which part fixed the issue, the reset or the issue step).

First restart WireGuard and reactivate it manually, then run the following commands:
fly wireguard reset
flyctl ssh issue --agent
fly deploy
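
The three commands above can be wrapped so that a failure at any step stops the sequence instead of going on to deploy. This is a minimal sketch: run and recover_and_deploy are hypothetical helper names, DRY_RUN is a variable introduced here for previewing, and the manual WireGuard restart described above still has to happen first. The fly commands may also prompt for input, which the sketch does not handle.

```shell
#!/usr/bin/env bash
# Run a command, or only print it when DRY_RUN=1 is set.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Same three commands as in the post above, chained with && so the
# deploy is attempted only if both earlier steps succeed.
recover_and_deploy() {
  run fly wireguard reset &&
    run flyctl ssh issue --agent &&
    run fly deploy
}

# Usage (not run here):
#   DRY_RUN=1 recover_and_deploy   # preview the commands
#   recover_and_deploy             # actually run them
```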