`fly deploy` stuck on "Waiting for remote builder"

Hello!
Is your builder still reporting a large amount of usage?

If so, can you check out https://fly-metrics.net/, select your builder app, and then expand the Volumes graph to see the usage reported there?

Thanks! I’m super curious about that growing usage. I may have some extra steps to take if that’s still going on and if it’s actually growing that large. In theory the volumes are capped at 50 GB.

I seem to be having the same problem now too -

flyctl deploy
==> Verifying app config
→ Verified app config
==> Building image
Waiting for remote builder fly-builder-small-tree-3392… :earth_asia:

The CLI stays stuck there while the monitoring page just repeats this over and over:

2022-10-01T17:10:59.479 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:10:59.479270915Z" level=debug msg="checking docker activity"
2022-10-01T17:10:59.479 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:10:59.479535351Z" level=debug msg="Calling GET /v1.41/containers/json?filters=%7B%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"
2022-10-01T17:11:00.481 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:11:00.480801483Z" level=debug msg="checking docker activity"
2022-10-01T17:11:00.481 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:11:00.481277836Z" level=debug msg="Calling GET /v1.41/containers/json?filters=%7B%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"
2022-10-01T17:11:01.484 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:11:01.483769067Z" level=debug msg="checking docker activity"
2022-10-01T17:11:01.484 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:11:01.484057508Z" level=debug msg="Calling GET /v1.41/containers/json?filters=%7B%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"
2022-10-01T17:11:02.485 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:11:02.485343647Z" level=debug msg="checking docker activity"
2022-10-01T17:11:02.486 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:11:02.485711067Z" level=debug msg="Calling GET /v1.41/containers/json?filters=%7B%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"
2022-10-01T17:11:03.487 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:11:03.487356510Z" level=debug msg="checking docker activity"
2022-10-01T17:11:03.487 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:11:03.487661533Z" level=debug msg="Calling GET /v1.41/containers/json?filters=%7B%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"
2022-10-01T17:11:04.489 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:11:04.489411152Z" level=debug msg="checking docker activity"
2022-10-01T17:11:04.490 app[06e8262cXXXXXX] ewr [info] time="2022-10-01T17:11:04.489773021Z" level=debug msg="Calling GET /v1.41/containers/json?filters=%7B%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"
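
For anyone trying to read those log lines: the percent-encoded `filters` parameter is easier to interpret once decoded. Below is a minimal Python sketch that decodes the request path copied from the log. The helper name `decode_docker_query` is made up for illustration. The decoded filter asks the Docker Engine API for running containers only, which suggests (this is an interpretation, not something confirmed in this thread) that the builder is up and polling once a second for activity rather than doing any build work.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Request path copied verbatim from the builder log above.
LOGGED_PATH = (
    "/v1.41/containers/json"
    "?filters=%7B%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"
)


def decode_docker_query(path):
    """Return (filters, limit) decoded from a logged Docker API request path.

    Hypothetical helper for reading the log; not part of flyctl or Docker.
    """
    query = parse_qs(urlsplit(path).query)  # parse_qs undoes the %XX escapes
    filters = json.loads(query["filters"][0])  # the filter value is itself JSON
    limit = int(query["limit"][0])
    return filters, limit


filters, limit = decode_docker_query(LOGGED_PATH)
print(filters, limit)  # {'status': {'running': True}} 0
```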

Also stuck for me. I can see that the builder is in the “Suspended” state in the Fly dashboard.

Same here today, with the same logs that @kharri1073 was getting.

Earlier in the day I was having a somewhat similar problem when deploying a trivial Node app with a buildpack (it stalled after “downloaded newer image…” with the same logs); switching to a Dockerfile allowed me to deploy then, but that no longer works either.

Nothing interesting with LOG_LEVEL=debug. No errors reported when cancelling the build with Ctrl-C. I can destroy the builder but it doesn’t change matters. fly-metrics dashboard for the builder shows it idle – no IO, CPU < 1%.

Edit: On my side this has been using flyctl 0.0.405 & 0.0.406.

I’m seeing identical behavior to what @kharri1073 and @robjwells are describing as well. And as @hpx7 mentions, my builder is also consistently reverting to a “Suspended” state after it is created.

It was working ~12 hours ago, but now for some reason it hangs on Waiting for remote builder...

I’ve tried the following:

  • Reinstalled the CLI (fly v0.0.406 darwin/arm64 Commit: ba78bd6f BuildDate: 2022-10-07T23:28:22Z)
  • Destroyed and re-created my apps
  • Destroyed the builder

But no luck :confused:

For anybody with an app in yyz, there is an open issue. So that’s likely the cause for those remote builders having issues and deploys failing:


Deploys with remote builders are working for me again (with flyctl 0.0.409).

What is interesting is that on my first attempt today I got this error message, which I didn’t before:

Error error connecting to docker: failed building options: agent: failed to start

The agent failed to start with the following error log:

2022/10/11 13:21:21.054597 srv another instance of the agent is already running

I killed the existing agent process (with kill -9) and restarted the agent with flyctl agent restart and now things appear to be fine. Maybe it was a problem with the agent all along?

I’m having this issue right now. Is there a known issue currently?

Hey,

I have a lot of symptoms that are described here.
fly deploy is stuck, builder in suspended state, but with a twist.

When accessing the “Machines” tab on the builder, the “Scale” tab appears. Desperate to find a solution, I tried to scale the builder to a dedicated CPU.
Since then the builder logs “Pulling container image” every 2s.
Deleting the builder and creating a new one (without rescaling it) doesn’t solve the problem.

Hope I didn’t break anything :laughing:


Hey,

Deleting your fly-builder-xxxx app when it’s stuck usually does the trick since the next fly deploy will create a new one for you.

Unfortunately in that case it doesn’t.

I’m having the same problem.

Can you paste the output of LOG_LEVEL=debug fly deploy here?


Last query and mutation before hanging:

DEBUG --> POST https://api.fly.io/graphql

{
  "query": "query ($appName: String!) { appbasic:app(name: $appName) { id name platformVersion organization { id slug } } }",
  "variables": {
    "appName": "example-app"
  }
}

DEBUG {}
DEBUG <-- 200 https://api.fly.io/graphql (306.56ms)

{
  "data": {
    "appbasic": {
      "id": "example-app",
      "name": "example-app",
      "platformVersion": null,
      "organization": {
        "id": "example-org-id",
        "slug": "example-org-slug"
      }
    }
  }
}

DEBUG --> POST https://api.fly.io/graphql

{
  "query": "mutation($input: ValidateWireGuardPeersInput!) { validateWireGuardPeers(input: $input) { invalidPeerIps } }",
  "variables": {
    "input": {
      "peerIps": []
    }
  }
}

DEBUG {}
DEBUG <-- 200 https://api.fly.io/graphql (129.94ms)

{
  "data": {
    "validateWireGuardPeers": {
      "invalidPeerIps": []
    }
  }
}

EDIT

@rugwiro

After resetting WireGuard with fly wireguard reset, the last mutation now carries two IPv6 addresses in peerIps. The response is still the same (no invalid peer IPs), but after that mutation I now get a loop of:

DEBUG Remote builder unavailable, retrying in xms (err: Get "http://[<ipv6>]:2375/_ping": context deadline exceeded)
DEBUG Remote builder unavailable, retrying in xms (err: Get "http://[<ipv6>]:2375/_ping": context deadline exceeded)
DEBUG Remote builder unavailable, retrying in xms (err: Get "http://[<ipv6>]:2375/_ping": context deadline exceeded)
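
To check the same endpoint outside of flyctl, a small probe can help separate a tunnel problem from a builder problem. This is a rough sketch, assuming the machine running it already has a working WireGuard connection to the organization's private network. The function names `builder_ping_url` and `ping_builder` are hypothetical, and the builder address has to be supplied by you, since the log above redacts it as `<ipv6>`. Port 2375 and the `/_ping` path are taken from the error message.

```python
import urllib.request


def builder_ping_url(ipv6, port=2375):
    """Build the Docker ping URL; IPv6 literals need brackets in URLs."""
    return f"http://[{ipv6}]:{port}/_ping"


def ping_builder(ipv6, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the builder's Docker daemon answers /_ping with HTTP 200.

    `opener` is injectable so the function can be exercised without a network.
    """
    try:
        with opener(builder_ping_url(ipv6), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers urllib.error.URLError, socket timeouts, refused connections.
        return False


# Usage (replace the placeholder with your builder's private address):
# print(ping_builder("<ipv6>"))
```

If this returns False while the builder shows as running, the problem is more likely on the network path (WireGuard peer, agent) than in the builder itself, though that is only a heuristic.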

What’s fly doctor showing you?


Same issue here. Deleting the builder app, or even the main app, doesn’t help.
The only workaround is flyctl deploy --local-only