remote builder machine stuck in created state

My remote builder (machine ID 9185517b45d983, app fly-builder-green-pond-1760) is stuck in the created state. It looks like it can’t pull the container image it needs to start. Logs:

2023-01-29T05:51:08.084 runner[9185517b45d983] sin [info] Pulling container image
2023-01-29T05:51:16.642 runner[9185517b45d983] sin [info] Pulling container image
2023-01-29T05:51:22.712 runner[9185517b45d983] sin [info] Pulling container image
2023-01-29T05:51:25.247 runner[9185517b45d983] sin [info] Pulling container image

I can’t stop or destroy this machine through the API either, because it is still in the created state.

I saw a similar report on the forum here (Fly API error....). Nothing seems off on the status dashboard. I created a new thread because I thought a more descriptive subject would help.

Hi,

Here too:

The sin region is certainly being mentioned often enough to suggest an issue.

Guessing this is the cause:

Perhaps, but the builder is still stuck in the created state, and I can’t get remote builds done (which I depend on). Could someone from the Fly.io team please help?

I tried deleting the builder app itself (as mentioned in `fly deploy` stuck on "Waiting for remote builder" - #31 by rugwiro), which seemed to work. But the new builder that was created is stuck with the same issue:

2023-01-30T03:24:26.674 runner[1781031f9e9689] sin [info] Pulling container image
2023-01-30T03:24:28.306 runner[1781031f9e9689] sin [info] Pulling container image
2023-01-30T03:24:29.596 runner[1781031f9e9689] sin [info] Pulling container image

This time I specified maa as the region when running fly deploy, but the builder still seems to be created in sin. Here’s the command I’m using to trigger the remote build:

fly deploy --region=maa --image-label v0.0.1 --build-only --push --remote-only -a <app_name>

Output:

==> Verifying app config
--> Verified app config
==> Building image
WARN Failed to start remote builder heartbeat: You hit a Fly API error with request ID: 01GR0B3S260GHKD4YJTFFM527X-maa
WARN Remote builder did not start in time. Check remote builder logs with `flyctl logs -a fly-builder-autumn-sea-3191`
Error failed to fetch an image or build from source: error connecting to docker: remote builder app unavailable

@hi.kanily It turns out that you can’t set the builder’s region: builders spawn in the region geographically closest to you. You also can’t change the builder app’s region, because it’s tied to the region of its volume.

I think a region override for builders should be an option for cases like this.

Anyway, I came up with a hack you can try to work around this:

  1. Trigger a build:

    fly deploy ... --region sjc --build-only
    

    Once it shows “WARN Failed to start remote builder heartbeat” or “waiting for remote builder…”, press Ctrl+C. You only want the builder to spawn. (I’m using the sjc region, but you can use any; if you change it, remember to replace it in the steps below!)

  2. Grab the builder app name:

    fly apps list
    

    The builder app’s name starts with fly-builder-.

  3. Grab the builder app’s machine ID:

    fly machine list -a $BUILDER_APP_NAME
    

    There should only be one machine, in the sin region. Copy that ID.

  4. Clone the machine into a different region:

    fly machine clone $MACHINE_ID_FROM_ABOVE --region sjc -a $BUILDER_APP_NAME
    

    This will clone the machine and its volume into the sjc region.

  5. Verify that sjc is in the app’s region pool:

    fly regions list -a $BUILDER_APP_NAME
    Region Pool:
    sin
    sjc
    Backup Region:
    

    (Don’t bother with fly regions set ...; it won’t go through because of the attached volume in sin.)

  6. Monitor your builder’s logs; they should show that sjc is up and running:

    fly logs -a $BUILDER_APP_NAME
    2023-01-30T06:14:00Z app[732870d7be5585] sjc [info]time="2023-01-30T06:14:00.825682582Z" level=debug msg="checking docker activity"
    
  7. Trigger your build again and watch the logs. The sjc machine should be the one performing the build: the sin one will stay stuck, but sjc should proceed normally.
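If you need to repeat the workaround, steps 2 to 5 can be scripted. This is a minimal sketch, not an official tool: the file name clone-builder.sh and the helpers builder_app_from_list, first_machine_id_from_list and clone_builder_to_region are hypothetical, and the parsing assumes (without documentation to back it) that the app name is the first column of fly apps list output and that the machine ID (14 hex characters, like 9185517b45d983) is the first column of fly machine list output. Run step 1 first so the builder app exists.

```shell
#!/usr/bin/env bash
# clone-builder.sh: hypothetical helper automating steps 2-5 of the workaround.
# ASSUMED output layouts (not a documented format):
#   - `fly apps list`: the app name is the first column
#   - `fly machine list -a APP`: the machine ID (14 hex chars) is the first column

# Step 2: print the first app name starting with "fly-builder-" (reads stdin).
# awk reads all input instead of exiting early, so the upstream command never gets SIGPIPE.
builder_app_from_list() {
  awk '$1 ~ /^fly-builder-/ && !found { print $1; found = 1 }'
}

# Step 3: print the first 14-character lowercase hex ID in the first column (reads stdin).
first_machine_id_from_list() {
  awk '$1 ~ /^[0-9a-f]+$/ && length($1) == 14 && !found { print $1; found = 1 }'
}

# Steps 4-5: clone the builder's machine into the target region, then show the region pool.
clone_builder_to_region() {
  local target_region="$1" app machine_id
  app="$(fly apps list | builder_app_from_list)"
  if [ -z "$app" ]; then
    echo "no fly-builder-* app found; run step 1 first" >&2
    return 1
  fi
  machine_id="$(fly machine list -a "$app" | first_machine_id_from_list)"
  if [ -z "$machine_id" ]; then
    echo "no machine ID found in $app" >&2
    return 1
  fi
  echo "cloning $machine_id from $app into $target_region" >&2
  fly machine clone "$machine_id" --region "$target_region" -a "$app" || return 1
  fly regions list -a "$app"
}

# Only act when a target region is passed, e.g. `bash clone-builder.sh sjc`;
# without arguments the script just defines the helpers.
if [ "$#" -gt 0 ]; then
  clone_builder_to_region "$1"
fi
```

Usage: bash clone-builder.sh sjc, then continue with steps 6 and 7. Run it only once per builder: after the clone the app has two machines, and the helper simply picks the first ID it sees. It also picks the first fly-builder-* app if your org has more than one.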

To clean this up afterwards, run fly apps destroy $BUILDER_APP_NAME; it removes the app and its machines and volumes for you.

Hope this helps!

And just out of curiosity, why are you depending on remote-only builds?

edit: phrasing + question

OK, so this hack worked. I cloned the builder into the ams region and it came up just fine! So the issue does seem to be limited to the sin region.

Hopefully someone from the Fly team can look into it; it has been failing for over 24 hours now :confused:

PS. Regarding the dependency on a remote builder… I’m on an M1 Mac, building a Dart codebase. Using buildx to build an amd64 binary doesn’t work as of now (at least for my project). Related issue: Building linux/amd64 Docker Image on Mac M1 · Issue #48420 · dart-lang/sdk · GitHub

Yay! Glad it worked.

The issue seems to stem from an upstream network problem; I’ve had poor connectivity to lots of services outside of Fly. I believe a fix is in progress (Fly status page).

PS. Regarding the dependency on a remote builder…

So you’re using Fly as a build server? That’s interesting.

I wonder whether this builder is charged if it’s left running…

I’m trying that out now and will report back! It probably won’t be charged: the machine doesn’t seem to know it’s a builder, but the app does, so billing probably skips resources attached to a builder app.

More fundamentally, if builders are free and ssh-able, is there anything preventing abuse of builder-spawned machines?

Observations:

  • Total usage shows $0 on a fresh org containing only the builder app and its 3 provisioned machines (not running, though)
  • I kept some of the machines running by kicking them with ssh after they stopped. Billing shows “0 second × VM”
  • I’m not billed for the volumes either (3x50GB)?
  • Machines always terminate within 10 minutes if there is no pending/ongoing Docker build, controlled by the builder [0]
  • I wonder if I can ping /extendDeadline [1] periodically to bypass the time limit

[0] https://github.com/superfly/rchab/blob/main/dockerproxy/main.go#L135-L140
[1] https://github.com/superfly/rchab/blob/main/dockerproxy/main.go#L89
