Failed to start remote builder heartbeat: server returned a non-200 status code: 500

Same issue, golang, cdg.

$ fly logs -a fly-builder-divine-sun-2725

Waiting for logs...

2023-10-14T09:13:33.359 runner[908055ec3dd748] cdg [info] Pulling container image docker-hub-mirror.fly.io/flyio/rchab:sha-a4467b8
2023-10-14T09:14:58.585 runner[908055ec3dd748] cdg [info] Pulling container image docker-hub-mirror.fly.io/flyio/rchab:sha-a4467b8
2023-10-14T09:16:23.474 runner[908055ec3dd748] cdg [info] Pulling container image docker-hub-mirror.fly.io/flyio/rchab:sha-a4467b8

I’m seeing the same issue. I haven’t been able to deploy for a few days now. I think it worked for a short while in between, but now it’s broken again.

I also tried deleting the builder now, but that didn’t help.

Same here, lhr region.

:wave:

This incident status has been updated; deployments should be working now.

Can confirm, it works on my end now, thanks a lot! :heart:

(heh, all I did today was post on the forum but I’ll pass it on)

I posted this on another thread, but in the hope of getting some traction and some help, I’d like to mention it here too…
I’m still having trouble deploying my Phoenix app w/ Postgres.
I used to get the same 503 error, but now it’s constantly timing out.

 ✖ Failed: error waiting for release_command machine e2865996a11686 to start: timeout reached waiting for machine to started failed to wait for VM e2865996a11686 in started state.
-------
Error: release command failed - aborting deployment. error waiting for release_command machine e2865996a11686 to start: timeout reached waiting for machine to started failed to wait for VM e2865996a11686 in started state: Get "https://api.machines.dev/v1/apps/ekmi/machines/e2865996a11686/wait?instance_id=01HCTWP7CEKY6CC26RYDBTXGPY&state=started&timeout=60": net/http: request canceled
You can increase the timeout with the --wait-timeout flag

My deployment server is Hong Kong.

any ideas?

I’m having this issue as well. Simple Flask app with a local sqlite db.

DEBUG {}
DEBUG ← 500 GraphQL Playground (653.42ms)

DEBUG {
  "errors": [
    {
      "message": "You hit a Fly API error with request ID: 01HCWPTN64JEDRCMTJQXWVZRX0-yyz",
      "extensions": {
        "code": "SERVER_ERROR",
        "fly_request_id": "01HCWPTN64JEDRCMTJQXWVZRX0-yyz"
      }
    }
  ],
  "data": {}
}

I selected Chicago, but the errors I get are for YYZ (my first try was with YYZ; I then deleted everything and specified Chicago as the region to see whether fly launch would work this time around).

I’d been having this issue on my NodeJS server since roughly when this thread was created, but deleting the builder and then waiting a bit for the new builder to provision solved it for me.
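For anyone who wants to try the same workaround, here is a minimal sketch of the steps described above. It assumes the builder app is called fly-builder-foobar (a placeholder; use the name that fly apps list shows for your organization) and that the next remote deploy requests a fresh builder. With DRY_RUN=1, the default here, it only prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch only: destroy the stuck remote builder, then retry a remote deploy.
# BUILDER is a placeholder name, not a real app; DRY_RUN=1 just prints commands.
set -euo pipefail

BUILDER="${BUILDER:-fly-builder-foobar}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

recreate_builder() {
  run fly apps list                       # find the fly-builder-* app and its status
  run fly apps destroy "$BUILDER" --yes   # remove the stuck builder without prompting
  run fly deploy --remote-only            # retry; assumed to provision a new builder
}

recreate_builder
```

Set DRY_RUN=0 to actually run it. As this thread shows, this did not help everyone while the platform issue was ongoing, so treat it as something to try rather than a fix.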

You ran into the same issue as “Unable to deploy due to builder 500 error”; it is fixed now.

We have tried to deploy just now and it’s failing with the same error.

Same here, again since yesterday. Django app, server in Singapore (sin).

Same here, AMS. Since yesterday.

I’ve been having the same problem in LHR for the last few days. Tried destroying the builder with fly app destroy fly-builder-foobar.

LOG_LEVEL=debug flyctl deploy --build-arg ********=****************************** --remote-only
DEBUG Loaded flyctl config from******************************
DEBUG determined hostname: "********"
DEBUG determined working directory: "******************************"
DEBUG determined user home directory: "******************************"
DEBUG determined config directory: "******************************"
DEBUG ensured config directory exists.
DEBUG ensured config directory perms.
DEBUG cache loaded.
DEBUG config initialized.
DEBUG skipped querying for new release
DEBUG client initialized.
DEBUG app config loaded from ******************************/fly.toml
DEBUG --> POST https://api.fly.io/graphql

DEBUG {
  "query": "query ($appName: String!) { appbasic:app(name: $appName) { id name platformVersion organization { id slug paidPlan } } }",
  "variables": {
    "appName": "********"
  }
}


DEBUG {}
DEBUG <-- 200 https://api.fly.io/graphql (1.54s)

DEBUG {
  "data": {
    "appbasic": {
      "id": "********",
      "name": "********",
      "platformVersion": "machines",
      "organization": {
        "id": "VJeZXwpBobe7OTv1Xk80P6a5V0CoVZN7",
        "slug": "********",
        "paidPlan": false
      }
    }
  }
}

==> Verifying app config
Validating ******************************/fly.toml
Platform: machines
✓ Configuration is valid
--> Verified app config
DEBUG Starting task manager
DEBUG Config has metrics token

DEBUG --> POST https://api.fly.io/graphql

DEBUG {
  "query": "query ($appName: String!) { appcompact:app(name: $appName) { id name hostname deployed status appUrl platformVersion organization { id slug paidPlan } postgresAppRole: role { name } imageDetails { repository version } } }",
  "variables": {
    "appName": "********"
  }
}


DEBUG {}
DEBUG failed to connect metrics websocket: websocket.Dial wss://flyctl-metrics.fly.dev/socket: dial tcp [2a09:8280:1::1c:3475]:443: connect: network is unreachable

DEBUG <-- 200 https://api.fly.io/graphql (646.74ms)

DEBUG {
  "data": {
    "appcompact": {
      "id": "********",
      "name": "********",
      "hostname": "********.fly.dev",
      "deployed": true,
      "status": "deployed",
      "appUrl": "https://2a09:8280:1::f:851",
      "platformVersion": "machines",
      "organization": {
        "id": "VJeZXwpBobe7OTv1Xk80P6a5V0CoVZN7",
        "slug": "********",
        "paidPlan": false
      },
      "postgresAppRole": null,
      "imageDetails": {
        "repository": "unknown",
        "version": "unknown"
      }
    }
  }
}

WARN ******** may be a potentially sensitive environment variable. Consider setting it as a secret, and removing it from the [env] section: https://fly.io/docs/reference/secrets/

==> Building image
DEBUG trying remote docker daemon
DEBUG --> POST https://api.fly.io/graphql

DEBUG {
  "query": "mutation($input: EnsureMachineRemoteBuilderInput!) { ensureMachineRemoteBuilder(input: $input) { machine { id state ips { nodes { family kind ip } } }, app { name organization { id slug } } } }",
  "variables": {
    "input": {
      "appName": "********",
      "organizationId": null
    }
  }
}


DEBUG {}
DEBUG <-- 500 https://api.fly.io/graphql (661.58ms)

DEBUG {
  "errors": [
    {
      "message": "You hit a Fly API error with request ID: 01HD0ZZEADKQRHPQ1EKY125C74-mel",
      "extensions": {
        "code": "SERVER_ERROR",
        "fly_request_id": "01HD0ZZEADKQRHPQ1EKY125C74-mel"
      }
    }
  ],
  "data": {}
}

WARN Failed to start remote builder heartbeat: server returned a non-200 status code: 500

DEBUG Config has metrics token

DEBUG --> POST https://api.fly.io/graphql

DEBUG {
  "query": "\n# @genqlient\nmutation ResolverCreateBuild ($input: CreateBuildInput!) {\n\tcreateBuild(input: $input) {\n\t\tid\n\t\tstatus\n\t}\n}\n",
  "variables": {
    "input": {
      "appName": "********",
      "builderType": "remote",
      "clientMutationId": "",
      "imageOpts": {
        "buildArgs": {
          "********": "******************************"
        },
        "buildPacks": null,
        "builder": "",
        "builtIn": "",
        "builtInSettings": null,
        "dockerfilePath": "",
        "extraBuildArgs": null,
        "imageLabel": "",
        "imageRef": "",
        "noCache": false,
        "publish": true,
        "tag": "registry.fly.io/********:deployment-01HD0ZZET9QCX3YGX8G8WVGWZW",
        "target": ""
      },
      "machineId": "",
      "strategiesAvailable": [
        "Buildpacks",
        "Dockerfile",
        "Builtin"
      ]
    }
  },
  "operationName": "ResolverCreateBuild"
}

DEBUG {0xc000c08a50}
DEBUG failed to connect metrics websocket: websocket.Dial wss://flyctl-metrics.fly.dev/socket: dial tcp [2a09:8280:1::1c:3475]:443: connect: network is unreachable

DEBUG Config has metrics token

DEBUG failed to connect metrics websocket: websocket.Dial wss://flyctl-metrics.fly.dev/socket: dial tcp [2a09:8280:1::1c:3475]:443: connect: network is unreachable

DEBUG <-- 200 https://api.fly.io/graphql (572.74ms)

DEBUG {
  "data": {
    "createBuild": {
      "id": "3925677",
      "status": "started"
    }
  }
}

DEBUG Trying 'Buildpacks' strategy

DEBUG no buildpack builder configured, skipping
DEBUG result image:<nil> error:<nil>

DEBUG Trying 'Dockerfile' strategy

DEBUG --> POST https://api.fly.io/graphql

DEBUG {
  "query": "mutation($input: EnsureMachineRemoteBuilderInput!) { ensureMachineRemoteBuilder(input: $input) { machine { id state ips { nodes { family kind ip } } }, app { name organization { id slug } } } }",
  "variables": {
    "input": {
      "appName": "********",
      "organizationId": null
    }
  }
}


DEBUG {}
DEBUG <-- 500 https://api.fly.io/graphql (633.05ms)

DEBUG {
  "errors": [
    {
      "message": "You hit a Fly API error with request ID: 01HD0ZZFH4QT8D34CH44GBV2W5-mel",
      "extensions": {
        "code": "SERVER_ERROR",
        "fly_request_id": "01HD0ZZFH4QT8D34CH44GBV2W5-mel"
      }
    }
  ],
  "data": {}
}

DEBUG result image:<nil> error:error connecting to docker: server returned a non-200 status code: 500

DEBUG Config has metrics token

DEBUG --> POST https://api.fly.io/graphql

DEBUG {
  "query": "\n# @genqlient\nmutation ResolverFinishBuild ($input: FinishBuildInput!) {\n\tfinishBuild(input: $input) {\n\t\tid\n\t\tstatus\n\t\twallclockTimeMs\n\t}\n}\n",
  "variables": {
    "input": {
      "appName": "********",
      "buildId": "3925677",
      "builderMeta": {
        "builderType": "",
        "buildkitEnabled": false,
        "dockerVersion": "",
        "platform": "",
        "remoteAppName": "",
        "remoteMachineId": ""
      },
      "clientMutationId": "",
      "finalImage": {
        "id": "",
        "sizeBytes": 0,
        "tag": ""
      },
      "logs": "error connecting to docker: server returned a non-200 status code: 500",
      "machineId": "",
      "status": "failed",
      "strategiesAttempted": [
        {
          "error": "",
          "note": "no buildpack builder configured, skipping",
          "result": "failed",
          "strategy": "Buildpacks"
        },
        {
          "error": "error connecting to docker: server returned a non-200 status code: 500",
          "note": "",
          "result": "failed",
          "strategy": "Dockerfile"
        }
      ],
      "timings": {
        "buildAndPushMs": 634,
        "buildMs": 634,
        "builderInitMs": 634,
        "contextBuildMs": -1,
        "imageBuildMs": -1,
        "pushMs": -1
      }
    }
  },
  "operationName": "ResolverFinishBuild"
}

DEBUG {0xc00068bc50}
DEBUG failed to connect metrics websocket: websocket.Dial wss://flyctl-metrics.fly.dev/socket: dial tcp [2a09:8280:1::1c:3475]:443: connect: network is unreachable

DEBUG <-- 200 https://api.fly.io/graphql (610.84ms)

DEBUG {
  "data": {
    "finishBuild": {
      "id": "3925677",
      "status": "failed",
      "wallclockTimeMs": 1239
    }
  }
}

DEBUG Task manager done
DEBUG Config has metrics token

DEBUG failed to connect metrics websocket: websocket.Dial wss://flyctl-metrics.fly.dev/socket: dial tcp [2a09:8280:1::1c:3475]:443: connect: network is unreachable

DEBUG Config has metrics token

DEBUG failed to connect metrics websocket: websocket.Dial wss://flyctl-metrics.fly.dev/socket: dial tcp [2a09:8280:1::1c:3475]:443: connect: network is unreachable

Error: failed to fetch an image or build from source: error connecting to docker: server returned a non-200 status code: 500

fly apps list shows the builder with the status pending - not sure if that is normal.

Now it’s really going south.
The app is suspended, the builder is pending.
I have a demo in a couple of hours.
Seems like fly.io is not a reliable alternative.
Does anybody from fly.io even care about this?!

Update:
I destroyed everything, including the builder.
It started deploying until:
Error: error creating a new machine: failed to launch VM: unable to use requested volume, ‘…’ due to capacity constraints
I will do the demo locally and then look for a better provider.
I think I’ve been wasting my time with fly.io these past couple of days; I got every possible error.

I am getting the same error this morning.

Platform: machines
✓ Configuration is valid
→ Verified app config
==> Building image
WARN Failed to start remote builder heartbeat: server returned a non-200 status code: 500

Error: failed to fetch an image or build from source: error connecting to docker: server returned a non-200 status code: 500

People cannot deploy with fly for ONE WEEK.
Your status page shows “All Systems Operational”.
Your SLA commits to 99.9%.

Can you please update your status page and SLA to be closer to what’s happening here?

We apologize for the continued issues with using remote builders for deploys. We changed how they are provisioned and have been working through various edge conditions based on the new logic. There are still a couple of edge cases left that we’re working to address now.

If you are still affected by the remote builder not creating/starting, you can build the image locally or as part of your CI/CD pipeline and push to our registry and then specify the image as part of fly deploy. We know this is not ideal and want to provide the simplest possible UX for deploys and our builders are a big part of that. They are also provided to customers without any costs and are not the only way to build/push/deploy on our system.
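A rough sketch of that build-locally-and-push route, for anyone who has only ever used fly deploy. It assumes Docker is installed locally, a Dockerfile in the current directory, and placeholder values for APP and TAG (both hypothetical; substitute your own app name and any tag you like). With DRY_RUN=1, the default here, it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: build locally, push to the Fly registry, deploy by image reference.
# APP and TAG are hypothetical placeholders; DRY_RUN=1 just prints the commands.
set -euo pipefail

APP="${APP:-my-app}"
TAG="${TAG:-manual-1}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

deploy_with_local_build() {
  local image="registry.fly.io/${APP}:${TAG}"
  run fly auth docker               # let the local Docker client log in to registry.fly.io
  run docker build -t "$image" .    # build from the Dockerfile in the current directory
  run docker push "$image"          # push the image to the Fly registry
  run fly deploy --image "$image"   # deploy the pushed image; no remote builder involved
}

deploy_with_local_build
```

Set DRY_RUN=0 to run it for real. If your machine is not x86-64 (an Apple Silicon Mac, for example), you will probably need to build for linux/amd64. fly deploy --local-only should also work if you would rather let flyctl drive the local Docker daemon.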

Thanks for the update JP.

Our ability to deploy seems to have just come back, and we appreciate the work that’s going on to improve this.

For those who may still be having issues, do you have documentation they can follow to deploy using one of the alternative methods? I would be stuck if I had to do this without fly deploy.

Thanks again