Bug Report - new customer, "remote builder app unavailable" - python/flask

I think I’ve worked around this by using --local-only, but I wanted to report it in case it’s happening to others.

I onboarded as a new customer with a new app in the iad region and started my first app by following the Python tutorial here.

All went well until I went to deploy, when I received these errors:

(venv) ➜  src git:(main) ✗ flyctl deploy
==> Verifying app config
--> Verified app config
==> Building image
WARN Remote builder did not start in time. Check remote builder logs with `flyctl logs -a fly-builder-wispy-meadow-4726`
WARN Failed to start remote builder heartbeat: remote builder app unavailable

When I retried with debug mode enabled, I ran into this:

Waiting for remote builder fly-builder-wispy-meadow-4726... 🌎DEBUG Remote builder unavailable, retrying in 50ms (err: Get "http://[fdaa:1:1546:a7b:92:ed43:2a96:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-wispy-meadow-4726... 🌍DEBUG Remote builder unavailable, retrying in 56.483524ms (err: Get "http://[fdaa:1:1546:a7b:92:ed43:2a96:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-wispy-meadow-4726... 🌏DEBUG Remote builder unavailable, retrying in 53.032134ms (err: Get "http://[fdaa:1:1546:a7b:92:ed43:2a96:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-wispy-meadow-4726... 🌏DEBUG Remote builder unavailable, retrying in 51.110858ms (err: Get "http://[fdaa:1:1546:a7b:92:ed43:2a96:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-wispy-meadow-4726... 🌎DEBUG Remote builder unavailable, retrying in 69.720907ms (err: Get "http://[fdaa:1:1546:a7b:92:ed43:2a96:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-wispy-meadow-4726... 🌍DEBUG Remote builder unavailable, retrying in 80.363818ms (err: Get "http://[fdaa:1:1546:a7b:92:ed43:2a96:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-wispy-meadow-4726... 🌏DEBUG Remote builder unavailable, retrying in 133.122917ms (err: Get "http://[fdaa:1:1546:a7b:92:ed43:2a96:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-wispy-meadow-4726... 🌏DEBUG Remote builder unavailable, retrying in 91.077375ms (err: Get "http://[fdaa:1:1546:a7b:92:ed43:2a96:2]:2375/_ping": context deadline exceeded)
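The retry intervals in the debug output vary between roughly 50 ms and 133 ms, which looks like a polling loop with randomized delays. The following is only a sketch of that general pattern, not flyctl's actual implementation; the function name wait_until_ready and all of its parameters and defaults are hypothetical.

```python
import random
import time


def wait_until_ready(probe, deadline_s=5.0, base_s=0.05, cap_s=1.0, sleep=time.sleep):
    """Call probe() until it returns True or deadline_s seconds have elapsed.

    Returns the number of attempts made on success, or None on timeout.
    All names and default values here are illustrative assumptions.
    """
    start = time.monotonic()
    delay = base_s
    attempts = 0
    while True:
        attempts += 1
        if probe():
            return attempts
        if time.monotonic() - start >= deadline_s:
            return None
        # Jitter: sleep between 50% and 100% of the current delay
        sleep(random.uniform(0.5 * delay, delay))
        # Grow the delay for the next round, up to a fixed cap
        delay = min(delay * 2, cap_s)
```

With a loop like this, a builder that never answers its health check ends in a timeout no matter how often the client retries, which matches the "did not start in time" warning.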

I tried both tethering and local Wi-Fi to see whether it was some sort of IPv6 issue, but I didn’t notice any change in behavior with either.

Trying to curl that address also doesn’t work, although I’m not sure whether that has something to do with curling IPv6 addresses.
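One thing that can be checked locally is what kind of address this is. The sketch below uses Python's standard ipaddress module to test whether the address from the debug output falls in the IPv6 unique local range (fc00::/7, RFC 4193). Addresses in that range are not routed over the public internet, so a plain curl from a laptop would be expected to fail on either Wi-Fi or tethering unless some tunnel into the private network is up. That the builder is reached through such a tunnel is an assumption here, not something confirmed in this thread. The helper names are made up for illustration.

```python
import ipaddress

# Address copied from the debug output above
BUILDER_ADDR = "fdaa:1:1546:a7b:92:ed43:2a96:2"

# fc00::/7 is the IPv6 unique local address block (RFC 4193)
UNIQUE_LOCAL = ipaddress.ip_network("fc00::/7")


def is_unique_local(addr: str) -> bool:
    """Return True if addr is an IPv6 unique local address."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 6 and ip in UNIQUE_LOCAL


def ping_url(addr: str, port: int = 2375) -> str:
    """Build the URL form seen in the log; IPv6 literals need square brackets."""
    return f"http://[{addr}]:{port}/_ping"


print(is_unique_local(BUILDER_ADDR))
print(ping_url(BUILDER_ADDR))
```

If the first line prints True, the failing curl says little about the local network's IPv6 support and more about whether the private address is reachable from that machine at all.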

After struggling for a little while to move to a Dockerfile-only build, I realized I could just pass --local-only directly to flyctl deploy with the template app, and that worked just fine for me since I already have Docker running.
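For anyone scripting the same workaround, here is a minimal sketch that wraps the command in Python. The only flyctl invocation used is the one described above (flyctl deploy --local-only); the function names are hypothetical, and the PATH lookup is just a rough pre-check that a Docker client is installed, not proof that the Docker daemon is running.

```python
import shutil
import subprocess


def local_deploy_command():
    """Command line for a deploy that builds the image with the local Docker."""
    return ["flyctl", "deploy", "--local-only"]


def deploy_locally():
    """Run the local-only deploy, failing early if no docker binary is found."""
    # Rough pre-check only: this finds the client binary, not a running daemon
    if shutil.which("docker") is None:
        raise RuntimeError("docker not found on PATH; --local-only needs a local Docker")
    # check=True raises CalledProcessError if flyctl exits with a non-zero status
    return subprocess.run(local_deploy_command(), check=True)
```

Calling deploy_locally() from the app directory should behave the same as typing the command by hand.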

If I look at the logs in the UI for the build app, it looks like it’s stuck here:

$ fly logs -a fly-builder-wispy-meadow-4726

Waiting for logs...

2022-12-29T17:29:27.354 runner[91851d9c219583] iad [info] Reserved resources for machine '91851d9c219583'
2022-12-29T17:29:27.359 runner[91851d9c219583] iad [info] Pulling container image
2022-12-29T17:29:27.523 runner[91851d9c219583] iad [info] Unpacking image
2022-12-29T17:29:32.545 runner[91851d9c219583] iad [info] Setting up volume 'machine_data'
2022-12-29T17:29:32.548 runner[91851d9c219583] iad [info] Uninitialized volume 'machine_data', initializing...
2022-12-29T17:29:32.548 runner[91851d9c219583] iad [info] Formatting volume

(It’s now 30 minutes after that last log line.)

Not sure what’s going on, but figured I’d flag it both to recommend --local-only to others if they see it and also to make sure I’m not doing anything silly.

Update on this: It looks like after about 40 minutes the builder finally came to life and it now seems to be working correctly, and I can now deploy without --local-only:

2022-12-29T17:29:27.354 runner[91851d9c219583] iad [info] Reserved resources for machine '91851d9c219583'
2022-12-29T17:29:27.359 runner[91851d9c219583] iad [info] Pulling container image
2022-12-29T17:29:27.523 runner[91851d9c219583] iad [info] Unpacking image
2022-12-29T17:29:32.545 runner[91851d9c219583] iad [info] Setting up volume 'machine_data'
2022-12-29T17:29:32.548 runner[91851d9c219583] iad [info] Uninitialized volume 'machine_data', initializing...
2022-12-29T17:29:32.548 runner[91851d9c219583] iad [info] Formatting volume
2022-12-29T18:09:57.990 runner[91851d9c219583] iad [info] Configuring firecracker
2022-12-29T18:09:59.208 app[91851d9c219583] iad [info] Running docker-entrypoint.d files
2022-12-29T18:09:59.316 app[91851d9c219583] iad [info] Setting up Docker data directory
2022-12-29T18:09:59.322 app[91851d9c219583] iad [info] Done setting up docker!
...

Is that amount of delay expected? Maybe it’s part of new account setup and just takes some time?
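To put a number on the stall, the timestamps in the log above can be compared directly. This is a small sketch that parses the leading timestamp of each line and reports the longest pause between consecutive lines; the function name largest_gap is made up, and it assumes every non-empty line starts with a timestamp in the format shown and that there are at least two such lines.

```python
from datetime import datetime

# Four consecutive lines copied from the builder log above
LOG = """\
2022-12-29T17:29:32.548 runner[91851d9c219583] iad [info] Uninitialized volume 'machine_data', initializing...
2022-12-29T17:29:32.548 runner[91851d9c219583] iad [info] Formatting volume
2022-12-29T18:09:57.990 runner[91851d9c219583] iad [info] Configuring firecracker
2022-12-29T18:09:59.208 app[91851d9c219583] iad [info] Running docker-entrypoint.d files
"""


def largest_gap(log_text):
    """Return (gap, line_before, line_after) for the longest pause between lines."""
    entries = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        # The timestamp is everything up to the first space
        stamp = line.split(" ", 1)[0]
        entries.append((datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f"), line))
    # Compare each line with the one that follows it
    pairs = zip(entries, entries[1:])
    (t0, before), (t1, after) = max(pairs, key=lambda p: p[1][0] - p[0][0])
    return t1 - t0, before, after


gap, before, after = largest_gap(LOG)
print(gap)  # 0:40:25.442000
```

So the builder sat between "Formatting volume" and "Configuring firecracker" for 40 minutes and 25 seconds, and every other step took about a second or less.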

Looks like there are problems in IAD per @Eli via email:

We’re currently working on restoring normal operation to that host [in IAD] — in the meantime, re-deploying to a different region will ensure that your instances avoid the issue.

This issue doesn’t seem to be reflected on https://status.flyio.net

Ah, nice find, @enaia. It turns out this was the same host. We rolled out a fix for it around an hour ago, and it looks like it’s now handling its apps’ deployments properly. The Python app is now up and running; let us know if there’s anything else you need.

Thanks both of you!