Unable to deploy to any region - deployment forever loop

I have two apps and neither can deploy to any region.

When I run the deploy with the debug log level, the output looks like this:

LOG_LEVEL=debug fly deploy
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌏DEBUG Remote builder unavailable, retrying in 83.919061ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌏DEBUG Remote builder unavailable, retrying in 101.424696ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌎DEBUG Remote builder unavailable, retrying in 61.559515ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌍DEBUG Remote builder unavailable, retrying in 105.243018ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌍DEBUG Remote builder unavailable, retrying in 192.573638ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌏DEBUG Remote builder unavailable, retrying in 133.704762ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌏DEBUG Remote builder unavailable, retrying in 185.359383ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌏DEBUG Remote builder unavailable, retrying in 200ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌎DEBUG Remote builder unavailable, retrying in 200ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌎DEBUG Remote builder unavailable, retrying in 65.569299ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌍DEBUG Remote builder unavailable, retrying in 200ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌍DEBUG Remote builder unavailable, retrying in 200ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌍DEBUG Remote builder unavailable, retrying in 200ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌍DEBUG Remote builder unavailable, retrying in 125.430248ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌏DEBUG Remote builder unavailable, retrying in 200ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌏DEBUG Remote builder unavailable, retrying in 200ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌏DEBUG Remote builder unavailable, retrying in 200ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌎DEBUG Remote builder unavailable, retrying in 200ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌎DEBUG Remote builder unavailable, retrying in 200ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌎DEBUG Remote builder unavailable, retrying in 200ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌎DEBUG Remote builder unavailable, retrying in 200ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌎DEBUG Remote builder unavailable, retrying in 200ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌍DEBUG Remote builder unavailable, retrying in 200ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌍DEBUG Remote builder unavailable, retrying in 200ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌍DEBUG Remote builder unavailable, retrying in 200ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌍DEBUG Remote builder unavailable, retrying in 200ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌏DEBUG Remote builder unavailable, retrying in 200ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)
Waiting for remote builder fly-builder-spring-waterfall-6790... 🌏DEBUG Remote builder unavailable, retrying in 200ms (err: Get "http://[fdaa:0:be06:a7b:a062:dd6:899d:2]:2375/_ping": context deadline exceeded)

Eventually it times out.

And the builder logs look like this:

2023-01-03T15:33:06.774 runner[1781956c934289] dfw [info] Pulling container image
2023-01-03T15:33:08.514 runner[1781956c934289] dfw [info] Pulling container image
2023-01-03T15:33:11.460 runner[1781956c934289] dfw [info] Pulling container image
2023-01-03T15:33:14.124 runner[1781956c934289] dfw [info] Pulling container image
2023-01-03T15:33:15.937 runner[1781956c934289] dfw [info] Pulling container image
2023-01-03T15:33:17.213 runner[1781956c934289] dfw [info] Pulling container image
2023-01-03T15:33:19.823 runner[1781956c934289] dfw [info] Pulling container image
2023-01-03T15:33:20.936 runner[1781956c934289] dfw [info] Pulling container image
2023-01-03T15:33:23.162 runner[1781956c934289] dfw [info] Pulling container image
2023-01-03T15:33:25.558 runner[1781956c934289] dfw [info] Pulling container image
2023-01-03T15:33:28.558 runner[1781956c934289] dfw [info] Pulling container image
2023-01-03T15:33:29.716 runner[1781956c934289] dfw [info] Pulling container image
2023-01-03T15:33:30.840 runner[1781956c934289] dfw [info] Pulling container image

I have tried killing the builder and it generates a new one, but that did not help. It still loops forever without building my image or deploying.

Any ideas?

Update: I was able to deploy using the --local-only option for fly deploy as follows:

fly deploy --local-only

This worked for my first app and it is now deployed successfully.

I have never had to do this before and I can’t think of anything I have changed that might have affected deploys, so I suspect there is something wrong with deploys on fly right now.

Yeah, same issue here. What I had to do was set DOCKER_BUILDKIT=1 in my local shell and then deploy locally. Not sure if you can set that env var on a builder, but it might resolve whatever issue you're facing.

same issue now ☹️ It seems to have been happening for at least 10 hours now.

The original issue here was a broken builder in DFW, which I believe we fixed. Are you seeing the same “waiting on builder” errors? Which region?

yup, seeing it on fra, gru and mia. I'm trying to deploy an app in these regions and it hangs and does nothing (and nothing shows up in the logs either).

plus, might be related or not, but removing a backup region does nothing. I tried both remove region and remove backup region.

edit: one more thing. I notice my app is rebooting randomly too. Not sure what is going on; there is no high traffic, no memory leak, etc.

Did you set --max-per-region by chance? If your app has three regions and you run fly scale count 3 --max-per-region=1, then remove a region, the deploy might hang forever waiting for something it can't do (placing 3 VMs in only two regions).
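The placement problem described above comes down to simple arithmetic: with a per-region cap, at most (number of regions) × (max-per-region) VMs can ever be placed. A minimal sketch, assuming that is the only constraint in play; fits_per_region_cap is a hypothetical helper, not a flyctl command, and it does not query Fly at all.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of flyctl): checks whether COUNT VMs can be
# placed across REGIONS regions when each region holds at most
# MAX_PER_REGION VMs. Purely local arithmetic; it does not talk to Fly.
fits_per_region_cap() {
  count=$1
  regions=$2
  max_per_region=$3
  # The most VMs that can ever be placed is regions * max_per_region.
  capacity=$((regions * max_per_region))
  if [ "$count" -le "$capacity" ]; then
    echo "ok: $count VMs fit (capacity $capacity)"
    return 0
  fi
  echo "unplaceable: $count VMs > capacity $capacity"
  return 1
}

# fly scale count 3 --max-per-region=1 with three regions: placeable.
fits_per_region_cap 3 3 1
# Same settings after removing one region: no valid placement exists.
fits_per_region_cap 3 2 1 || true
```

The two calls mirror the scenario in the reply. The real scheduler may apply other constraints (backup regions, for example) that this check ignores.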

The errors in the initial post were very specific to builders, if you’re seeing no output at all it’s likely something else is wrong.

If your app reboots randomly, you can troubleshoot with fly status --all and fly vm status <id>. That will show you if it’s exiting with a weird code, or if health checks are failing. Both can cause a restart.


Did you set --max-per-region by chance?

yes, I did, I have 3 regions and I ran fly scale count 3 --max-per-region=1.

The errors in the initial post were very specific to builders

yeah, that's true, sorry. I thought the errors might be related somehow; my bad, I didn't pay enough attention hahaha

you can troubleshoot with fly status --all

as I was typing this reply, all 3 regions came up and are now running, even though fra took some time to boot properly and even failed during boot with this error:

Events
TIMESTAMP           	TYPE    	MESSAGE
2023-01-04T03:08:45Z	Received	Task received by client
2023-01-04T03:08:45Z	Killing 	Sent interrupt. Waiting 5s before force killing

Checks
ID	SERVICE	STATE	OUTPUT

Recent Logs

as for the random reboots, not sure, I can’t see anything suspicious:

2023-01-04T01:37:02Z	Started   	Task started by client
2023-01-04T02:19:41Z	Killing   	Sent interrupt. Waiting 5s before force killing
2023-01-04T02:19:58Z	Terminated	Exit Code: 0
2023-01-04T02:19:58Z	Killed    	Task successfully killed

regarding removing a backup region, it is not working (or I'm using the wrong command lol). I have mia as a backup region, and neither fly regions backup remove mia nor fly regions remove mia does anything; both just print all my regions, mia included.

edit: as soon as I posted, all my VMs died and rebooted, with the same logs as above ("sent interrupt, yada yada"). I didn't trigger a deploy or anything, so this is weird.

Ah, backup regions and max-per-region probably conflict. We’ve deprecated backup regions (they never work like people want). You’ll need to have three primary regions to do deploys with fly scale count 3 --max-per-region 1.

“Sent interrupt” is normally from a release or scaling operation or region change. You should be able to see those in fly releases list.

Thanks for the information.

Today I am able to run fly deploy with no additional parameters and everything works. Looks like a bug has been fixed!

got it but I still can’t remove a backup region from my app

vscode ➜ /workspaces/sumiu (main ✗) $ fly regions backup remove mia
Region Pool:
cdg
fra
gru
Backup Region:
mia
vscode ➜ /workspaces/sumiu (main ✗) $ fly regions remove mia
Region Pool:
cdg
fra
gru
Backup Region:
mia

Neither of these commands removes the backup region.

update: the command to remove a backup region is just fly regions backup remove 🎉