Deployments failing sometimes (flaky)

Here it says deploying multiple apps with remote builders simultaneously would be possible.

When our CI tries to do that, we get this on all services:

blu-jobs:deploy: --> Pushing image done  03:41
blu-jobs:deploy: image: registry.fly.io/blu-jobs:deployment-01GNAV...
blu-jobs:deploy: image size: 1.9 GB  03:41
blu-jobs:deploy: ==> Creating release  03:41
blu-jobs:deploy: --> release v69 created  03:42
blu-jobs:deploy:  03:42
blu-jobs:deploy: --> You can detach the terminal anytime without stopping the deployment  03:42
blu-jobs:deploy: ==> Monitoring deployment  03:42
blu-jobs:deploy: Logs: https://fly.io/apps/blu-jobs/monitoring  03:42
blu-jobs:deploy:  03:44
blu-jobs:deploy: v69 is being deployed
blu-jobs:deploy: --> v69 failed - Failed due to unhealthy allocations - rolling back to job version 68 and deploying as v70  08:45
blu-jobs:deploy:  08:45
blu-jobs:deploy: --> Troubleshooting guide at https://fly.io/docs/...
blu-jobs:deploy: Error abort  08:45
blu-jobs:deploy:  08:45
blu-jobs:deploy:  08:45
blu-jobs:deploy:  ELIFECYCLE  Command failed with exit code 1.

This does not happen when deploying sequentially, but a sequential run can take 30+ minutes for multiple apps and services.
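A middle ground between fully sequential and fully parallel deploys is to cap how many run at once. The Python sketch below is only an illustration under assumptions, not Fly tooling: it presumes each app can be deployed with `fly deploy --remote-only --app <name>` (adjust to your CI layout), and `deploy_all`, `run_fly_deploy` and `max_parallel` are hypothetical names.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_fly_deploy(app: str) -> int:
    """Deploy one app via flyctl and return its exit code.

    Assumption: `fly deploy --remote-only --app <name>` works for each app
    from the current directory; adapt the command to your repo layout.
    """
    result = subprocess.run(["fly", "deploy", "--remote-only", "--app", app])
    return result.returncode


def deploy_all(
    apps: List[str],
    deploy: Callable[[str], int] = run_fly_deploy,
    max_parallel: int = 2,
) -> Dict[str, int]:
    """Deploy every app with at most `max_parallel` deploys in flight.

    Returns app name -> exit code (in input order), so one failure does
    not hide the outcome of the other deploys.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        codes = pool.map(deploy, apps)
        return dict(zip(apps, codes))
```

Collecting per-app exit codes instead of aborting on the first failure makes it easier to see which services failed in a given run, and `max_parallel` can be tuned between 1 (sequential) and the number of apps.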

Is there a way to solve this?

Hi,

The pricing page used to have a panel stating that the number of concurrent builds was limited per plan. If that limit is still in place, it would explain problems with concurrent builds. I recall that earlier this year the Hobby Plan allowed 1, the Scale Plan 10 (I think), and so on.

However, that table has been removed or moved, so I’m not sure what the current position is or whether concurrent builds are possible.


Hey,

thanks for the info.

It’d be great if someone from Fly could chime in and clarify this. I’d be willing to pay for that.

More transparent and fine-grained pricing (e.g. per additional worker) would be helpful, though.

@jerome, sorry for pinging, but could you just drop a note on that? :sweat_smile:

Hey there!

Concurrent builds should work fine, they’ll just be a bit slower if they’re CPU-bound.

That error doesn’t suggest the build failed :thinking:. After the image has been built and pushed, remote builders aren’t involved in the process anymore.

Is it happening consistently?


Hey,

thanks so much for the quick reply :-).

No, after closer inspection it unfortunately seems to be flaky (I changed the title to reflect that).

I think we sometimes get the same deploy errors when running sequentially, although less frequently.

Just redeploying the service (without changing anything) often works, but next time another service might fail.

It’s also not a specific service that’s affected, they just fail to deploy randomly. This includes web services with HTTP health checks, but also background workers without checks.
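Since redeploying without changes often succeeds, one stopgap in CI is to retry a failed deploy a few times with backoff. This is a hypothetical helper (`deploy_with_retries` is not part of flyctl) that accepts any deploy callable, for example one wrapping `fly deploy`; the attempt count and delays are arbitrary assumptions.

```python
import time
from typing import Callable


def deploy_with_retries(
    app: str,
    deploy: Callable[[str], int],
    attempts: int = 3,
    base_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call deploy(app) until it returns exit code 0 or attempts run out.

    Between failed attempts it waits base_delay, 2*base_delay, 4*base_delay, ...
    (exponential backoff). Returns the exit code of the last attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    code = deploy(app)
    for attempt in range(1, attempts):
        if code == 0:
            break
        # Back off before the next attempt: base_delay * 2^(attempt - 1).
        sleep(base_delay * 2 ** (attempt - 1))
        code = deploy(app)
    return code
```

Note that this only papers over the flakiness; it does not explain why a deploy failed, so the failed attempts' logs are still worth keeping.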

They seem to be caused by different errors:

blu-jobs:deploy: --> release v74 created  11:24
blu-jobs:deploy: ==> Monitoring deployment  11:24
blu-jobs:deploy: Error 1 error occurred:  16:25
blu-jobs:deploy: 	* No deployment available to monitor  16:25

(this one happens very often)

and

blu-cms:deploy: --> v93 failed - Failed due to unhealthy allocation

(also quite often)

and

blu-jobs:deploy: WARN Failed to start remote builder heartbeat: server returned a non-200 status code: 504  01:22
blu-jobs:deploy: Error failed to fetch an image or build from source: error connecting to docker: server returned a non-200 status code: 504

(only once, I think)

I understand that these are probably different issues, but the whole deployment experience with Fly currently resembles Russian roulette, with about 1-3 of 6 chambers loaded :grin:.

OK, the first issue (No deployment available to monitor) turned out to be caused by two services having been scaled down to 0 instances.
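To make CI output easier to triage, the error strings quoted in this thread can be mapped to a suggested next step. The mapping below is a heuristic built only from those messages (the scaled-to-0 explanation comes from the post above); `classify_deploy_failure` and `KNOWN_FAILURES` are hypothetical names, and the suggested actions are assumptions, not official Fly guidance.

```python
from typing import List, Optional, Tuple

# Substrings from the flyctl output quoted in this thread, mapped to a
# suggested next step. Heuristic only; first match wins.
KNOWN_FAILURES: List[Tuple[str, str]] = [
    ("No deployment available to monitor", "skip: app may be scaled to 0 instances"),
    ("non-200 status code: 504", "retry: 504 Gateway Timeout reaching the remote builder"),
    ("unhealthy allocation", "inspect: check app logs and health checks, then retry"),
]


def classify_deploy_failure(output: str) -> Optional[str]:
    """Return the suggested action for the first known failure in `output`,
    or None if none of the known messages appear."""
    for needle, action in KNOWN_FAILURES:
        if needle in output:
            return action
    return None
```

The "unhealthy allocation" substring matches both the singular and plural forms seen in the logs above.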