Requesting recommendations for making Fly less flaky for CI

We currently use Fly for CI, creating a new app per PR, but we’re running into quite a lot of CI flakiness.

Specifically, over 50% of our CI runs in the last 24 hours have needed a manual re-run due to one of the following errors:

  • WARN Remote builder did not start in time. Check remote builder logs
  • Error Post "https://api.fly.io/graphql": http2: server sent GOAWAY and closed the connection
  • flyctl deploy just gets stuck for a very long time (>20 minutes).

Is there an official way to wrap flyctl for more consistent CI, ideally with some combination of retries and timeouts?
I’m also open to suggestions from anyone else using flyctl in CI on how best to wrap it.


Some of these remote builder issues should have been fixed, but you may want to consider building images locally in the meantime:

flyctl deploy <args> --local-only

You could also build images elsewhere, push them to Fly’s registry, and then deploy the pushed image: Deploying infrastructure - #2 by kurt | Deployment in CI issue - #3 by ignoramous
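
For reference, here’s a minimal sketch of that flow, assuming an app named my-app, a Docker daemon available in CI, and a GIT_SHA variable for tagging (the app name and tag variable are placeholders; the flyctl and docker commands themselves are standard):

# Authenticate the Docker client against Fly's registry
flyctl auth docker

# Build and push the image to registry.fly.io
docker build -t registry.fly.io/my-app:$GIT_SHA .
docker push registry.fly.io/my-app:$GIT_SHA

# Deploy the already-pushed image, skipping the remote builder entirely
flyctl deploy --app my-app --image registry.fly.io/my-app:$GIT_SHA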

I don’t think this approach is super helpful:

  • Building locally is a bit of a pain, and we’d like to use remote builds since Fly provides them.
  • Remote builder issues are transient (i.e. immediately retrying often works), and are actually only a small portion of our total failures.

I think this approach would take significant effort to figure out where else we’d build our image, and would still not solve the majority of the flakiness.

I think I might default to something like for i in {1..3}; do timeout 600 flyctl deploy <args> && break || sleep 15; done, but I’m interested in any other proposals in that genre: ones that don’t try to work around a single source of flakiness, but rather provide some limited blanket resilience against current and future transient errors from Fly.
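
For anyone copying this later, here’s that loop fleshed out slightly as a wrapper script. It’s just a sketch: the 3 attempts, 600-second timeout, and 15-second sleep are the same arbitrary numbers as above, and it assumes GNU coreutils’ timeout is available in the CI image.

#!/usr/bin/env bash
# Blanket retry wrapper around flyctl deploy: each attempt is capped at
# 10 minutes, with up to 3 attempts and a short pause between retries.
set -u

attempts=3
for i in $(seq 1 "$attempts"); do
  if timeout 600 flyctl deploy "$@"; then
    exit 0
  fi
  echo "flyctl deploy attempt $i failed or timed out; retrying in 15s" >&2
  sleep 15
done

echo "flyctl deploy failed after $attempts attempts" >&2
exit 1

You’d invoke it with whatever flyctl deploy arguments you already use, e.g. ./deploy-with-retries.sh --remote-only --app my-pr-app (the script and app names are made up; the flags are standard flyctl ones).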
