Deploying to Fly from GitHub Actions hangs

My project deploys itself using GitHub Actions for CI/CD with flyctl-actions.

I’m finding that the deploy step of my workflow is hanging indefinitely and I have to kill the job manually. The app is still deployed successfully - the deploy step just seems to get stuck or never exit.

Here’s the relevant part of my workflow:

  - name: Deploy
    uses: superfly/flyctl-actions@1.1
    env:
      FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
    with:
      args: "deploy -c fly/fly.${{ github.event.deployment.environment }}.toml -i $DOCKER_REGISTRY:${{ steps.get_image_tag.outputs.image_tag }}"

There’s no build step on Fly; I’m using a prebuilt image that I previously pushed to my registry.

Has anything changed in the Fly CLI recently that might affect exit codes?

The actual logs and the output of fly checks list would help a lot in debugging this. Would you be able to post them here? There are a couple of reasons this could happen, but it’s hard to know without the logs.

Based on the fact that the app is still deployed correctly, I’m guessing one of the checks is taking too long to go green; the checks output would confirm that.

FWIW, this doesn’t happen when I deploy the app locally.

Here’s the output of fly checks list:

$ fly -c fly/fly.staging.toml checks list
Health Checks for <my app>
NAME                             STATUS  ALLOCATION REGION TYPE LAST UPDATED OUTPUT
382404c10e0d588515ce107c8ca067ea passing 5349cae3   lhr    HTTP 53m22s ago   HTTP GET
                                                                             http://172.19.1.170:4000/healthz:
                                                                             200 OK Output: howdy 👋

This looks fine, can we look at the logs? There might be a clue there.

In both local and GitHub-based deployments, you’re first pushing an image into the Fly registry and then just initiating a deploy with that image tag, right? And on GitHub Actions that command does not terminate within a reasonable amount of time, but it does locally?

That’s correct - in GH actions it seems to hang forever. I terminated one run after about 35 minutes. It usually deploys in about 1 minute.

I’ll try and dig out some logs.

OK I just reran a deploy on GH actions and it seems like it’s working again. Very strange as it was previously hanging every time.

I’ll keep an eye out and report back with logs if it happens again.


Just chiming in to say that I’m seeing the same issue today. AFAICT, the deploy is happening and the activity tab on my app shows a new release. It’s just that the command in the GitHub Action isn’t exiting, so it hangs indefinitely.

Could you paste the logs here? That’s mostly what will help debug.

Here’s the relevant part of a recent one (skipping all the docker build/push stuff earlier, which it got through just fine):

2021-11-10T08:11:27.1880604Z deployment-1636531824: digest: sha256:97e651e11b2b1cd9fc3e42811b9ea4c8dd88d482f03038abd78e23240c14440b size: 2410
2021-11-10T08:11:27.1898431Z --> Pushing image done
2021-11-10T08:11:27.2127729Z Image: registry.fly.io/myopica:deployment-1636531824
2021-11-10T08:11:27.2131911Z Image size: 800 MB
2021-11-10T08:11:27.2176507Z ==> Creating release
2021-11-10T08:11:28.6223413Z Release v40 created
2021-11-10T08:11:28.6225695Z Release command detected: this new release will not be available until the command succeeds.
2021-11-10T08:11:28.6226321Z 
2021-11-10T08:11:28.6228366Z You can detach the terminal anytime without stopping the deployment
2021-11-10T08:11:28.6229286Z ==> Release command
2021-11-10T08:11:28.6230537Z Command: /run.sh migrate && /run.sh collectstatic && /run.sh compress
2021-11-10T08:11:41.2221091Z 	 Starting instance
2021-11-10T08:11:42.0105577Z 	 Starting init (commit: 7943db6)...
2021-11-10T12:55:18.8024226Z ##[error]The operation was canceled.

At the end, you can see that I finally cancelled the workflow about four and a half hours later.

Thanks, it helps to know at which point the process is hanging. We’ll take a look.

This is currently happening for us as well.

==> Creating release
Release v250 created
Release command detected: this new release will not be available until the command succeeds.

You can detach the terminal anytime without stopping the deployment
==> Release command
Command: /app/bin/enaia eval Enaia.Release.migrate
	 Configuring virtual machine
	 Starting virtual machine
	 Starting init (commit: 7943db6)...
	 2021/11/10 17:42:11 listening on [fdaa:0:309a:a7b:ab8:5083:6bf8:2]:22 (DNS: [fdaa::3]:53)
	 Reaped child process with pid: 562 and signal: SIGUSR1, core dumped? false
	 17:42:14.162 [info] Migrations already up
	 Main child exited normally with code: 0
Error: The operation was canceled.

The latest flyctl release includes a fix for release commands hanging. The underlying issue was a delay in logs holding up the release monitoring code.

Exit codes did change recently, but a non-zero code is still returned on error. Were you seeing something specific?


No, I was just seeing the command hanging when running in GH actions.

As I mentioned earlier it seems to have resolved itself for now.

Hopefully the latest release has made things more stable.
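
For anyone who hits a hanging deploy command in CI before picking up the flyctl fix, one possible stopgap is to put a deadline around the command so the job fails after a few minutes instead of running for hours. Below is a minimal sketch, assuming a bash step on a runner with GNU coreutils `timeout` available. The helper name `run_with_deadline` and the 10-minute limit are made up for illustration; they are not part of flyctl or flyctl-actions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of flyctl): run a command, but give up after a deadline.
# Relies on GNU coreutils `timeout`, which exits with status 124 when the limit is hit.
run_with_deadline() {
  local limit="$1"
  shift
  local status=0
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "command exceeded ${limit}; giving up" >&2
  fi
  return "$status"
}

# In a CI step this might wrap the deploy, for example with the staging config
# path mentioned earlier in this thread (the 10m limit is an arbitrary assumption):
#   run_with_deadline 10m flyctl deploy -c fly/fly.staging.toml
```

Note that this only stops the CI job from waiting. As the deploy output itself says, detaching the terminal does not stop the deployment, so a job that times out this way may be marked failed even though the release went through; the activity tab on the app is still the place to confirm.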
