Starting today I’m getting intermittent deployment failures with the error message below. It seems to resolve itself if I wait a few minutes, then reappears an hour or two later when I attempt another deployment.
Updating existing machines in 'my-app1' with rolling strategy
> Acquiring lease for xxxxxxxxxxxx
> Acquired lease for xxxxxxxxxxxx
> Updating machine config for xxxxxxxxxxxx
> Updating xxxxxxxxxxxx [app]
> Updated machine config for xxxxxxxxxxxx
✔ Machine xxxxxxxxxxxx is now in a good state
> Clearing lease for xxxxxxxxxxxx
✔ Cleared lease for xxxxxxxxxxxx
==> Verifying app config
--> Verified app config
background-color: #F7F7F7;
border: 1px solid #CCC;
border-right-color: #999;
border-left-color: #999;
border-bottom-color: #999;
border-bottom-left-radius: 4px;
border-bottom-right-radius: 4px;
border-top-color: #DADADA;
color: #666;
box-shadow: 0 3px 8px rgba(50, 50, 50, 0.17);
}
</style>
</head>
<body>
<!-- This file lives in public/500.html -->
<div class="dialog">
<div>
<h1>We're sorry, but something went wrong.</h1>
</div>
<p>If you are the application owner check the logs for more information.</p>
</div>
</body>
</html>
Righto. Fly does use RoR, so I wonder if something is going wrong on their side; this looks like a “should never happen”. Perhaps they can look at their logs.
I suspected this was something on the Fly end. Any thoughts on how I can flag this to Fly? I don’t currently pay for support, so I don’t have a support email.
All of us do look at the community forum, we just don’t guarantee support from here.
Anyway, back to your problem, it’s very odd. Are you able to share the app name? Or something else identifying that might help us find a trace or sentry exception?
edit: I’ve found a trace for one of your requests, having a closer look now!
It looks like this is an error coming from the registry. I’ve opened an internal discussion to see if we can figure out what’s causing the registry issue, and in the meantime I’m going to put together a small change for flyctl so that the output on error isn’t just raw HTML dumped straight to the console.
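Until a change like that lands, the HTML can be condensed on the caller's side. This is a minimal sketch, not the flyctl change itself: `condense_html_error` is a hypothetical helper that reads command output on stdin, passes non-HTML lines through, and replaces any HTML page with just its `<title>` text.

```shell
# Hypothetical filter: condense an HTML error page found in command output.
# Lines outside the HTML page pass through unchanged; the page itself is
# reduced to a single line carrying its <title> text.
condense_html_error() {
  awk '
    /<html/ { in_html = 1 }
    in_html && match($0, /<title>[^<]*<\/title>/) {
      # Skip the 7-char opening tag and drop the 8-char closing tag.
      print "  (HTML error page: " substr($0, RSTART + 7, RLENGTH - 15) ")"
    }
    /<\/html>/ { in_html = 0; next }
    !in_html { print }
  '
}
```

It would be used as `flyctl deploy ... 2>&1 | condense_html_error`. Note that in a pipeline the exit status is the filter's unless `set -o pipefail` is enabled in bash.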
How do you deploy, and in what region? Can we see your fly.toml file? The forum would be on fire if deployments were intermittently working for everyone.
Our running assumption here is that this is related to some work we did fairly recently to create regional registry mirrors. It seems as though the mirror has not received all of the blobs of your image when you first deploy, and so spits out a 500 when it receives a request for the first blob it hasn’t got yet. We think it’s happening to you and seemingly no one else because your image might be abnormally large.
I think that’s why you’re seeing it “self-resolve” after a bit - that’s enough time for the whole image to have been loaded into the mirror.
If that’s right, anything you’re able to do to reduce image size might help.
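If the error really does clear once the mirror has caught up, a stopgap (not a fix) is to retry the failed step after a pause rather than failing the whole run. A minimal sketch, assuming a hypothetical `retry_with_delay` helper; the attempt count and delay are guesses to tune, not values Fly recommends.

```shell
# Hypothetical helper: run a command, retrying with a fixed pause on failure.
# Usage: retry_with_delay <max_attempts> <delay_seconds> <command> [args...]
retry_with_delay() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while true; do
    # Run the command exactly as given; stop on the first success.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "Attempt $attempt failed; retrying in ${delay}s..." >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}
```

For example, `retry_with_delay 5 120 flyctl deploy --app "$APP_NAME" --detach` would try up to five times, two minutes apart. This only papers over the registry issue, so it is worth removing once the underlying cause is fixed.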
My deployment runs a mix of open-source images (postgres, etcd, mailslurper) and our own custom apps. Of the custom images that are pushed to the Fly registry, none are particularly large:
My deployment runs in a single GH action and just runs through the list of all apps to be deployed.
if [[ -n "${POSTGRES_PASSWORD:-}" ]]; then
  flyctl secrets set POSTGRES_PASSWORD="$POSTGRES_PASSWORD" --app "$POSTGRES_APP_NAME" --stage
  flyctl secrets set POSTGRES_PASSWORD="$POSTGRES_PASSWORD" --app "$OPENBAO_INIT_APP_NAME" --stage
fi
flyctl secrets set FLY_API_TOKEN="$FLY_API_TOKEN" --app "$OPENBAO_INIT_APP_NAME" --stage
# Deploy postgres
flyctl deploy --config /work/rendered/postgres-unified.fly.toml --app "$POSTGRES_APP_NAME" --ha=false --detach
wait_for_healthy "$POSTGRES_APP_NAME" 40 5
flyctl deploy --config /work/rendered/etcd.fly.toml --app "$ETCD_APP_NAME" --ha=false --detach
wait_for_healthy "$ETCD_APP_NAME" 40 5
# Check if OpenBao machine exists and is healthy
OPENBAO_HEALTHY=false
if flyctl machines list --app "$OPENBAO_APP_NAME" --json 2>/dev/null \
  | jq -e 'map(select(.state == "started") | select(((.checks // []) | length == 0) or (((.checks // []) | map(.status == "passing") | all)))) | length > 0' >/dev/null 2>&1; then
  echo "✓ OpenBao machine already running and healthy - skipping deployment"
  OPENBAO_HEALTHY=true
else
  echo "Deploying OpenBao (machine doesn't exist or unhealthy)..."
  flyctl deploy --config /work/rendered/openbao.fly.toml --app "$OPENBAO_APP_NAME" --image "$OPENBAO_IMAGE" --ha=false --detach
  wait_for_healthy "$OPENBAO_APP_NAME" 40 5
fi
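The script calls `wait_for_healthy`, which isn't shown. For anyone following along, here is a rough sketch of what such a helper could look like. This is a guess, not the poster's actual implementation: it assumes the arguments are `<app> <max_attempts> <interval_seconds>`, and it reuses the same `flyctl machines list` and jq health filter as the OpenBao check above. `app_is_healthy` is a hypothetical name.

```shell
# Hypothetical: succeeds when at least one machine in the app is started and
# all of its checks (if it has any) are passing.
app_is_healthy() {
  flyctl machines list --app "$1" --json 2>/dev/null \
    | jq -e 'map(select(.state == "started") | select(((.checks // []) | length == 0) or ((.checks // []) | map(.status == "passing") | all))) | length > 0' >/dev/null 2>&1
}

# Assumed signature: wait_for_healthy <app> <max_attempts> <interval_seconds>
wait_for_healthy() {
  local app="$1" max_attempts="$2" interval="$3" attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if app_is_healthy "$app"; then
      echo "✓ $app is healthy"
      return 0
    fi
    echo "Waiting for $app (attempt $attempt/$max_attempts)..."
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "✗ $app did not become healthy in time" >&2
  return 1
}
```

With the `40 5` arguments used in the script, this reading would poll for up to 200 seconds before giving up.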
When it fails, the failure point varies - the script above is what my deployment looks like, and these are the first apps deployed. Most frequently it fails deploying postgres, but occasionally it fails at etcd or openbao.
etcd and postgres are open-source images, and openbao is an image we build ourselves (429.55 MB). Once an app fails with the above error, the GH action fails. The openbao image is rarely rebuilt, as it doesn’t change often. Our main app, whose image we do update regularly (959.49 MB), deploys much further down the list.
I’m having the exact same issue, except mine just won’t work at all. It returns a 500 every time flyctl deploy is run. I’ve had this app for over a year with no problems, and I deploy at least twice a week.
I’ve got flyctl v0.3.231.
```
Error: failed to create release (status 500)
<html>
<head>
<title>We're sorry, but something went wrong (500)</title>
<meta name="viewport" content="width=device-width,initial-scale=1">
<style>