Random and intermittent deployment error 500 (Fly Rails infra bug?)

b4b3l01 · December 3, 2025, 10:00pm

Since today I’ve been getting intermittent deployment failures with the error message below. It seems to resolve itself if I wait a few minutes, then reappears an hour or two later when I attempt another deployment.

```
Updating existing machines in 'my-app1' with rolling strategy
> Acquiring lease for xxxxxxxxxxxx
> Acquired lease for xxxxxxxxxxxx
> Updating machine config for xxxxxxxxxxxx
> Updating xxxxxxxxxxxx [app]
> Updated machine config for xxxxxxxxxxxx
✔ Machine xxxxxxxxxxxx is now in a good state
> Clearing lease for xxxxxxxxxxxx
✔ Cleared lease for xxxxxxxxxxxx
==> Verifying app config
--> Verified app config
    background-color: #F7F7F7;
    border: 1px solid #CCC;
    border-right-color: #999;
    border-left-color: #999;
    border-bottom-color: #999;
    border-bottom-left-radius: 4px;
    border-bottom-right-radius: 4px;
    border-top-color: #DADADA;
    color: #666;
    box-shadow: 0 3px 8px rgba(50, 50, 50, 0.17);
  }
  </style>
</head>
<body>
  <!-- This file lives in public/500.html -->
  <div class="dialog">
    <div>
      <h1>We're sorry, but something went wrong.</h1>
    </div>
    <p>If you are the application owner check the logs for more information.</p>
  </div>
</body>
</html>
```
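
Since the error clears on its own after a few minutes, one stopgap is to retry a failed deploy after a pause instead of failing the whole action. This is a minimal bash sketch under that assumption: `retry` is a hypothetical helper (not a flyctl feature), and the attempt count and delay are arbitrary. Note that it will also blindly repeat genuine failures such as a bad config.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of flyctl): run a command up to N times,
# sleeping a fixed number of seconds between failed attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local n=1
  # `until` tests the command's exit status, so this also works under `set -e`
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "Giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "Attempt $n failed; retrying in ${delay}s..." >&2
    sleep "$delay"
    n=$((n + 1))
  done
}

# Example (config path and app variable as in the workflow posted further down):
# retry 3 180 flyctl deploy --config /work/rendered/postgres-unified.fly.toml \
#   --app "$POSTGRES_APP_NAME" --ha=false --detach
```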

halfer · December 4, 2025, 12:22am

AI tells me that public/500.html is an indicator of a Ruby on Rails application. Are you running RoR?

b4b3l01 · December 4, 2025, 9:20am

Hi halfer - no RoR in our app. It happens intermittently when running fly deploy, with no rhyme or reason as to when it works and when it doesn’t.

halfer · December 4, 2025, 5:11pm

Righto. Fly does use RoR, so I wonder if something is going wrong on their side; this looks like a “should never happen” error. Perhaps they can look at their logs.

b4b3l01 · December 5, 2025, 2:05pm

I suspected this was something on Fly’s end. Any thoughts on how I can flag this to Fly? I don’t currently pay for support, so I don’t have a support email.

jfent · December 5, 2025, 6:41pm

All of us do look at the community forum; we just don’t guarantee support here.

Anyway, back to your problem: it’s very odd. Are you able to share the app name, or anything else identifying that might help us find a trace or Sentry exception?

edit: I’ve found a trace for one of your requests, having a closer look now!

jfent · December 5, 2025, 7:58pm

It looks like this is an error coming from the registry. I’ve opened an internal discussion to see if we can figure out what’s causing the registry issue, and in the meantime I’m going to put together a small change for flyctl so that the output on error isn’t just raw HTML dumped to the console.
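
Until flyctl stops printing the raw page, a CI job could capture the deploy output and log only the page title. A hedged bash sketch: `summarize_html_error`, `deploy.log` and `$APP` are hypothetical names, and the sed pattern assumes the `<title>` element sits on a single line, as it does in the 500 page quoted in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <title> text of the first HTML error page
# found in a captured log file (assumes <title>...</title> is on one line).
summarize_html_error() {
  sed -n 's:.*<title>\(.*\)</title>.*:\1:p' "$1" | head -n 1
}

# Example: keep the full output in deploy.log, but surface a one-line summary.
# if ! flyctl deploy --app "$APP" --detach > deploy.log 2>&1; then
#   echo "Deploy failed: $(summarize_html_error deploy.log)" >&2
#   exit 1
# fi
```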

b4b3l01 · December 8, 2025, 8:01pm

Hi jfent - any further info on this or anything I can do from my end?

halfer · December 8, 2025, 8:38pm

How do you deploy, and in what region? Can we see your fly.toml file? The forum would be on fire if deployments were intermittently working for everyone.

jfent · December 9, 2025, 4:29pm

How big is your image?

Our running assumption here is that this is related to some fairly recent work we did to create regional registry mirrors. It seems the mirror hasn’t received all of your image’s blobs when you first deploy, so it returns a 500 when it receives a request for the first blob it doesn’t have yet. We think it’s happening to you and seemingly no one else because your image might be abnormally large.

I think that’s why you’re seeing it “self-resolve” after a bit - that’s enough time for the whole image to have been loaded into the mirror.

If that’s right, anything you’re able to do to reduce image size might help.
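
If you build images locally before pushing, their size is quick to check. A minimal bash sketch, assuming Docker is installed and the image exists locally (images built on Fly’s remote builders won’t be); `image_size_mb` is a hypothetical helper that reports decimal megabytes (1 MB = 1,000,000 bytes) from the `Size` field of `docker image inspect`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a local image's size in decimal megabytes
# (1 MB = 1,000,000 bytes), using the Size field from `docker image inspect`.
image_size_mb() {
  local bytes
  # Declared separately so a failing `docker` call isn't masked by `local`
  bytes=$(docker image inspect --format '{{.Size}}' "$1") || return 1
  awk -v b="$bytes" 'BEGIN { printf "%.2f\n", b / 1000000 }'
}

# Example (image reference is a placeholder):
# image_size_mb registry.fly.io/my-app1:latest
```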

b4b3l01 · December 9, 2025, 8:41pm

Hi halfer and jfent -

For my deployment, we run a mix of open-source images (postgres, etcd, mailslurper) and our own custom apps. None of the custom images we push to the Fly registry are particularly large:

image 1: 959.49 MB
image 2: 64.17 MB
image 3: 572.25 MB
image 4: 429.55 MB

The region we deploy to is lhr.

jfent · December 9, 2025, 9:19pm

Are you seeing this problem with all of the images mentioned or just a subset?

b4b3l01 · December 9, 2025, 10:02pm

My deployment is a single GH action that simply works through the list of apps to deploy:

```shell
# --stage sets the secrets without restarting machines; the deploys below pick them up
if [[ -n "${POSTGRES_PASSWORD:-}" ]]; then
  flyctl secrets set POSTGRES_PASSWORD="$POSTGRES_PASSWORD" --app "$POSTGRES_APP_NAME" --stage
  flyctl secrets set POSTGRES_PASSWORD="$POSTGRES_PASSWORD" --app "$OPENBAO_INIT_APP_NAME" --stage
fi

flyctl secrets set FLY_API_TOKEN="$FLY_API_TOKEN" --app "$OPENBAO_INIT_APP_NAME" --stage

# Deploy postgres (wait_for_healthy is a helper defined elsewhere in the workflow)
flyctl deploy --config /work/rendered/postgres-unified.fly.toml --app "$POSTGRES_APP_NAME" --ha=false --detach
wait_for_healthy "$POSTGRES_APP_NAME" 40 5

# Deploy etcd
flyctl deploy --config /work/rendered/etcd.fly.toml --app "$ETCD_APP_NAME" --ha=false --detach
wait_for_healthy "$ETCD_APP_NAME" 40 5

# Check if OpenBao machine exists and is healthy:
# at least one started machine whose checks all pass (or that has no checks)
OPENBAO_HEALTHY=false
if flyctl machines list --app "$OPENBAO_APP_NAME" --json 2>/dev/null \
  | jq -e 'map(select(.state == "started") | select(((.checks // []) | length == 0) or (((.checks // []) | map(.status == "passing") | all)))) | length > 0' >/dev/null 2>&1; then
  echo "✓ OpenBao machine already running and healthy - skipping deployment"
  OPENBAO_HEALTHY=true
else
  echo "Deploying OpenBao (machine doesn't exist or unhealthy)..."
  flyctl deploy --config /work/rendered/openbao.fly.toml --app "$OPENBAO_APP_NAME" --image "$OPENBAO_IMAGE" --ha=false --detach
  wait_for_healthy "$OPENBAO_APP_NAME" 40 5
fi
```

Where it fails varies. The above is what my deployment looks like, and these are the first apps deployed. Most often it fails deploying postgres, but occasionally it fails at etcd or openbao.

etcd and postgres are open-source images; openbao is built from our own image (429.55 MB). Once an app fails with the above error, the GH action fails. The openbao image is rarely rebuilt, as it doesn’t change often. Our main app, whose image we do update regularly (959.49 MB), deploys much further down the list.
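
The workflow above calls a `wait_for_healthy` helper that isn’t shown in the post. For anyone reproducing the setup, here is one hypothetical way it might look, assuming its arguments are the app name, the maximum number of polls and the seconds between polls. It reuses the same `flyctl machines list --json | jq` health filter as the OpenBao check and needs `jq` on the runner.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the real wait_for_healthy from the workflow is not shown.
# Assumed arguments: $1 = app name, $2 = max attempts, $3 = seconds between attempts.

# Succeeds if at least one machine is started and all of its checks pass
# (same jq filter as the OpenBao check in the workflow).
app_is_healthy() {
  flyctl machines list --app "$1" --json 2>/dev/null \
    | jq -e 'map(select(.state == "started") | select(((.checks // []) | length == 0) or (((.checks // []) | map(.status == "passing") | all)))) | length > 0' >/dev/null 2>&1
}

wait_for_healthy() {
  local app="$1" attempts="$2" interval="$3" i=1
  while [ "$i" -le "$attempts" ]; do
    if app_is_healthy "$app"; then
      echo "✓ $app is healthy (attempt $i/$attempts)"
      return 0
    fi
    echo "Waiting for $app ($i/$attempts)..." >&2
    sleep "$interval"
    i=$((i + 1))
  done
  echo "✗ $app did not become healthy after $attempts attempts" >&2
  return 1
}
```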

jphenow · December 10, 2025, 4:14pm

Hi - I think I might have shipped a fix for this to flyctl in the last ~day or so, but I hadn’t realized it might help with exactly this conversation.

Does your action here always pull the latest flyctl? Could you try it one more time with the latest flyctl?

Could you also let us know the flyctl version on your latest runs?

I might have a few ideas from my latest fix if pulling latest doesn’t help you just yet.
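
To make the flyctl version visible in every run, and to fail fast if an older binary is picked up, the job could log and check it before deploying. A hedged sketch: it assumes `flyctl version` prints a `vX.Y.Z` token somewhere in its output and that `sort -V` is available (GNU coreutils, as on Ubuntu GitHub runners). The helper names are hypothetical, and 0.3.233 is only an example taken from the release mentioned later in the thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for logging/checking the flyctl version in CI.

# Extract the first vX.Y.Z token from `flyctl version` and strip the "v".
flyctl_version() {
  flyctl version | grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+' | head -n 1 | sed 's/^v//'
}

# Succeed only if the installed flyctl is at least the given version.
require_flyctl_at_least() {
  local minimum="$1" have
  have=$(flyctl_version)
  if [ -z "$have" ]; then
    echo "Could not determine flyctl version" >&2
    return 1
  fi
  echo "Using flyctl $have"
  # sort -V orders version strings; the minimum sorts first when have >= minimum
  [ "$(printf '%s\n%s\n' "$minimum" "$have" | sort -V | head -n 1)" = "$minimum" ]
}

# Example:
# require_flyctl_at_least 0.3.233 || { echo "Please upgrade flyctl" >&2; exit 1; }
```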

bwoodlt · December 13, 2025, 1:02pm

I’m having the exact same issue, but mine just won’t work at all: it returns a 500 every time flyctl deploy is run. I’ve had this app running fine for over a year and deploy it at least twice a week.

I’ve got flyctl v0.3.231.

```
Error: failed to create release (status 500)

<html>
<head>
  <title>We're sorry, but something went wrong (500)</title>
  <meta name="viewport" content="width=device-width,initial-scale=1">
  <style>
```

jphenow · December 17, 2025, 7:14pm

Heya, that’s odd.

Are you able to share the image reference you’re using, or does your fly.toml point at your Dockerfile? Could you also share your app name?

Also, I just deployed a couple of fixes for some of the more obvious errors, along with a couple of flyctl tweaks (v0.3.233).

Could you try again and let us know if anything has improved or changed for you? I’m not certain it will fix your exact case, so definitely let us know here what you find, along with any more detail you can provide, so we can investigate more directly.

bwoodlt · December 23, 2025, 4:11pm

This is now fixed! I heard back from the support team, who provided further clarification.

Thanks!