We run an app built with Remix and host it on Fly.io.
We currently have 4 instances running. Since last Friday (29 August), we have been having issues with our apps and deployments. Whenever we deploy a new build, even if only a single text string changed, the deployment fails and the machines start crashing. This happens across all 4 of our apps, which run identical application code.
If I deploy an image that was created before the 29th, everything works fine and the app runs as smoothly as always.
The only thing the logs show is an occasional error about connecting to our database (cloud-hosted Supabase), and even that is not consistent. As far as I can tell, Supabase isn't having any issues right now. Even when we adjust the database connection settings, we can never deploy the change: the image builds, but the app won't start and doesn't log any errors.
Have there been any changes on Fly.io that could be causing this?
Any ideas on how to debug this and figure out what is going on are welcome, as we are kind of lost right now.
Could you post some logs of the build/deploy process?
You might try setting the run command in the Dockerfile to ["sleep", "inf"] (or the equivalent for the OS you're using), then use fly ssh console to get into one of the machines and try running the app yourself.
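Running the app by hand is one way to see where it stops. Another is to have each startup step announce itself and fail loudly if it takes too long, so that an app that "won't start and won't throw any errors" produces a log line naming the step it is stuck on. Below is a minimal sketch, assuming the server entry point awaits things such as a database connection before it starts listening; withTimeout and startupStep are hypothetical helpers, and the 15-second default is an arbitrary choice.

```typescript
// Hypothetical startup helpers (not part of Remix or Fly): log when each step
// begins and ends, and reject with a named error if it has not settled in `ms`.
export async function withTimeout<T>(label: string, promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined = undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`startup step "${label}" timed out after ${ms}ms`)),
      ms,
    );
  });
  try {
    // Whichever settles first wins; the original operation is NOT cancelled.
    return await Promise.race([promise, timeout]);
  } finally {
    // Clear the timer so it does not keep the process alive after success.
    clearTimeout(timer);
  }
}

export async function startupStep<T>(
  label: string,
  run: () => Promise<T>,
  ms = 15_000,
): Promise<T> {
  console.log(`[startup] ${label}: starting`);
  const started = Date.now();
  const result = await withTimeout(label, run(), ms);
  console.log(`[startup] ${label}: done in ${Date.now() - started}ms`);
  return result;
}
```

For example, in a Prisma-based app one could wrap the connection as `await startupStep("db connect", () => prisma.$connect())`, where `prisma` is the app's PrismaClient instance. Because Promise.race only stops waiting, the timeout is a diagnostic aid: it tells you which step hangs, and the process should then exit or be restarted rather than carry on.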
ッ ~/devs/shelf/webapp main ❯ fly deploy --app shelf-webapp-staging
==> Verifying app config
Validating /Users/donkoko/devs/shelf/webapp/fly.toml
✓ Configuration is valid
--> Verified app config
==> Building image
==> Building image with Depot
--> build: ()
[+] Building 12.5s (22/22) FINISHED
=> [internal] load build definition from Dockerfile 0.1s
=> => transferring dockerfile: 1.14kB 0.1s
=> [internal] load metadata for docker.io/library/node:22-bookworm-slim 0.9s
=> [internal] load .dockerignore 0.2s
=> => transferring context: 235B 0.2s
=> [base 1/3] FROM docker.io/library/node:22-bookworm-slim@sha256:0ae9e80c8c7e7a8fea5bc8e8762e4fd09a7a68c251abf8cf44 0.0s
=> => resolve docker.io/library/node:22-bookworm-slim@sha256:0ae9e80c8c7e7a8fea5bc8e8762e4fd09a7a68c251abf8cf44ea086 0.0s
=> [internal] load build context 0.3s
=> => transferring context: 140.45kB 0.2s
=> CACHED [base 2/3] WORKDIR /src 0.0s
=> CACHED [base 3/3] RUN apt-get update && apt-get install -y openssl && rm -rf /var/lib/apt/lists/* 0.0s
=> CACHED [deps 1/2] ADD package.json . 0.0s
=> CACHED [deps 2/2] RUN npm install --include=dev 0.0s
=> CACHED [build 1/5] COPY --from=deps /src/node_modules /src/node_modules 0.0s
=> CACHED [build 2/5] ADD . . 0.0s
=> CACHED [build 3/5] RUN npx prisma generate 0.0s
=> CACHED [build 4/5] RUN npm run build 0.0s
=> CACHED [build 5/5] RUN npm prune --omit=dev 0.0s
=> CACHED [release 1/7] COPY --from=build /src/node_modules /src/node_modules 0.0s
=> CACHED [release 2/7] COPY --from=build /src/app/database /src/app/database 0.0s
=> CACHED [release 3/7] COPY --from=build /src/build /src/build 0.0s
=> CACHED [release 4/7] COPY --from=build /src/package.json /src/package.json 0.0s
=> [release 5/7] COPY --from=build /src/prisma.config.ts /src/prisma.config.ts 0.0s
=> [release 6/7] COPY --from=build /src/start.sh /src/start.sh 0.0s
=> [release 7/7] RUN chmod +x /src/start.sh 0.1s
=> exporting to image 10.5s
=> => exporting layers 0.0s
=> => exporting manifest sha256:a0dc5bac6b85101c2dfeac3b1c7eba37a99e948412e40e1db51c888437d7d6f0 0.0s
=> => exporting config sha256:853ec7ff7d8dc61591b7c25ea9412b102e6287e06d86cc2dc53fc901f8205a01 0.0s
=> => pushing layers for registry.fly.io/shelf-webapp-staging:deployment-01K42R5CQP6GE6EBA2QA0VZ0KM@sha256:a0dc5bac6 8.1s
=> => pushing layer sha256:3db839210ab2919bf2ee6fbac3941676dc0301716d681daeaffcaa3729d60e27 6.2s
=> => pushing layer sha256:497a9806507f2464e45993b8414d56fc2585982420b58c4969fd945247294a7b 1.1s
=> => pushing layer sha256:853ec7ff7d8dc61591b7c25ea9412b102e6287e06d86cc2dc53fc901f8205a01 6.7s
=> => pushing layer sha256:1aee028a00e792b91e1f77be7098163607e01698b7d1b5b6907c1bad5d49aa70 0.9s
=> => pushing layer sha256:921fc9e8be93b9da9dcd891cc4c43d8df58a0ffe4982ba7262911526d012d717 6.3s
=> => pushing layer sha256:b1badc6e50664185acfaa0ca255d8076061c2a9d881cecaaad281ae11af000ce 1.0s
=> => pushing layer sha256:cc1c6f5ee41e7124cd8c27c2b15663aeb2937673a5bc7259f34f72d7dbeb1233 1.6s
=> => pushing layer sha256:291fc5dbbd39fe685232627c96a45e68110c8aa46cb7b065f3a72e72cb22c31d 1.7s
=> => pushing layer sha256:2c738285b693c0b99b318ab84c7a0b8c78de6034943572e952866ce3402fcaec 8.1s
=> => pushing layer sha256:2561d51a4c67aeb839775fbbf9f06e9b8f865499c3974f3b15d8748dcb0760a1 1.1s
=> => pushing layer sha256:8f2220110aac88b9b75675e1445f73639d6a6232cc57878f5668558519482d98 0.9s
=> => pushing layer sha256:159f498ecd9ded873c0407b253481fecfd5e7d9d8d5803aeaac40e829554e375 0.8s
=> => pushing layer sha256:4e06c1a47e17a8b62c4f581dfdec846741e13dbf17aacd62f8afa6d668d3da31 0.9s
=> => pushing layer sha256:2403c8b76caca8d2d0a53ca1ee2475bbaece39b293a49cfba277b01186986bb8 1.9s
=> => pushing layer sha256:29f22df150f8717959a407ae55f376fbe30e3e7c7fdcf05d4edf792cc93eb4ff 1.5s
=> => pushing manifest for registry.fly.io/shelf-webapp-staging:deployment-01K42R5CQP6GE6EBA2QA0VZ0KM@sha256:a0dc5ba 2.4s
--> Build Summary: ()
--> Building image done
image: registry.fly.io/shelf-webapp-staging:deployment-01K42R5CQP6GE6EBA2QA0VZ0KM
image size: 319 MB
Watch your deployment at https://fly.io/apps/shelf-webapp-staging/monitoring
-------
Updating existing machines in 'shelf-webapp-staging' with rolling strategy
WARNING The app is not listening on the expected address and will not be reachable by fly-proxy.
You can fix this by configuring your app to listen on the following addresses:
- 0.0.0.0:8080
Found these processes inside the machine with open listening sockets:
PROCESS | ADDRESSES
-----------------*--------------------------------------
/.fly/hallpass | [fdaa:1:512e:a7b:4d8:33eb:988:2]:22
-------
✔ Cleared lease for 32873163c54d68
-------
Error: failed to update machine 32873163c54d68: Unrecoverable error: error getting machine 32873163c54d68 from api: failed to get VM 32873163c54d68: resource_exhausted: rate limit exceeded (Request ID: 01K42RG4WGZSWZ2HSCBC4PRD8Z-otp) (Trace ID: 0bddf381380d2beeee329596e0ea7065)
And here is what I see when running fly logs while deploying:
2025-09-01T13:42:13Z runner[32873163c54d68] ams [info]Pulling container image registry.fly.io/shelf-webapp-staging@sha256:a0dc5bac6b85101c2dfeac3b1c7eba37a99e948412e40e1db51c888437d7d6f0
2025-09-01T13:42:15Z runner[32873163c54d68] ams [info]Successfully prepared image registry.fly.io/shelf-webapp-staging@sha256:a0dc5bac6b85101c2dfeac3b1c7eba37a99e948412e40e1db51c888437d7d6f0 (2.076672334s)
2025-09-01T13:42:16Z runner[32873163c54d68] ams [info]Configuring firecracker
2025-09-01T13:42:16Z app[32873163c54d68] ams [info] INFO Sending signal SIGINT to main child process w/ PID 634
2025-09-01T13:42:18Z app[32873163c54d68] ams [info] INFO Sending signal SIGTERM to main child process w/ PID 634
2025-09-01T13:42:18Z app[32873163c54d68] ams [info] INFO Main child exited with signal (with signal 'SIGTERM', core dumped? false)
2025-09-01T13:42:18Z app[32873163c54d68] ams [info] INFO Starting clean up.
2025-09-01T13:42:18Z app[32873163c54d68] ams [info] WARN could not unmount /rootfs: EINVAL: Invalid argument
2025-09-01T13:42:18Z app[32873163c54d68] ams [info][ 183.374660] reboot: Restarting system
2025-09-01T13:42:19Z app[32873163c54d68] ams [info]2025-09-01T13:42:19.933083315 [01K42R74QFB2MKMQE1VQ7VTZ70:main] Running Firecracker v1.12.1
2025-09-01T13:42:19Z app[32873163c54d68] ams [info]2025-09-01T13:42:19.933424796 [01K42R74QFB2MKMQE1VQ7VTZ70:main] Listening on API socket ("/fc.sock").
2025-09-01T13:42:20Z app[32873163c54d68] ams [info] INFO Starting init (commit: f6529c7)...
2025-09-01T13:42:20Z app[32873163c54d68] ams [info] INFO Preparing to run: `/src/start.sh` as root
2025-09-01T13:42:20Z app[32873163c54d68] ams [info] INFO [fly api proxy] listening at /.fly/api
2025-09-01T13:42:20Z app[32873163c54d68] ams [info]+ npm run start
2025-09-01T13:42:20Z runner[32873163c54d68] ams [info]Machine created and started in 7.918s
2025-09-01T13:42:21Z health[32873163c54d68] ams [error]Health check on port 8080 has failed. Your app is not responding properly. Services exposed on ports [80, 80, 443, 443] will have intermittent failures until the health check passes.
2025-09-01T13:42:21Z app[32873163c54d68] ams [info]2025/09/01 13:42:21 INFO SSH listening listen_address=[fdaa:1:512e:a7b:4d8:33eb:988:2]:22
2025-09-01T13:42:21Z app[32873163c54d68] ams [info]> start
2025-09-01T13:42:21Z app[32873163c54d68] ams [info]> NODE_ENV=production node ./build/server/index.js
2025-09-01T13:43:32Z proxy[32873163c54d68] otp [error][PR04] could not find a good candidate within 20 attempts at load balancing
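The deploy output warns that nothing inside the machine is listening on 0.0.0.0:8080 (only /.fly/hallpass on port 22), and the health check on port 8080 fails right after npm run start, with nothing logged after node ./build/server/index.js. A useful first check is whether the server ever reaches its listen call. Below is a minimal sketch using Node's built-in node:http module, assuming the app should bind to 0.0.0.0 on the port from the PORT environment variable with 8080 (the port in the warning) as a fallback; resolveListenAddress and startServer are hypothetical names, and the placeholder handler stands in for the real Remix request handler.

```typescript
import { createServer, type Server } from "node:http";

// Assumption: fly-proxy expects the app on 0.0.0.0:8080 (from the warning);
// PORT can override the port if the app's config sets it.
export function resolveListenAddress(env: Record<string, string | undefined>): {
  host: string;
  port: number;
} {
  const parsed = Number.parseInt(env.PORT ?? "", 10);
  return {
    // Bind all IPv4 interfaces; 127.0.0.1/localhost is not reachable by fly-proxy.
    host: "0.0.0.0",
    port: Number.isInteger(parsed) && parsed > 0 ? parsed : 8080,
  };
}

export function startServer(env: Record<string, string | undefined> = process.env): Server {
  const { host, port } = resolveListenAddress(env);
  const server = createServer((_req, res) => {
    // Placeholder handler; the real app would delegate to its Remix handler here.
    res.writeHead(200, { "content-type": "text/plain" });
    res.end("ok\n");
  });
  server.listen(port, host, () => {
    // An explicit log line shows in `fly logs` whether listen() was ever reached.
    console.log(`listening on http://${host}:${port}`);
  });
  return server;
}
```

If a line like "listening on http://0.0.0.0:8080" never shows up in fly logs, the process is most likely stuck before listen(), for example on a top-level await or a database connection made at module load, which would be consistent with the silent hang above.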
I managed to fix the issue by downgrading from Node 22 to Node 20; after that, deployments worked again. I also suspect a patch release of some package broke it, but I haven't figured out which package yet.
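Since moving from Node 22 to Node 20 made deployments work again, it may help to make the runtime version visible and fixed. The build output shows the image is based on the floating tag node:22-bookworm-slim, so a rebuild can pick up a newer Node 22 release, and the deps stage copies only package.json before npm install, so without a lockfile dependency versions could also drift between uncached builds. Below is a minimal startup guard sketch; parseNodeMajor and assertNodeMajor are hypothetical helpers, and expecting major version 20 is an assumption based on the fix described above.

```typescript
// Hypothetical startup guard: log the exact Node.js version at boot and fail
// fast if the major version differs from the one the app was validated on.
export function parseNodeMajor(version: string): number {
  // process.versions.node looks like "20.19.4"; also accept a leading "v".
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0] ?? "", 10);
  if (!Number.isInteger(major)) {
    throw new Error(`unrecognized Node.js version: ${version}`);
  }
  return major;
}

export function assertNodeMajor(expected: number, actual: string = process.versions.node): void {
  console.log(`[startup] Node.js ${actual}`);
  const major = parseNodeMajor(actual);
  if (major !== expected) {
    throw new Error(`expected Node.js ${expected}.x but running ${actual}`);
  }
}
```

Calling assertNodeMajor(20) at the top of the server entry point would make an unexpected base-image change show up immediately in fly logs instead of as a silent hang.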