Error: timeout reached waiting for healthchecks to pass for machine 91857290b01238 failed to get VM 91857290b01238: Get "https://api.machines.dev/v1/apps/api/machines/91857290b01238": net/http: request canceled
note: you can change this timeout with the --wait-timeout flag
I read the docs and tried applying various solutions found on this forum. Nothing worked.
After removing the http_checks, the app deploys just fine, and the configured path returns 200 in the browser right now. SSHing into the app fails (maybe because of the distroless image).
I need http_checks because without them no check is performed at all, so it is possible to deploy an app that fails to start. In that case my CI/CD pipeline passes while in reality the app is stuck in a restart loop.
I'm running a Koa server on Node.js, bound to 0.0.0.0 on port 8080.
FROM node:lts-slim
WORKDIR /
# Build
COPY . .
WORKDIR /app/web
RUN npm ci
RUN npm run build:ssr:production
FROM gcr.io/distroless/nodejs18-debian11 as web
COPY --from=0 /app/web/dist/. /app/web/dist/.
EXPOSE 4000
WORKDIR /app/web
CMD [ "/app/web/dist/app/server/main.js" ]
FROM gcr.io/distroless/nodejs18-debian11 as server
COPY --from=0 /app/server/node_modules/ /app/server/node_modules/
COPY --from=0 /app/server/out/ /app/server/out/
EXPOSE 8080
CMD [ "/app/server/out/main.js" ]
$ fly deploy -c fly.api.toml --remote-only --build-target server --dockerfile Dockerfile
--> Pushing image done
image: registry.fly.io/app-api:deployment-01H2F7BDYWQCVWQ56E7W9W18D7
image size: 220 MB
Watch your app at https://fly.io/apps/app-api/monitoring
Updating existing machines in 'app-api' with rolling strategy
[1/2] Waiting for 91857290b01238 [app] to become healthy: 0/1
Error: timeout reached waiting for healthchecks to pass for machine 91857290b01238 failed to get VM 91857290b01238: Get "https://api.machines.dev/v1/apps/app-api/machines/91857290b01238": net/http: request canceled
note: you can change this timeout with the --wait-timeout flag
➜ app main ✗
After commenting out the [[services.http_checks]] section, the app deployed normally, and opening the path configured in the check in the browser returns status 200.
From the logs it doesn't seem your app is taking too long to boot, but can you try increasing your grace_period? Maybe your app's responses time out during the first few seconds while it waits for databases.
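For reference, a longer grace period could look roughly like the sketch below in fly.api.toml. The durations are only illustrative guesses, and the /health path is a hypothetical placeholder, since the path from your actual config wasn't posted; keep whatever path you already have.

```toml
# Sketch only: values are assumptions, not taken from the original config.
[[services]]
  internal_port = 8080    # the port the Koa server binds to
  protocol = "tcp"

  [[services.http_checks]]
    grace_period = "30s"  # assumed value: wait this long after boot before the first check
    interval = "15s"      # assumed value: time between checks
    timeout = "5s"        # assumed value: how long a single check may take
    method = "get"
    path = "/health"      # hypothetical path; use the one already configured
    protocol = "http"
```

If the check passes with a generous grace_period, you can lower it step by step to find out how long the app really needs before it starts answering.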
Also, another way you can debug that is by listing running processes on your app to confirm everything is running smoothly.
If possible, can you share logs from the crash loop you mentioned before? What we will be looking for is how fly proxy responds to your health checks failing.