Timeout reached waiting for healthchecks to pass for machine

Error: timeout reached waiting for healthchecks to pass for machine 91857290b01238 failed to get VM 91857290b01238: Get "https://api.machines.dev/v1/apps/api/machines/91857290b01238": net/http: request canceled
note: you can change this timeout with the --wait-timeout flag

I read the docs and tried applying various solutions found on this forum. Nothing worked.

With the http_checks removed, the app deploys just fine, and the configured path currently returns 200 in the browser. SSHing into the app fails (maybe because of the distroless image).

I need http_checks because without them no check is performed at all, so it is possible to deploy an app that fails to start. In that case my CI/CD pipeline passes while the app is actually stuck in a restart loop.

I'm running a Koa server on Node.js, bound to 0.0.0.0 on port 8080.

fly.toml

app = "api"
primary_region = "ams"

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1

  [[services.http_checks]]
    interval = 10000
    grace_period = "5s"
    method = "get"
    path = "api/utils/ping"
    protocol = "http"
    restart_limit = 0
    timeout = 2000
    tls_skip_verify = true
    [services.http_checks.headers]

Dockerfile

FROM node:lts-slim
WORKDIR /

# Build
COPY . .
WORKDIR /app/web
RUN npm ci
RUN npm run build:ssr:production

FROM gcr.io/distroless/nodejs18-debian11 as web
COPY --from=0 /app/web/dist/. /app/web/dist/.
EXPOSE 4000
WORKDIR /app/web
CMD [ "/app/web/dist/app/server/main.js" ]

FROM gcr.io/distroless/nodejs18-debian11 as server
COPY --from=0 /app/server/node_modules/ /app/server/node_modules/
COPY --from=0 /app/server/out/ /app/server/out/
EXPOSE 8080
CMD [ "/app/server/out/main.js" ]

$ fly deploy -c fly.api.toml --remote-only --build-target server --dockerfile Dockerfile

Can you try adding a leading “/” to your path?

- path = "api/utils/ping"
+ path = "/api/utils/ping"

Also, can you share the logs from your app while it boots, along with the proxy logs reporting that the health checks are unreachable?

$ flyctl

--> Pushing image done
image: registry.fly.io/app-api:deployment-01H2F7BDYWQCVWQ56E7W9W18D7
image size: 220 MB
Watch your app at https://fly.io/apps/app-api/monitoring
Updating existing machines in 'app-api' with rolling strategy
[1/2] Waiting for 91857290b01238 [app] to become healthy: 0/1
Error: timeout reached waiting for healthchecks to pass for machine 91857290b01238 failed to get VM 91857290b01238: Get "https://api.machines.dev/v1/apps/app-api/machines/91857290b01238": net/http: request canceled
note: you can change this timeout with the --wait-timeout flag

Monitoring VM 91857290b01238

2023-06-09T04:43:36.781 app[91857290b01238] ams [info] Starting clean up.
2023-06-09T04:43:36.782 app[91857290b01238] ams [info] hallpass exited, pid: 513, status: signal: 15
2023-06-09T04:43:36.787 app[91857290b01238] ams [info] 2023/06/09 04:43:36 listening on [fdaa:2:501a:a7b:10e:2738:a9a3:2]:22 (DNS: [fdaa::3]:53)
2023-06-09T04:43:37.783 app[91857290b01238] ams [info] [ 324.494683] reboot: Restarting system
2023-06-09T04:43:38.178 app[91857290b01238] ams [info] Starting init (commit: 8af0ddf)...
2023-06-09T04:43:38.191 app[91857290b01238] ams [info] Preparing to run: `/nodejs/bin/node /app/server/out/main.js` as 0
2023-06-09T04:43:38.197 app[91857290b01238] ams [info] 2023/06/09 04:43:38 listening on [fdaa:2:501a:a7b:10e:2738:a9a3:2]:22 (DNS: [fdaa::3]:53)
2023-06-09T04:43:39.013 app[91857290b01238] ams [info] [Redis] Connection established.
2023-06-09T04:43:39.152 app[91857290b01238] ams [info] [MongoDB] Connection established.
2023-06-09T04:43:39.160 app[91857290b01238] ams [info] [Server] Listening on http://0.0.0.0:8080.

I made your suggested change. Updated fly.toml:

app = "app-api"
primary_region = "ams"

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1

  [[services.http_checks]]
    interval = 10000
    grace_period = "5s"
    method = "get"
    path = "/api/utils/ping"
    protocol = "http"
    restart_limit = 0
    timeout = 2000
    tls_skip_verify = true
    [services.http_checks.headers]

After commenting out the [[services.http_checks]] section, the app deployed normally, and opening the configured path (/api/utils/ping) in the browser returns status 200.

Thanks for the information.

From the logs, your app doesn't seem to take too long to boot, but can you try increasing your grace_period? Your app's responses may be timing out during the first few seconds while it waits for the databases.

Another way to debug this is to list the running processes on your app and confirm that everything is running smoothly.

If possible, can you also share logs from the crash loop you mentioned earlier? We would be looking at how the Fly proxy responds when your health checks fail.
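
As a rough sketch of the grace_period suggestion, the check block from the fly.toml posted above could be given a longer grace period. The "30s" value here is only an illustrative assumption, not a recommended setting. Every other key is copied unchanged from the posted config, including the section name, which this sketch does not verify.

```toml
  [[services.http_checks]]
    interval = 10000
    grace_period = "30s"  # assumed illustrative value; the posted config used "5s"
    method = "get"
    path = "/api/utils/ping"
    protocol = "http"
    restart_limit = 0
    timeout = 2000
    tls_skip_verify = true
```

If the deploy still times out with a longer grace period, slow startup is probably not the cause and the check definition itself is worth a closer look.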
