# Health checks frozen / `net/http: request canceled` on machine status polling — app unreachable for 1h45m+ (nrt)
- **App**: `keimimansai-shift`
- **Machine**: `e78451d2a02d78` (region: `nrt`, shared-cpu-1x, 512mb)
- **flyctl**: v0.4.59 windows/amd64 (Commit d10482182142f259db338dcef34556a67702290c, BuildDate 2026-06-09)
- **Org**: personal
## Summary
Since ~15:49 JST today, this app has been unreachable via its public URL
(`https://keimimansai-shift.fly.dev/`), and `flyctl deploy` consistently fails
at the final health-check-confirmation step. The condition has not changed at
all across 6 separate `flyctl deploy` runs and 2 different `http_service`
bind configurations over ~1h45m.
## Symptom 1: `flyctl deploy` fails at health-check wait
Every deploy succeeds at build / push / machine-config-update and reaches
“started” state (currently on release version 6), but then fails:
```
> Waiting for machine e78451d2a02d78 to reach a good state
> Machine e78451d2a02d78 reached started state
> Running smoke checks on machine e78451d2a02d78
> Running machine checks on machine e78451d2a02d78
> Checking health of machine e78451d2a02d78
Unrecoverable error: timeout reached waiting for health checks to pass for machine e78451d2a02d78: failed to get VM e78451d2a02d78: Get "https://api.machines.dev/v1/apps/keimimansai-shift/machines/e78451d2a02d78": net/http: request canceled
> Clearing lease for e78451d2a02d78
Cleared lease for e78451d2a02d78
Error: failed to update machine e78451d2a02d78: Unrecoverable error: timeout reached waiting for health checks to pass for machine e78451d2a02d78: failed to get VM e78451d2a02d78: Get "https://api.machines.dev/v1/apps/keimimansai-shift/machines/e78451d2a02d78": net/http: request canceled
```
## Symptom 2: `flyctl checks list` output is frozen/stale
```
Health Checks for keimimansai-shift
NAME │ STATUS │ MACHINE │ LAST UPDATED │ OUTPUT
───────────────────────────┼──────────┼────────────────┼──────────────┼─────────────────────────────
servicecheck-00-http-3000 │ critical │ e78451d2a02d78 │ 1h45m ago │ connect: connection refused
```
The `STATUS`/`OUTPUT` here have not changed across all 6 deploy attempts and
2 different `http_service.checks` configurations — only the “ago” duration
advances by real elapsed time, suggesting this health-check record is stuck
and not being re-evaluated.
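
The staleness argument above can be checked mechanically by sampling the `LAST UPDATED` column twice and comparing its growth with the wall-clock time between samples. The following is a minimal sketch, not part of flyctl: the function names are hypothetical, and the `h`/`m`/`s` duration format is inferred only from the values shown in this post.

```python
import re

# Seconds per unit suffix; the suffix set is an assumption based on
# the "1h45m" and "6m12s" values seen in `flyctl checks list` output.
_UNITS = {"h": 3600, "m": 60, "s": 1}


def ago_to_seconds(text: str) -> int:
    """Convert a duration such as '1h45m ago' or '6m12s' to seconds."""
    parts = re.findall(r"(\d+)([hms])", text)
    if not parts:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(int(number) * _UNITS[unit] for number, unit in parts)


def looks_frozen(first: str, second: str, elapsed_s: float,
                 tolerance_s: float = 60.0) -> bool:
    """Return True if the 'ago' value grew by roughly the elapsed time.

    A check that is being re-evaluated would reset 'LAST UPDATED', so the
    growth between two samples would be smaller than the elapsed time.
    The tolerance allows for displays that drop seconds (e.g. '1h45m').
    """
    growth = ago_to_seconds(second) - ago_to_seconds(first)
    return abs(growth - elapsed_s) <= tolerance_s
```

For example, `looks_frozen("6m12s", "11m12s", 300)` compares two hypothetical samples taken five minutes apart; a result of `True` means the record aged by the full interval without being refreshed.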
## Symptom 3: Public URL times out completely
```
curl -v --max-time 20 https://keimimansai-shift.fly.dev/
```
TLS handshake completes, request is sent, but 0 bytes are received before a
20s timeout. fly-proxy does not appear to be routing traffic to the machine.
## What I’ve ruled out
- **Not a local network issue**: `curl https://api.machines.dev/v1/apps/keimimansai-shift`
(unauthenticated) from the same machine/network returns HTTP 401 in ~0.6s.
General connectivity to `api.machines.dev` is fine.
- **Not an app-level issue**: via `flyctl ssh console`, confirmed the app
process is listening on `0.0.0.0:3000` (via `/proc/net/tcp`, local address
`00000000:0BB8`, state `0A`/LISTEN) and responds `HTTP 307` to a direct
HTTP request to `127.0.0.1:3000`.
- **Not an `http_service` config issue**: `fly.toml` is standard —
`internal_port = 3000`, `force_https = true`, one `[[http_service.checks]]`
with `method = "GET"`, `path = "/"`, `interval = "30s"`, `timeout = "5s"`,
`grace_period = “10s”`. Tried both `-H 0.0.0.0` and `-H ::` in the Dockerfile
CMD — same result either way (currently reverted to `-H 0.0.0.0`, which is
the address Fly’s own socket-scan warning recommends).
- **Not a broad platform incident**: status.flyio.net shows all systems
operational, NRT region 100% uptime over 90 days, no related incidents for
Machines API / health checks / fly-proxy.
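
For readers who want to verify the `/proc/net/tcp` reading in the app-level item above, here is a minimal sketch of how the hex fields decode. The function name and the state table are illustrative, and the byte order assumes a little-endian host (such as x86-64), which is an assumption about the machine rather than something stated in this post.

```python
import socket
import struct

# Subset of kernel TCP state codes as they appear in /proc/net/tcp.
TCP_STATES = {"01": "ESTABLISHED", "0A": "LISTEN"}


def decode_proc_net_tcp(local_address: str, state: str):
    """Decode an IPv4 'local_address' and 'st' field from /proc/net/tcp.

    Returns (dotted-quad IP, port number, state name). Unknown state
    codes are returned unchanged.
    """
    ip_hex, port_hex = local_address.split(":")
    # The address is a 32-bit value printed in host byte order; on a
    # little-endian host the bytes must be reversed to read it normally.
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    port = int(port_hex, 16)
    return ip, port, TCP_STATES.get(state.upper(), state)
```

Calling `decode_proc_net_tcp("00000000:0BB8", "0A")` yields the wildcard address, port 3000, and the LISTEN state, which matches the interpretation given above.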
## Timeline
- ~15:49 JST: health check first observed as `critical` / `connection refused`;
it has not changed since.
- Since then: 6× `flyctl deploy` (full image rebuild + rolling update each
time, now at release v6) and several `flyctl machine restart` — each
restart succeeds (machine restarts, file timestamps update), but the
CLI’s health-wait never returns (had to be cancelled).
- Across this whole window, the app has remained internally healthy
(verified via SSH + direct HTTP request to 127.0.0.1:3000 each time).
## Question
Could this be a stuck health-check evaluator or fly-proxy route registration
specific to this machine (`e78451d2a02d78`)? Is there a way to force
fly-proxy / the health-check system to re-register/re-evaluate for this
machine without recreating it (recreating risks duplicating the attached
volume `shift_app_data` mounted at `/data`, which holds a SQLite database I’d
rather not fork)?
Any guidance on unsticking this would be appreciated.
## Update
I tried the most invasive self-service fix I could think of: destroying the machine entirely and letting `flyctl deploy` recreate it from scratch (volume `vol_r1jy25z7gl0yxowr` survives independently of the machine, so the SQLite data on `/data` wasn’t at risk).
```
flyctl machine destroy e78451d2a02d78 --force
# volume vol_r1jy25z7gl0yxowr now shows ATTACHED VM: (empty)
flyctl deploy
```
Result: a brand-new machine `9080d396c77268` (release v7) was created, the volume auto-reattached correctly (matched on `[[mounts]] source = "shift_app_data"`), and `/data/dev.db` is intact (409600 bytes, confirmed via SSH).
But the exact same failure reproduced within ~2 minutes on this brand-new machine ID:
```
2026-06-12T10:02:40Z health[9080d396c77268] nrt [error]Health check 'servicecheck-00-http-3000' on port 3000 has failed. Your app is not responding properly. ...
2026-06-12T10:02:42Z app[9080d396c77268] nrt [info]✓ Ready in 313ms
```
The health check fired and recorded `critical` / `connection refused` about 2 seconds before the app logged “Ready”, and then, as on the old machine, it was never re-evaluated:
```
NAME │ STATUS │ MACHINE │ LAST UPDATED │ OUTPUT
servicecheck-00-http-3000 │ critical │ 9080d396c77268 │ 6m12s ago │ connect: connection refused
```
`flyctl deploy` failed with the identical error, just with the new machine ID:
```
✖ Failed: timeout reached waiting for health checks to pass for machine 9080d396c77268: failed to get VM 9080d396c77268: Get "https://api.machines.dev/v1/apps/keimimansai-shift/machines/9080d396c77268": net/http: request canceled
```
The public URL is still 100% unreachable (0 bytes / 15s timeout), while SSH plus a local `http://127.0.0.1:3000/` request from inside the new machine still returns HTTP 307 as expected.
Given this reproduced on a completely fresh machine ID within ~2 minutes, I no longer think this is a stuck-machine-record issue. It looks like (a) the health check is evaluated only once, at the moment the machine starts and before the app is ready to accept connections, and is never retried for this app, and/or (b) the authenticated `GET /v1/apps/keimimansai-shift/machines/{id}` endpoint is failing for this app/org specifically (an unauthenticated `GET /v1/apps/keimimansai-shift` from the same network succeeds in <1s).
App: `keimimansai-shift`, current machine: `9080d396c77268` (nrt). Happy to provide any additional logs/IDs. At this point this feels like it needs a look from the platform side.