Health check 'servicecheck-01-http-3001' on port 3001 has failed. Your app is not responding properly

It’s for a Rust backend, and I have extensive logs throughout the app, but I’m not seeing any sort of errors happening there. It could even be, and it even happens if it’s idle.

Of course, performance degrades as a result. I have restarts configured as a safety net.

``toml
# Fly.io Application Configuration

# This file is used as a template - actual app name is set via --app flag during deployment

primary_region = “iad”

[build]

[deploy]

release_command = “/usr/local/bin/migrate”

[env]

PORT = “3001”

HOST = “0.0.0.0”

STATIC_DIR = “/app/static”

[[services]]

protocol = “tcp”

internal_port = 3001

processes = [“app”]

min_machines_running = 1 # Keep minimum 1 machine per region (2 total for iad + ams)
#
do I need more than one per region?

[services.concurrency]

type = "connections"

hard_limit = 250

soft_limit = 200

# HTTP port (redirects to HTTPS)

[[services.ports]]

port = 80

handlers = [“http”]

force_https = true

# HTTPS port

[[services.ports]]

port = 443

handlers = [“http”, “tls”]

# Health check configuration

# If health checks fail consecutively, the machine will be restarted

# Higher restart_limit = more tolerant of transient issues, but slower to recover from persistent problems

[[services.tcp_checks]]

interval = "15s"

timeout = "2s"

grace_period = "10s"

restart_limit = 5  *# Restart after 5 consecutive TCP check failures (75 seconds)*

# This gives transient network issues time to resolve before restarting

# HTTP health check - monitors actual application health including database connectivity

[[services.http_checks]]

interval = "10s"

timeout = "2s"

grace_period = "10s"

method = "get"

path = "/api/health"

protocol = "http"

restart_limit = 6  *# Restart after 6 consecutive HTTP check failures (60 seconds)*

# 60 seconds allows transient DB connection issues to resolve, but still catches persistent problems

# VM resources

# Production: performance-cpu-1x@2048MB (configured here)

# Staging: Override with --vm-cpu-kind=shared --vm-memory=256 in CI/CD workflow

[vm]

cpu_kind = “performance”

cpus = 1

memory_mb = 2048

# Restart policy

# Controls restarts when the process exits with an error

# Health check failures trigger restarts separately via restart_limit above

[[restart_policy]]

processes = [“app”]

policy = “on-failure”

max_retries = 5 # Increased retries for process crashes (separate from health check restarts)

```

Sounds like the DB connection is the bottle neck?

got some kind of log that could help today:

2026-01-13T15:20:47.089149Z ERROR comments_wall_backend::queue_processor: Error processing events: Broken pipe (os error 32)

Did not restart either.

Then the machine status shows this.

context deadline exceeded (Client.Timeout exceeded while awaiting headers)

I tried scaling to multiple machines and then multiple machines in the same region, and the error still pops up every now and again without a clear reason why.

Closing this out. I had a blocking synchronous call

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.