30% connectivity failure rate on apps

Hi!

I seem to be seeing about a 30% failure rate when trying to connect to my application running on Fly. Specifically, for an app I spun up to debug this issue, I have the following statistics:

failures: 391
successes: 777
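
For reference, those two counts work out to roughly one failed request in three. A quick check of the arithmetic in Python, using only the numbers above:

```python
# Counts reported in the post above.
failures = 391
successes = 777

total = failures + successes
failure_rate = failures / total

print(f"{failures}/{total} requests failed ({failure_rate:.1%})")
# prints: 391/1168 requests failed (33.5%)
```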

I’m specifically seeing the following error:

curl: (56) Recv failure: Connection reset by peer

Is there something I have to do to resolve this?

Can you run curl -v and include full output here?

There are a number of things that might cause this:

  1. If the app is crashing, we’ll restart it, but requests will fail while it’s down
  2. Health check failures will prevent traffic from being sent to VMs
  3. The VM may drop the connection or take too long to respond

You can check for the first two by running fly status. If the restart count is greater than zero, or if health checks aren’t passing, run fly vm status <id> to get more details.

If our proxy can’t get a response from your app, we’ll log an error message. Run fly logs to check for those.
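
To line those log entries up with client-side failures, it may help to repeat the request in a loop and tally which error each failed attempt raises. The following is only a sketch: `http_attempt` and `probe` are hypothetical helpers, not part of any Fly tooling, and the bare HTTP/1.0 request is an assumption about what the service on the other end will answer.

```python
import socket
from collections import Counter


def http_attempt(host, port, timeout=5.0):
    """Connect, send a minimal HTTP request, and require at least one byte back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        request = f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
        sock.sendall(request)
        # A reset here surfaces as ConnectionResetError, the Python
        # counterpart of curl's "(56) Recv failure".
        if not sock.recv(1024):
            raise ConnectionError("peer closed the connection without sending data")


def probe(attempt, attempts=100):
    """Call attempt() repeatedly and tally successes and failures by exception type."""
    results = Counter()
    for _ in range(attempts):
        try:
            attempt()
            results["success"] += 1
        except OSError as exc:  # covers resets, timeouts, refusals, DNS errors
            results[type(exc).__name__] += 1
    return results
```

Calling `probe(lambda: http_attempt("your-app.fly.dev", 5000))` (host and port are placeholders) returns a `Counter` such as one keyed by `success` and `ConnectionResetError`, which can then be compared against the timestamps in fly logs.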

hi @kurt thanks for the response!

It doesn’t look like the app is crashing -

App
  Name     = fail-test
  Owner    = personal
  Version  = 0
  Status   = running
  Hostname = fail-test.fly.dev

Instances
ID       PROCESS VERSION REGION DESIRED STATUS  HEALTH CHECKS      RESTARTS CREATED
f4445e3d app     0       iad    run     running 2 total, 2 passing 0        17h0m ago

I also don’t see any healthcheck failures in the logs beyond the initial one on startup.

How do I get the value of <id> for the vm status?

(as a note, i have destroyed and re-created this app a few times so you may see something different than the above if you look at the current version)

We’re missing a bunch of information we’ll need to help debug this. What does curl -v show? Also what is in your fly.toml and what kind of app is this?

The instance ID is the first column in the output of fly status.
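
If you want to script that step, here is a minimal sketch in Python. It assumes the plain-text table layout shown in the fly status output earlier in this thread (a header row starting with `ID`, then one row per instance, ended by a blank line). The `instance_ids` helper is hypothetical, and the layout may differ between flyctl versions.

```python
def instance_ids(status_output):
    """Return the first column of each row under the 'ID ...' header line."""
    ids = []
    in_table = False
    for line in status_output.splitlines():
        fields = line.split()
        if not fields:
            in_table = False  # a blank line ends the instance table
            continue
        if fields[0] == "ID":
            in_table = True  # header row; instance rows follow
            continue
        if in_table:
            ids.append(fields[0])
    return ids


# Sample copied from the fly status output posted above.
sample = """\
Instances
ID       PROCESS VERSION REGION DESIRED STATUS  HEALTH CHECKS      RESTARTS CREATED
f4445e3d app     0       iad    run     running 2 total, 2 passing 0        17h0m ago
"""

print(instance_ids(sample))  # ['f4445e3d']
```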

here is the fly.toml:

app = "fail-test"

kill_signal = "SIGINT"
kill_timeout = 5

[build]
  image = "registry.fly.io/squid:latest"

[env]
  abcwe123 = "abcwe123"

[[services]]
  internal_port = 5000
  protocol = "tcp"

  [services.concurrency]
    hard_limit = 1000
    soft_limit = 1000

  [[services.ports]]
    port = "5000"

  [[services.tcp_checks]]
    grace_period = "5s"
    interval = "20s"
    port = "5000"
    restart_limit = 5
    timeout = "2s"

[[services]]
  internal_port = 5001
  protocol = "tcp"

  [services.concurrency]
    hard_limit = 1000
    soft_limit = 1000

  [[services.ports]]
    port = "25565"

  [[services.tcp_checks]]
    grace_period = "5s"
    interval = "20s"
    port = "5001"
    restart_limit = 5
    timeout = "2s"

[mounts]
  source="fail_test"
  destination="/data"

# this allows us to talk to the box directly
[experimental]
allowed_public_ports = [5000]

Unfortunately, I’m not able to reproduce it this morning in order to get a curl -v for you. I’ll do a bunch more testing and see what I can get for you.
