Proxy connection issues

We have a minimum of 2 instances deployed across 3 regions: den, dfw, ord.

The machines are dedicated-cpu-1x

The app is a simple Express (Node.js) app that just returns 3 environment variables:

app.get("/", (req, res) => {
  res.type("json");
  res.send(
    JSON.stringify(
      {
        name: process.env.FLY_APP_NAME,
        id: process.env.FLY_ALLOC_ID,
        region: process.env.FLY_REGION ?? "no region",
      },
      null,
      2
    )
  );
});

https://usc-0beec7.fly.dev/

We are trying to run and validate some auto scale parameters and basic functionality.

The load test is using apache bench:

ab -n 1000 -c 10 https://usc-0beec7.fly.dev/

We get failed requests when running the load test (sometimes 3-4, sometimes more). In the logs, we see this message once for each failed request:

Error: error while making HTTP request to app: connection closed before message completed

❯ fly status -a usc-0beec7
App
  Name     = usc-0beec7
  Owner    = shopmonkey
  Version  = 13
  Status   = running
  Hostname = usc-0beec7.fly.dev
  Platform = nomad

Deployment Status
  ID          = 28654c55-5ae7-39fa-08ad-4453e4cc4a89
  Version     = v13
  Status      = successful
  Description = Deployment completed successfully
  Instances   = 2 desired, 2 placed, 2 healthy, 0 unhealthy

Instances
ID      	PROCESS	VERSION	REGION	DESIRED	STATUS 	HEALTH CHECKS     	RESTARTS	CREATED
24b7c10e	app    	13     	den   	run    	running	1 total, 1 passing	0       	5m49s ago
af20906e	app    	13     	dfw   	run    	running	1 total, 1 passing	0       	6m46s ago

We aren’t sure if and how we should move forward. Our plan is to deploy a fairly significant SaaS app across a minimum of 10 regions if we can get this working well. Any help appreciated.


Sometimes when I run it, it fails even with 1 concurrent user:

❯ ab -n 1000 -c 1 https://usc-0beec7.fly.dev/
This is ApacheBench, Version 2.3 <$Revision: 1901567 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking usc-0beec7.fly.dev (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
apr_pollset_poll: The timeout specified has expired (70007)
Total of 722 requests completed

ran it again…

❯ ab -n 1000 -c 1 https://usc-0beec7.fly.dev/
This is ApacheBench, Version 2.3 <$Revision: 1901567 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking usc-0beec7.fly.dev (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
apr_pollset_poll: The timeout specified has expired (70007)
Total of 655 requests completed

This one is for the Fly engs to answer.


I have a suggestion if you want to determine whether this problem is down to Fly’s proxy:

There should be a FLY_PUBLIC_IP env var set for each VM. You can connect to this IP over allowed_public_ports, bypassing Fly’s proxies. To do so:

  1. Add an experimental.allowed_public_ports=[<app-internal-port>] to your fly.toml (see "Any way to attach a public IPv4 address to a specific machine? - #4 by ignoramous").
  2. Exec ab against that IP:port.
ab -n 10 -c 10 -H "Host: <FILL-IF-REQUIRED>" "http://$FLY_PUBLIC_IP:$INTERNAL_PORT"

To speed things up, consider writing to their support if you are okay subscribing to their Launch (£29/mo) or Scale (£200/mo) plans: see "Announcing: Plans for email support and compliance requirements (usage included)".

I’m looking into this a little bit.

Autoscaling doesn’t work too well for benchmarks in its current form. It’s too slow to react. It shouldn’t cause the issues you noticed, but I wouldn’t recommend it for this kind of workload.

We’re working on a better autoscaling mechanism for Apps V2.

The "connection closed before message completed" error usually means your server closed the connection before fully responding. From the code you provided, that sounds unlikely.

Any way you can get more logs from your app?
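
One way to get more logs out of the app: a minimal sketch, assuming you keep a reference to the server object that Express's app.listen() returns. The helper name attachConnectionLogging is made up for illustration; it relies only on Node's built-in http events ("request", the response's "close", and "clientError"), and it has not been run against this particular app.

```javascript
const http = require("http");

// Log every response, flagging ones whose connection closed before the
// response was fully written. `server` is any http.Server, for example the
// value returned by Express's app.listen(). `log` defaults to console.log.
function attachConnectionLogging(server, log = console.log) {
  server.on("request", (req, res) => {
    const started = Date.now();
    // "close" fires on the response once it has completed or once the
    // underlying connection was terminated early.
    res.on("close", () => {
      const ms = Date.now() - started;
      if (res.writableFinished) {
        log(`done ${req.method} ${req.url} ${res.statusCode} in ${ms}ms`);
      } else {
        log(`aborted ${req.method} ${req.url} after ${ms}ms`);
      }
    });
  });

  // Parse or socket errors on incoming connections. Registering a listener
  // replaces Node's default handling, so the socket is closed here.
  server.on("clientError", (err, socket) => {
    log(`clientError ${err.code || err.message}`);
    socket.destroy();
  });

  return server;
}
```

With Express this would be wired up as attachConnectionLogging(app.listen(PORT)), where PORT stands for whatever internal port the app already uses. Lines starting with "aborted" would then line up (or not) with the proxy errors in fly logs.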

I ran ab and all the requests got through but I received 12 non-2xx responses:

❯ ab -n 1000 -c 10 https://usc-0beec7.fly.dev/
# ...
Benchmarking usc-0beec7.fly.dev (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests
# ...
Concurrency Level:      10
Time taken for tests:   16.489 seconds
Complete requests:      1000
Failed requests:        12
   (Connect: 0, Receive: 0, Length: 12, Exceptions: 0)
Non-2xx responses:      12
Total transferred:      458676 bytes
HTML transferred:       91884 bytes
Requests per second:    60.65 [#/sec] (mean)
Time per request:       164.888 [ms] (mean)
Time per request:       16.489 [ms] (mean, across all concurrent requests)
Transfer rate:          27.17 [Kbytes/sec] received

I also ran a different tool (oha) and only got 200 response codes:

❯ oha -n 1000 -c 10 https://usc-0beec7.fly.dev/
Summary:
  Success rate: 1.0000
  Total:        6.1906 secs
  Slowest:      3.6174 secs
  Fastest:      0.0318 secs
  Average:      0.0617 secs
  Requests/sec: 161.5363

  Total data:   99.61 KiB
  Size/request: 102 B
  Size/sec:     16.09 KiB

Response time histogram:
  0.046 [590] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.060 [370] |■■■■■■■■■■■■■■■■■■■■
  0.074 [23]  |■
  0.088 [7]   |
  0.101 [0]   |
  0.115 [0]   |
  0.129 [0]   |
  0.143 [0]   |
  0.157 [0]   |
  0.171 [0]   |
  0.185 [10]  |

Latency distribution:
  10% in 0.0392 secs
  25% in 0.0419 secs
  50% in 0.0446 secs
  75% in 0.0484 secs
  90% in 0.0537 secs
  95% in 0.0579 secs
  99% in 0.3196 secs

Details (average, fastest, slowest):
  DNS+dialup:   1.5787 secs, 0.2486 secs, 3.5748 secs
  DNS-lookup:   1.4817 secs, 0.1159 secs, 3.5127 secs

Status code distribution:
  [200] 1000 responses
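
Since ab only reports a count of non-2xx responses and not which status codes they were, a small script can tally them directly. This is a rough sketch using only Node's built-in http/https modules; the function names getStatus and tallyStatuses are made up for illustration, and it is not a replacement for a real load-testing tool.

```javascript
const http = require("http");
const https = require("https");

// Issue one GET and resolve with the status code as a string, or with
// "error:<code>" if the request fails or the body is cut short.
function getStatus(url) {
  const client = url.startsWith("https:") ? https : http;
  return new Promise((resolve) => {
    const req = client.get(url, (res) => {
      res.resume(); // discard the body so the socket is released
      res.on("end", () => resolve(String(res.statusCode)));
      res.on("error", (err) => resolve(`error:${err.code || err.message}`));
      // Fallback: "close" without a prior "end" means the body was cut short.
      // After a normal "end" this call is a no-op (already resolved).
      res.on("close", () => resolve("error:closed"));
    });
    req.on("error", (err) => resolve(`error:${err.code || err.message}`));
  });
}

// Send `total` requests with at most `concurrency` in flight and count the
// outcomes by status code (or error label).
async function tallyStatuses(url, total, concurrency) {
  const counts = {};
  let issued = 0;
  async function worker() {
    while (issued < total) {
      issued += 1;
      const outcome = await getStatus(url);
      counts[outcome] = (counts[outcome] || 0) + 1;
    }
  }
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return counts;
}
```

Something like tallyStatuses("https://usc-0beec7.fly.dev/", 1000, 10).then(console.log) would mirror the ab run above and show which codes the failures carry.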