Traffic is routing to regions without servers

I have an Nginx proxy application terminating SSL for a bunch of clients. I’m dealing with what I think is a DDoS on our application.

I’ve implemented rate limiting in Nginx and it seems to be working for the most part. However, I noticed that a lot of traffic is getting rejected because it’s being routed to regions that are not enabled for our app.

For example, the app is enabled in dfw, ewr, sea (primary) and atl, iad, lax, ord, sjc, vin (backup). But I’m seeing errors in the logs like:

2021-09-01T13:03:39.012416927Z proxy[00be3f08] chi [error] error.code=2002 error.message="App connection problem" request.method="POST" request.url="""01FEGQADYPZ0P6GKKZGDNFG33G" response.status=502

2021-09-01T13:03:39.001997042Z proxy[00be3f08] dfw [error] error.code=2002 error.message="App connection problem" request.method="POST" request.url="""01FEGQADZD19QMEKJNRYWXVZZA" response.status=502

Is that expected behaviour? Should I be checking where traffic is being routed and enabling those regions?

Those logs are a little confusing: the region shown is the region where we received the connection, and the 00be3f08 value is the app instance we tried to send it to. If you run fly status, you should be able to see which app instance and region had the connection failure.

Gotcha, thanks Kurt. I was able to reconcile the app instance identifiers with the logs.

So most likely Nginx is somehow returning a 502. My rate limiting is configured to return a 444 code, and I’m not seeing any 502 / Bad Gateway errors from my app server (Cloud Run).

Anyway, thanks for the speedy assistance.

This particular 502 is coming from our proxy. It happens when we can’t get a response from your app (nginx in this case).

What does the concurrency setting in your [services] config look like? If you’re getting a good amount of traffic, it’s worth adding type = "requests" to that block. That will make our proxy use a connection pool, rather than creating a new connection to nginx each time.

Right, so I’m probably hitting the connection limit? Currently:

    hard_limit = 250
    soft_limit = 200

So I can update to this? (I don’t see type = "requests" in the docs, so I just want to check.)

    hard_limit = 250
    soft_limit = 200
    type = "requests"

Adding type = "requests" didn’t seem to make much difference, so I bumped up the hard and soft limits too. Still no improvement.

I tried scaling, but flyctl scale count 5 gives this error: Error 5 is not a valid process=count option.

Try fly scale count app=5. This is a regression in our CLI (the next release should fix it).

Thanks, that worked.

The errors I referenced in my first post seem to disappear if I respond to rate-limited requests with a 503 (the default) rather than the 444 I had set via limit_req_status 444;.
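
For reference, here is a rough sketch of the kind of rate-limiting setup I mean. The zone name, rate, burst, port, and upstream address are placeholders rather than my real config. One thing worth noting: 444 is a non-standard Nginx code that closes the connection without sending a response, which might be why the proxy in front of Nginx reported that it couldn't get a response.

```nginx
# Goes inside the http { } block. Zone name "per_ip" and the rate are
# hypothetical values for illustration only.
limit_req_zone $binary_remote_addr zone=per_ip:10m rate=10r/s;

server {
    listen 8080;  # placeholder port

    location / {
        # Apply the limit, allowing short bursts (burst size is a placeholder).
        limit_req zone=per_ip burst=20 nodelay;

        # 503 is the default status for rejected requests. Setting 444 here
        # instead makes Nginx close the connection without any response.
        limit_req_status 503;

        # Placeholder upstream; the real app server lives elsewhere.
        proxy_pass http://127.0.0.1:3000;
    }
}
```

With a real status like 503 (or 429), anything sitting in front of Nginx receives an actual HTTP response to pass along for each rejected request.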

Well, that’s strange. We just pass responses through as is, so I’m surprised it made a difference!