TCP idle timeout restrictions have been removed

We’ve removed our idle-timeout detection and closing mechanism for TCP connections. It used to serve as a way to detect connections that were dropped without being fully closed. Instead, we have properly set up various socket options with the kernel to shut down connections stuck in CLOSE_WAIT or FIN_WAIT states.

For the vast majority of our users, this will be a positive change: long-lived connections won’t be closed anymore when there’s no read/write activity happening on them. This is particularly useful for database connections that go through our proxy, but also for websockets and long-polling requests.

The change was rolled out today at around 12PM UTC.

We’ve already noticed some apps reaching edge concurrency limits. That limit is presently set at 2048 concurrent connections per edge per app. We have hundreds of edges, so unless there’s a large spike in traffic for your app in a single region, this shouldn’t be a problem. However, some apps do not properly close their connections, leading to an accumulation up the chain when connections map 1-to-1 (e.g. if your app is set up to use “connections”-type concurrency).
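For context, here is a minimal sketch of what “connections”-type concurrency looks like in fly.toml; the limit values below are placeholders for illustration, not recommendations:

```toml
# fly.toml (excerpt): per-service concurrency settings
[services.concurrency]
  type = "connections"   # count raw TCP connections, which map 1-to-1 through the proxy
  hard_limit = 25        # placeholder: max concurrent connections per instance
  soft_limit = 20        # placeholder: level at which the proxy prefers other instances
```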

We’re adding new log messages coming from our proxy to let you know when we’re shedding load for your app, for example:

Dropped a connection destined for tcp/443 due to a full backlog for your app. We’ll only notify again every power of 10 (hint: check if your app is closing connections properly when it’s done with them)

Note: Rate limiting can also trigger connections to be dropped at the edge.

Both connection drops and rate limitations can be observed via the following metrics:

fly_edge_tcp_connections_drop_count
fly_edge_tcp_rate_limited_count

The fly_edge_tcp_rate_limited_count metric includes a “type” label, which can take a value of either tls or connections.

If you are closing your connections properly but are still seeing connections dropped, and that traffic is legitimate, then we can certainly raise the limits for your app!

Nice. The 2048 TCP limit could be mentioned in lb docs?

Curious: as someone who’s seen these on our servers, just what sockopts are you setting to get rid of these automatically?

We probably should, yes. I think these will be more dynamic in the future, but they’ve been set that way for over 2 years now.

A combination of SO_KEEPALIVE, TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT, in addition to TCP_USER_TIMEOUT. There are more details and examples here.
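For anyone who wants to try this on their own servers, here’s a minimal Linux sketch of setting those options on a socket. It is not our actual proxy code, and the timeout values are illustrative only:

```c
/*
 * Sketch: enable TCP keepalive probes plus TCP_USER_TIMEOUT so the kernel
 * tears down dead peers instead of relying on an application-level idle timer.
 * The values below are illustrative, not the ones we actually run with.
 */
#include <stdio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

int configure_keepalive(int fd) {
    int enable = 1;   /* turn keepalive probing on */
    int idle = 60;    /* seconds of idle time before the first probe */
    int intvl = 10;   /* seconds between unanswered probes */
    int cnt = 6;      /* unanswered probes before the connection is dropped */
    unsigned int user_timeout = 120000; /* ms unacked data may sit before the kernel gives up */

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &enable, sizeof(enable)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &user_timeout, sizeof(user_timeout)) < 0) {
        perror("setsockopt");
        return -1;
    }
    return 0;
}

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }
    int rc = configure_keepalive(fd);
    close(fd);
    return rc == 0 ? 0 : 1;
}
```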
