We’ve removed the mechanism that detected idle TCP connections and closed them. It used to serve as a way to detect connections that had been dropped without being fully closed. Instead, we have properly set up various socket options with the kernel to shut down connections in CLOSE_WAIT or FIN_WAIT states.
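The post doesn’t name the socket options involved, so the following is only a generic sketch of how a program can ask the kernel to detect dead peers instead of relying on an application-level idle timeout. The function name, the timing values, and the choice of TCP keepalive plus TCP_USER_TIMEOUT are all assumptions for illustration, not a description of what our proxy actually sets.

```python
import socket


def enable_dead_peer_detection(sock, idle=60, interval=10, probes=5):
    """Ask the kernel to probe an idle TCP connection and drop it if the peer is gone.

    Hypothetical helper: the option set and timings are illustrative assumptions.
    """
    # Turn on TCP keepalive probes for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The options below are platform-specific, so each one is guarded.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Number of unanswered probes before the connection is dropped.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    if hasattr(socket, "TCP_USER_TIMEOUT"):
        # Linux only: milliseconds that sent data may stay unacknowledged.
        timeout_ms = (idle + interval * probes) * 1000
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, timeout_ms)
```

With options like these, a connection that is merely quiet stays open as long as the peer keeps answering probes, while one whose peer has vanished is eventually torn down by the kernel.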
For the vast majority of our users, this will be a positive change: long-lived connections won’t be closed anymore when there’s no read/write activity happening on them. This is particularly useful for database connections that go through our proxy, but also for websockets and long-polling requests.
The change was rolled out today at around 12 PM UTC.
We’ve already noticed some apps reaching edge concurrency limits. That limit is presently set at 2048 concurrent connections per edge per app. We have hundreds of edges, so unless there’s a large spike in traffic for your app in a single region, this shouldn’t be a problem. However, some apps do not properly close their connections, which leads to an accumulation up the chain when connections map 1-to-1 (e.g. if your app is set up to use “connections”-type concurrency).
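As a rough illustration of what “properly closing connections” can look like on the app side, here is a minimal sketch of a connection handler that always releases its socket, even when an error occurs. The `handle_client` name and the echo behavior are hypothetical; your app’s framework may already do this for you.

```python
import socket


def handle_client(conn: socket.socket) -> None:
    """Echo one message back, then always close the connection (hypothetical handler)."""
    # Using the socket as a context manager closes it when the block exits,
    # whether the handler returns normally or raises.
    with conn:
        data = conn.recv(4096)
        if data:
            conn.sendall(data)
```

A handler that returns without closing keeps the connection counted against the concurrency limit until something else tears it down.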
We’re adding new log messages coming from our proxy to let you know when we’re shedding load for your app, for example:
Dropped a connection destined for tcp/443 due to a full backlog for your app. We’ll only notify again every power of 10 (hint: check if your app is closing connections properly when it’s done with them)
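To make the “every power of 10” wording concrete, here is a small sketch of that kind of log throttling, assuming it means a message is emitted on the 1st, 10th, 100th, 1000th (and so on) dropped connection. The `should_notify` function is hypothetical and is not taken from our proxy.

```python
def should_notify(drop_count: int) -> bool:
    """Return True when drop_count is an exact power of 10 (1, 10, 100, ...)."""
    if drop_count < 1:
        return False
    # Strip trailing zeros; a power of 10 reduces to exactly 1.
    while drop_count % 10 == 0:
        drop_count //= 10
    return drop_count == 1
```

Under that assumption, an app that sees the message several times has had its drop count grow by roughly an order of magnitude between each one.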
Note: Rate limiting can also trigger connections to be dropped at the edge.
Both connection drops and rate limitations can be observed via the following metrics:
fly_edge_tcp_connections_drop_count
fly_edge_tcp_rate_limited_count
The fly_edge_tcp_rate_limited_count metric includes a “type” label, which can take a value of either tls or connections.
If you are closing your connections properly, are still getting connections dropped, and that traffic is legit, then we can certainly change the limits for your app!