Is it possible to increase the timeout to 120 sec

I have a concordia-production-web app with a request that can take over 60 seconds. Is it possible to increase the timeout to 120 seconds?

@nickolay.loshkarev Is there any way to make this request a streaming one, like a websocket, server-sent events, or even just a streamed response (you can send the headers first and then the body later)?

We can’t really increase the timeout directly, but I’m happy to help you figure out a way to start sending a response within a few seconds so Fly knows you’re going to accept the connection, and then you can take much longer than 60 seconds as long as you’re streaming data out.
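To show the shape of the idea, here’s a rough sketch in plain Rack (everything here is hypothetical; the slow work is a stand-in for your actual request):

# config.ru: send headers and a first byte immediately, then keep
# data trickling out while the slow work runs
require 'json'

class SlowReport
  def call(env)
    body = Enumerator.new do |yielder|
      yielder << ' ' # headers plus a first chunk go out right away
      result = nil
      worker = Thread.new { result = slow_work } # stand-in for the 60s+ computation
      until worker.join(5)  # check every 5s whether the work is done
        yielder << ' '      # heartbeat chunk; resets the proxy's idle timer
      end
      yielder << JSON.generate(result) # whitespace around a JSON body is legal
    end
    [200, { 'content-type' => 'application/json' }, body]
  end

  def slow_work
    sleep 90
    { ok: true }
  end
end

run SlowReport.new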

If you can share your tech stack and some info about your particular use case, I’m happy to help get it working.

We can make this change for customers on our Enterprise plan, though, so if we’re not able to make it work that is still an option. But most HTTP frameworks support some level of streaming, so that’s unlikely to be necessary.

Wait: What’s an ‘enterprise’ plan? Doesn’t kurt despise Cloudflare for their ‘enterprise-plan only’ shenanigans… :smiley:

Also, does this timeout (is it a read or a connection timeout?) apply even to raw TCP connections, or just the HTTP ones Fly proxies and the TLS ones Fly terminates?

What’s the timeout for UDP NAT?

We have an internal setting for controlling the timeout for customers with specific needs.

We do not like enterprise features like that, but we do need to protect our resources. Open file descriptors can be exhausted if a lot of connections are left open on the same hosts.

Bandwidth is a different game. We have fairly large pipes on all our hosts, so we should be able to handle the bandwidth requirements of most apps; the pricing is pretty clear on that :slight_smile:

The “enterprise” plan we’re thinking up concerns support (via email, including a way to reach us faster during emergencies) and more relaxed limits for app instances (such as the timeout we’re talking about here, but also things like connection backlogs, concurrent TLS handshakes, etc.). Basically, dedicated resources for the proxy, as opposed to the current multitenant-only model.

These plans aren’t ready yet! Just thinking about them.

It’s 60 seconds.

Applies to both HTTP and raw TCP. They’re “idle” timeouts. If no data is read or written within 60s, the connection is forcefully closed. It applies for both edge connections and connections we make to your app instances.
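For a long-lived raw TCP connection, that means you can keep the idle timer from firing by writing some real data periodically, assuming your protocol tolerates it. A sketch (the host, port, and heartbeat byte are assumptions):

require 'socket'

sock = TCPSocket.new('my-app.fly.dev', 10_000) # hypothetical raw-TCP service

heartbeat = Thread.new do
  loop do
    sleep 30         # well under the 60s idle timeout
    sock.write("\n") # any byte actually written counts as activity
  end
end

# ... slow request/response work over `sock` goes here ...

heartbeat.kill
sock.close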

UDP is connection-less and doesn’t require timeouts.


Thanks for the detailed answer. I was only taking a dig. Sure, AWS-esque Enterprise support makes sense. All top paying customers even had direct access to engs on our team (at AWS) who were free to spend weeks addressing just that one customer’s need.

Re: UDP: Apologies. I should have been clearer about what I meant by a timeout for UDP NAT.

When the data sent by the client is fragmented over multiple UDP packets (as with large DNS responses, or even WireGuard for that matter), it’s not ideal to have those fragments routed to different servers. Take QUIC (RFC 9000), for instance: HTTP requests/responses are likely to be fragmented over multiple UDP packets, where the state of the HTTP connection, as it were, is maintained by the application.

The recommended timeout for keeping a map of UDP flows around is 5 minutes (RFC 4787, REQ-5), though most routers only keep them for 30s, some for 2 minutes. My question was: what’s the timeout at Fly’s load balancer (fly-lb)? Or am I gravely mistaken, and fly-lb doesn’t need to track UDP flows?

We don’t track UDP flows, exactly. We just send UDP packets to the nearest VM. The expectation is that you’ll do UDP load balancing from there.

If your VM dies, or you deploy a new app, you can “break” a stateless connection implemented over UDP.


Thanks. IIUC, if there are 3 VMs, say, in the same region, any of those 3 could be sent UDP packets, even packets from the same client ip-port?

If so, that might break things unless the VMs are careful about handling such traffic. Does Fly intend to sticky-route UDP packets anytime in the near future?

If there are 3 VMs alive, each of our edge servers gets a “pinned” VM and sends every UDP packet it receives to the same app VM. It’s dumber than you’d think: it literally just sends the packet to the first VM it sees in a sorted list.

Most people run one VM per region for UDP work. So far we haven’t had anyone need to scale out to support giant amounts of UDP traffic; even using the one VM to do protocol- / app-specific balancing has been enough.


Between DNS, WireGuard, and HTTP/3, our workloads are overwhelmingly UDP. But our scale isn’t earth-shattering enough to matter much.

Pinning traffic to VMs does help. I just hope all edge servers don’t send traffic to the same VM. It would be cool for the edge to distribute the load with a consistent-hashing / maglev-hashing scheme. Some day…

  1. Still: We could implement such a router in our VMs to steer traffic as we please via Fly’s 6pn, hashing on the client’s ip-port, say (see the sketch after this list). But: is the assumption that edge servers preserve the client ip-port when forwarding packets true?

  2. Does gossip over 6pn within the same region count towards egress bandwidth?
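To make (1) concrete, here’s the kind of in-VM steering I have in mind, using rendezvous hashing rather than maglev for brevity, and assuming the client’s ip-port does survive the hop (the addresses, port, and peer list are all made up):

require 'socket'
require 'zlib'

PEERS = ['fdaa::3', 'fdaa::5', 'fdaa::7'] # hypothetical 6pn addresses of sibling VMs
PORT  = 5000

# Rendezvous hashing: every VM computes the same winner for a given
# client, and only ~1/N of clients move when a peer comes or goes.
def pick_peer(ip, port)
  PEERS.max_by { |peer| Zlib.crc32("#{peer}|#{ip}|#{port}") }
end

listener = UDPSocket.new(Socket::AF_INET6)
listener.bind('::', PORT)
forwarder = UDPSocket.new(Socket::AF_INET6)

loop do
  data, addr = listener.recvfrom(65_535) # addr = [family, port, host, ip]
  peer = pick_peer(addr[3], addr[1])
  forwarder.send(data, 0, peer, PORT)
end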

Hi @sudhir.j, we’re migrating a large, old Rails app from AWS; some actions, like generating PDFs, will take time for us to rewrite.

We’re using Rails with the wicked_pdf gem, and our PDFs use the built-in PDF format renderer:


format.pdf do
  # wicked_pdf builds the whole PDF before anything is written to the
  # socket, so the response can't start until rendering finishes
  render pdf: "#{@ledger.actor.name}_ledger",
         encoding: 'utf8'
end

Changing this to stream would require a fair amount of work on the upstream library. In the interim, if we could change the concordia account apps to a 120s timeout, that would really help us go live sooner.

Not a Ruby expert, but from what @jerome wrote above, I believe sending keep-alives should leave the connection intact… Sending TCP keepalives in Ruby - makandra dev. If not, you’d need an explicit timeout increase specific to your app / account.
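For reference, enabling keepalives on a Ruby socket looks roughly like this (the Linux-specific knobs are from that article; note that keepalive probes carry no payload, so whether the proxy’s data-based idle timer counts them is exactly the open question):

require 'socket'

sock = TCPSocket.new('example.internal', 443) # hypothetical upstream
sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
# Linux-specific tuning: first probe after 20s idle, then every 10s,
# closing the socket after 3 unanswered probes
sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPIDLE, 20)
sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPINTVL, 10)
sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPCNT, 3)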

The render :pdf complicates things a bit. Rails does support streaming, so sending the headers first should tell both the proxy and the browser that something is coming. There’s probably a way to do this with streaming or by hijacking the connection directly, but I don’t have enough context to suggest one just yet.

I wonder what the timeout is in browsers, though; it seems like they’d give up long before Fly does, but I’m not sure what value each of them sets.

I think it’s around 5 minutes in Chrome for the browser timeout.

We have another app we’re migrating for a client that might also need a timeout bump, but this time I’m not sure we could stream it even if we wanted to, due to the time it takes to calculate the data. Perhaps we could stream something while we wait for the actual data.

Yeah, there are also some options depending on what kind of request it is.

A file or report download could stream CSVs as you go, and if you’re making a zip file out of smaller files you’re especially in luck: the zip format is very stream-y, and I actually managed to make a streamer for zips (GitHub - sudhirj/zippo: Get a zip full of a list of file URLs provided).
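For the CSV case, the usual Rails trick is an enumerator body, so each row hits the socket as soon as it’s generated (the model and columns here are stand-ins):

require 'csv'

class ExportsController < ApplicationController
  def index
    headers['Content-Type'] = 'text/csv'
    headers['Content-Disposition'] = 'attachment; filename="export.csv"'
    headers.delete('Content-Length') # length is unknown; the response is chunked
    # Rows are written out as they're produced, so the connection
    # never sits idle while the full export is assembled
    self.response_body = Enumerator.new do |yielder|
      yielder << CSV.generate_line(%w[id name total])
      Order.find_each do |order| # hypothetical model
        yielder << CSV.generate_line([order.id, order.name, order.total])
      end
    end
  end
end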

PDFs are difficult; I don’t think it’s possible to send anything out until the PDF is done.

HTML can be streamed by sending the headers first; Rails has a streaming system for this. Elixir/Phoenix etc. can handle this really easily as well.
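In Rails, that streaming system is a one-liner on the render call, assuming a threaded server like Puma (the lookup is a stand-in):

class LedgersController < ApplicationController
  def show
    @ledger = Ledger.find(params[:id]) # hypothetical lookup
    # Sends the headers and the layout's <head> immediately, then
    # flushes the body template in chunks as it renders
    render stream: true
  end
end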

SSEs and websockets are of course the easiest: you can open the connection and send a ping once every few seconds until the data is ready.
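With ActionController::Live, the ping approach is only a few lines (the job-status check and result URL are stand-ins):

class ProgressController < ApplicationController
  include ActionController::Live

  def show
    response.headers['Content-Type'] = 'text/event-stream'
    sse = ActionController::Live::SSE.new(response.stream, event: 'ping')
    until job_finished? # hypothetical status check
      sse.write({ at: Time.now.to_i }) # each ping resets the idle timer
      sleep 5
    end
    sse.write({ url: result_url }, event: 'done') # hypothetical result location
  ensure
    sse.close
  end
end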

If you have a job system, you can send a loading-indicator HTML page with a “refresh” meta tag on it so it keeps reloading every N seconds; you can check the job status on each load and render the result when it’s ready.
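A sketch of that pattern (the job record and its API are assumptions):

class ReportsController < ApplicationController
  def show
    job = ReportJob.find(params[:id]) # hypothetical job record
    if job.finished?
      send_data job.result, type: 'application/pdf', filename: 'report.pdf'
    else
      # Responds within milliseconds; the page re-requests itself every 5s
      render html: <<~HTML.html_safe
        <meta http-equiv="refresh" content="5">
        <p>Generating your report…</p>
      HTML
    end
  end
end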

Lots of ways to handle this, but yeah, if this is human-facing I’d be pretty surprised if people sat quietly for that long with nothing happening in between. API calls will tend to have a timeout on the client anyway. So either way, I’d be quite surprised at a long-term solution that just kept a client waiting this long with no indicators.