I’m looking into using Fly.io as a more advanced load balancer for my video service. Specifically, I want to accept RTMP streams on port 1935 and proxy them to my video servers across cloud providers. The docs say I can use ports 80, 443, 5000, and 10000 - 10100; is there any way to configure 1935 or other ports?
Besides that, I’m expecting incoming connections to stay open for multiple hours for the video streams. I know Fly can close connections when downscaling, etc. Are there any other reasons connections get closed, or should long-lived connections be fine in general?
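For context, this is roughly the service definition I’d want in fly.toml; the external port 1935 is exactly the part I’m not sure is allowed, and the internal port is just my assumption for where the ingest process would listen:

```toml
# Sketch of the service I'd like to define, assuming an arbitrary
# external TCP port such as 1935 were allowed. Values are illustrative.
[[services]]
  internal_port = 1935   # where the RTMP ingest process listens inside the VM
  protocol = "tcp"       # raw TCP, no HTTP handler

  [[services.ports]]
    port = 1935          # external port -- not in the currently allowed list
```

I’d leave out any HTTP handlers on the port, since RTMP is plain TCP.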
We’d love to see how a video streaming service performs on our platform!
We’re working on allowing many more ports. For now it’s on a case-by-case basis.
TCP connections are not closed unless:
We deploy our proxy (for now, until we figure out a way to hand off sockets between processes). Even then, connections get a fairly long graceful shutdown timeout.
We scale down your service. This also allows a decent amount of time for a graceful shutdown, but the window is shorter.
So potentially this would be possible if we discuss all the details?
How often does this happen? And how long are we talking? Sometimes I’m ingesting livestreams of up to 12 hours. If every stream had multiple disconnects and reconnects, my customers wouldn’t be happy.
Any docs on this? I’d be happy to pay for extra shutdown time. Ideally Fly would take the instance out of rotation for new connections but keep it around until the app indicates its connections are closed.
It varies a lot. Usually only once or twice a day, but it can go untouched for weeks if we don’t have any new features touching it. Sometimes we need to restart it, but that’s also rare.
We’ll wait up to 4 minutes for a connection to gracefully close when we shut down our proxy.
When we trigger the shutdown, the new proxy instance is already healthy and accepting connections, so a reconnect should be instantaneous.
This is currently configurable as kill_timeout. We allow up to 24 hours of waiting for your VM to shut down before force-killing it. However, the proxy won’t wait 24 hours before closing a connection.
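For reference, it’s set at the top level of fly.toml; the value is in seconds, and the number below is only an example:

```toml
# kill_timeout is a top-level fly.toml setting, specified in seconds.
# 300 (five minutes) here is just an illustrative value.
kill_timeout = 300
```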
We could change that. It would make sense not to use a timeout there and instead actually wait for the VM to exit, which severs the connection anyway. I still like having timeouts on everything though! Maybe we can set an idle timeout on reads/writes on the connection when this happens. We don’t currently use an idle timeout on “tcp” services (services that don’t have an http handler, which I’m assuming is what you’ll be using).
I understand. That’s minimally disruptive for most apps, but the bottom line for me is that I will have streams interrupted regularly. The 4-minute grace period won’t matter since the streams are generally longer than that.
Most broadcast software has an auto-reconnect delay of 10 seconds, so that’s 10 seconds “lost” on event livestreams, sports broadcasts, etc.
In my case the challenge is the proxy closing the connection, as far as I understand.
The proxy waiting for the VM would fix it in my case. Would the proxy then block new requests?
Makes sense to still have timeouts. I’m indeed using TCP, so that would be problematic in this case, as far as I understand?
It’s possible for our proxy to hand over sockets, but I’ve not implemented that logic. We’d be willing to look into it and make your use-case work. It would also benefit everyone, of course.
Can you configure it to reconnect faster? This is not a permanent solution, but it would help. Losing 500ms vs losing 10s is a big difference. I watch a lot of Twitch streams myself.
It is, but you’ll also need to set the kill_timeout as high as we’ll allow.
The proxy would stop sending new connections to your “shutting down” instance. It would dispatch new connections to different instances.
It shouldn’t be problematic afaict. There will always be data being read from / written to the connection, so we won’t shut it down! Our idle timeout mechanism resets its deadline every time data is sent or received (or buffered by the kernel). Since your connections will keep sending or receiving data, they won’t trigger the idle timeout. This would only be a measure we put in place for other services that might keep a connection open indefinitely.
The broadcast software is under my clients’ control. I could ask all of them to configure a smaller reconnect window in case of disconnects, but that doesn’t inspire confidence in my infrastructure.
Shared VMs can extend it to 300 seconds (five minutes). Dedicated VMs can extend the timeout to 86400 seconds (twenty-four hours).
Any higher kill_timeouts possible for shared VMs? Dedicated VMs are overkill at this point for TCP load balancing of a limited number of incoming connections. I’d prefer several shared VMs over one dedicated VM.
I would also like that. I run game servers that are small enough for shared hardware (each VM should ideally hold between 10 and 25 players, i.e. WebSocket connections).
It’s nice to have VMs spin up automatically when and where my players are awake, but I can’t kick out everyone after 5 minutes of a server being underfilled! Something like 1 hour seems more reasonable, but I can’t afford dedicated VMs.