I’m looking into using Fly.io as a more advanced load balancer for my video service. Specifically, I want to accept RTMP streams on port 1935 and proxy them to my video servers across cloud providers. The docs say I can use ports 80, 443, 5000, and 10000 - 10100; is there any way to configure 1935 or other ports?
Besides that, I’m expecting incoming connections to stay open for multiple hours for the video streams. I know Fly can close connections when downscaling, etc. Are there any other reasons connections get closed, or should long-lived connections be fine in general?
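For context, this is roughly the service definition I’d want in fly.toml; the external port 1935 is exactly the part I’m not sure is allowed, and the internal port is just my assumption for where the ingest process would listen:

```toml
# Sketch of the service I'd like to define, assuming an arbitrary
# external TCP port such as 1935 were allowed. Values are illustrative.
[[services]]
  internal_port = 1935   # where the RTMP ingest process listens inside the VM
  protocol = "tcp"       # raw TCP, no HTTP handler

  [[services.ports]]
    port = 1935          # external port -- not in the currently allowed list
```

I’d leave out any HTTP handlers on the port, since RTMP is plain TCP.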
We’d love to see how a video streaming service performs on our platform!
We’re working on allowing many more ports. For now it’s on a case-by-case basis.
TCP connections are not closed unless:
We deploy our proxy (for now, until we figure out a way to hand off sockets between processes). Even then, connections get a fairly long graceful shutdown timeout.
We scale down your service. This also allows a decent amount of time for a graceful shutdown, but the window is shorter.
So potentially this would be possible if we discuss all the details?
How often does this happen? And how long are we talking? Sometimes I’m ingesting livestreams of up to 12 hours. If every stream had multiple disconnects and reconnects, my customers wouldn’t be happy.
Any docs on this? I’d be happy to pay for extra shutdown time. Ideally Fly would take the instance out of rotation for new connections but keep it around until the app indicates its connections are closed.
It varies a lot. Usually only once or twice a day, but it can go untouched for weeks if we don’t have any new features touching it. Sometimes we need to restart it, but that’s also rare.
We’ll wait up to 4 minutes for a connection to gracefully close when we shut down our proxy.
When we trigger the shutdown, the new proxy instance is already healthy and accepting connections, so a reconnect should be instantaneous.
This is currently configurable as kill_timeout. We allow up to 24 hours of waiting for your VM to shut down before force-killing it. However, the proxy won’t wait 24 hours before closing a connection.
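For reference, it’s set at the top level of fly.toml; the value is in seconds, and the number below is only an example:

```toml
# kill_timeout is a top-level fly.toml setting, specified in seconds.
# 300 (five minutes) here is just an illustrative value.
kill_timeout = 300
```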
We could change that. It would make sense not to use a timeout there and instead actually wait for the VM to exit, which severs the connection anyway. I still like having timeouts on everything though! Maybe we can set an idle timeout on reads/writes on the connection when this happens. We don’t currently use an idle timeout on “tcp” services (services that don’t have an http handler, which I’m assuming is what you’ll be using).
I understand. That’s minimally disruptive for most apps, but the bottom line for me is that I will have streams interrupted regularly. The 4-minute grace period won’t matter since the streams are generally longer than that.
Most broadcast software has an auto-reconnect delay of 10 seconds, so that’s 10 seconds “lost” on event livestreams, sports broadcasts, etc.
In my case the challenge is the proxy closing the connection, as far as I understand.
The proxy waiting for the VM would fix it in my case. Would the proxy then block new requests?
Makes sense to still have timeouts. I’m indeed using TCP, so that would be problematic in this case, as far as I understand?
It’s possible for our proxy to hand over sockets, but I’ve not implemented that logic. We’d be willing to look into it and make your use-case work. It would also benefit everyone, of course.
Can you configure it to reconnect faster? This is not a permanent solution, but it would help. Losing 500ms vs losing 10s is a big difference. I watch a lot of Twitch streams myself.
It is, but you’ll also need to set the kill_timeout as high as we’ll allow.
The proxy would stop sending new connections to your “shutting down” instance. It would dispatch new connections to different instances.
It shouldn’t be problematic afaict. There will always be data being read from / written to the connection, so we won’t shut it down! Our idle timeout mechanism resets its deadline every time data is sent or received (or buffered by the kernel). Since your connections will keep sending or receiving data, they won’t trigger the idle timeout. This would only be a measure we put in place for other services that might keep a connection open indefinitely.
The broadcast software is under my clients’ control. I could ask all of them to configure a smaller reconnect window in case of disconnects, but that doesn’t inspire confidence in my infrastructure.
Shared VMs can extend it to 300 seconds (five minutes). Dedicated VMs can extend the timeout to 86400 seconds (twenty-four hours).
Any higher kill_timeouts possible for shared VMs? Dedicated VMs are overkill at this point for TCP load balancing of a limited number of incoming connections. I’d prefer several shared VMs over one dedicated VM.
I would also like that. I run game servers that are small enough for shared hardware (each VM should ideally hold between 10 and 25 players, i.e. WebSocket connections).
It’s nice to have VMs spin up automatically when and where my players are awake, but I can’t kick out everyone after 5 minutes of a server being underfilled! Something like 1 hour seems more reasonable, but I can’t afford dedicated VMs.