I’m getting ready to beta-test a service that relies on many long-lived WebSocket connections. Long-lived might mean hours (or even days). It’s okay if they disconnect – they’ll reconnect. I’d just like to avoid an expensive reconnection every minute.
During testing, we’ll likely have fewer than a hundred at a time, but with our user base, it could grow to 10k-100k simultaneous connections.
I want to be a good citizen and not blow things up, so:
How many simultaneous connections can a single Firecracker VM support?
Am I okay adding a sub-minute heartbeat to keep the WebSocket from closing? Or is there a way to configure Fly to not close connections after a minute? I’d rather not have a heartbeat, as it seems like a waste of data and I’d rather not make mobile devices or Fly constantly send/receive heartbeats.
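For reference, here is a minimal sketch of the kind of sub-minute heartbeat I have in mind. It assumes the idle timeout really is about 60 seconds, which is only what I have observed and not a documented figure. The names `heartbeat_interval`, `keepalive`, and `send_ping` are hypothetical; `send_ping` stands in for whatever coroutine the WebSocket library provides for sending a ping frame.

```python
import asyncio
import random
from typing import Awaitable, Callable, Optional


def heartbeat_interval(idle_timeout_s: float, safety: float = 0.75,
                       jitter: float = 0.1) -> float:
    """Pick a ping interval comfortably below the idle timeout.

    `safety` scales the timeout down, and `jitter` shortens the result by
    a random 0-10% so that many clients do not all ping at the same instant.
    """
    base = idle_timeout_s * safety
    return base * (1.0 - random.uniform(0.0, jitter))


async def keepalive(send_ping: Callable[[], Awaitable[None]],
                    idle_timeout_s: float = 60.0,  # assumed, not documented
                    max_pings: Optional[int] = None) -> int:
    """Sleep, ping, repeat. Returns the number of pings sent.

    Runs until `max_pings` is reached (forever if None). An exception
    raised by `send_ping` propagates to the caller, which can reconnect.
    """
    sent = 0
    while max_pings is None or sent < max_pings:
        await asyncio.sleep(heartbeat_interval(idle_timeout_s))
        await send_ping()
        sent += 1
    return sent
```

With the defaults, the interval falls between 40.5 and 45 seconds, which works out to roughly 1,900 to 2,150 pings per connection per day. That recurring traffic is the cost I would like to avoid if the timeout is configurable.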
Reading Metrics on Fly.io · Fly Docs and then looking at the File Descriptors metric chart on Grafana leads me to believe I can use up to about 20K connections (well, file descriptors, which may not be exactly 1:1).
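To cross-check the chart from inside the VM, something like the sketch below reports the process's file descriptor limit and a rough headroom figure. It only measures the limit of the process that runs it, not anything the Fly proxy enforces. The `fd_headroom` helper and the `reserved` allowance of 256 are my own assumptions, and the count of open descriptors relies on `/proc/self/fd`, so it is Linux-only.

```python
import os
import resource  # standard library, Unix-only


def fd_headroom(reserved: int = 256) -> dict:
    """Report the file descriptor limit and an approximate headroom.

    `reserved` is a guessed allowance for non-socket descriptors
    (log files, the SQLite database, pipes, and so on).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # Approximate: listing the directory briefly opens one extra fd.
        in_use = len(os.listdir("/proc/self/fd"))
    except OSError:
        in_use = None  # /proc is not available on this platform

    if soft == resource.RLIM_INFINITY or in_use is None:
        available = None  # cannot estimate headroom
    else:
        available = max(0, soft - in_use - reserved)

    return {"soft_limit": soft, "hard_limit": hard,
            "in_use": in_use, "available": available}


if __name__ == "__main__":
    print(fd_headroom())
```

The `available` number is an upper bound on sockets for this one process. If a connection costs more than one descriptor, the real ceiling is lower.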
The app isn’t yet built to scale horizontally (it uses a single SQLite database). Our current plan is exactly to “shove too many connections on a single VM” and see how far it can go with just one. I’m trying to delay horizontal scaling until LiteFS - Distributed SQLite · Fly Docs is production-ready, because that will simplify the design.
I think what we really need to know from Fly is how many simultaneous connections the proxy can handle and will allow, for the organization as a whole, for the application, and per machine. It would be good to explicitly document such limits. That’s one thing I appreciate about AWS documentation. Thanks.