Fly.io looks awesome and I’m interested in using it to run Meteor-powered web servers for my app. One thing I need from the load-balancer layer, though, is cookie-based session affinity (“sticky sessions”), because Meteor is a stateful platform that holds per-client sessions over WebSockets.
I also don’t see how to pull Docker images from a private repo, but I have ways around that. Sticky sessions are a deal-maker/breaker, though.
The tl;dr is: we don’t have built-in session affinity. WebSockets on Fly work fine, though, so your Meteor app might just work. If session affinity becomes a problem, you can either:
Run a single app instance per region
Run nginx between our router and your app (you can even run it on the same VMs as Meteor); there’s a rough sketch below.
Running one instance per region means your users will almost always connect to the same instance. If that one dies for some reason, we’ll connect them to the next closest.
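For the nginx option, here’s a rough sketch of what that could look like, assuming a couple of Meteor backends reachable from the VM on ports 3000/3001 (addresses and ports are illustrative, and keep in mind nginx hashes whatever client address our proxy hands it). This goes inside the http context of nginx.conf:

upstream meteor {
    ip_hash;                                  # pin each client IP to one backend
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
}

server {
    listen 8080;                              # point the app's internal_port here
    location / {
        proxy_pass http://meteor;
        proxy_http_version 1.1;               # WebSocket upgrade support for Meteor
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}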
Single instance per region could work. That means I could have a half-dozen or so servers in the US (where most of my customers are), plus a few more around the globe, which is plenty. And connections will definitely get routed to only one instance for a given user, even if they’re, say, halfway between San Francisco and Chicago? Or is it less deterministic than that? (Unfortunately, Meteor really does need sticky sessions, as it’s a stateful platform.)
Really hoping to not have to muck around with load balancers anymore; I’ve had more problems with them than with my app servers themselves, it seems! Also, I don’t think Meteor works with IPv6 networking, so 6PN (Fly’s IPv6 private networking) is probably not an option.
Side note: If you haven’t seen it already, Caddy’s got some interesting approaches to session affinity. They have the usual cookie-based policy, round-robin, etc., but they also have an ip_hash policy that uses a hash of the client IP, which would probably work in most cases and wouldn’t involve setting a cookie. Although I’d still very much support cookie-based affinity as an opt-in setting.
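In a Caddyfile that’s roughly this (hostnames and ports are made up):

example.com {
    reverse_proxy node1:3000 node2:3000 {
        lb_policy ip_hash        # hash the client IP to pick a backend
        # lb_policy cookie       # the cookie-based alternative
    }
}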
IP hash affinity is my favorite, and it might work pretty well now that more clients are on IPv6.
If someone is halfway between San Francisco and Chicago, their packets will consistently get routed to one or the other. They won’t bounce back and forth.
If that sounds workable, give it a shot and let us know how it goes. When we finish up some other big features we may be able to knock out sticky sessions pretty quickly.
Right now, you’ll need to docker pull <private-image-tag> and then flyctl deploy -i <private-image-tag>. This will pull it down locally, then push it to us. Let me know if that works for you.
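Concretely, assuming you’re already logged in to the private registry locally (the image path here is just a placeholder):

docker pull registry.example.com/acme/meteor-app:1.2.3
flyctl deploy -i registry.example.com/acme/meteor-app:1.2.3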
Yes! flyctl auth docker does the Docker login for our registry, so if you have a build process somewhere that pushes to GCR, you could add a push-to-Fly step. You’d basically:
flyctl auth docker
docker tag <image-ref> registry.fly.io/<fly-app-name>:latest
docker push registry.fly.io/<fly-app-name>:latest
I think that’s the right tag command. I never can remember.
Cool. And how does that first flyctl auth docker work if it’s a headless CI? How do I log in to flyctl and maintain that login? E.g., is there an auth token I can store in an env variable, or something else?
Great, I’ll probably end up doing that. Still trying to get a test deployment working; the server appears to start fine according to the logs, but the health check fails. internal_port is definitely correct. I tried removing the TCP health check and putting in an HTTP one, but it still fails…
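(For context, the relevant part of my fly.toml looks roughly like this; the port and check values are illustrative.)

[[services]]
  internal_port = 3000           # matches the port Meteor listens on
  protocol = "tcp"

  [[services.ports]]
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.tcp_checks]]
    interval = "10s"
    timeout = "2s"

  [[services.http_checks]]
    interval = "10s"
    timeout = "2s"
    method = "get"
    path = "/"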
I removed the health checks and restarted it, and I was able to access it via the fly.dev URL. Then I put the health checks back in (both TCP and HTTP) and it works fine. So… odd, but I’ll take it.
It seems like the app is having a hard time starting up before it gets killed. Maybe there’s a timeout I can increase? Or more likely I just need to go to a dedicated CPU, although I’d also like to be able to run beta/QA instances on shared CPUs…
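(If it does come to a dedicated CPU, I assume that’s just a VM size change per app, something like:

flyctl scale vm dedicated-cpu-1x -a my-prod-app      # production
flyctl scale vm shared-cpu-1x -a my-qa-app           # beta/QA

…the app names are made up, of course.)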
Ok, they were restarting because checks were failing. By default, 5 health check failures trigger a restart. You can override this by adding restart_limit = 0 to the check definition.
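For example, on a TCP check inside [[services]] (intervals are illustrative):

  [[services.tcp_checks]]
    interval = "10s"
    timeout = "2s"
    restart_limit = 0        # 0 disables restarts triggered by this check failing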
It appears the checks are passing now but I’m unclear what changed!
Thanks! You’re earning lots of points for your help here!
Would be nice if there were an “ignore health checks for n seconds” option like there is on other platforms, so the app can start up but still be dealt with quickly if it starts failing.
Probably enough for one day for me, but thanks again, and I look forward to digging in again tomorrow.