Fly.io looks awesome and I’m interested in using it to run Meteor-powered web servers for my app. One thing I need from the load-balancer layer, though, is cookie-based session affinity (“sticky sessions”), because Meteor is a stateful platform that holds per-client sessions over WebSockets.
I also don’t see how to pull Docker images from a private repo, but I have ways around that. Sticky sessions are a deal-maker/breaker, though.
The tl;dr is: we don’t have built-in session affinity. WebSockets on Fly work fine, though, so your Meteor app might just work. If session affinity becomes a problem, you can either:
Run a single app instance per region
Run nginx between our router and your app (you can even run it on the same VMs as Meteor); there’s a rough sketch below.
Running one instance per region means your users will almost always connect to the same instance. If that one dies for some reason, we’ll connect them to the next closest.
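For the nginx option, here’s a rough sketch of what that could look like, assuming a couple of Meteor backends reachable from the VM on ports 3000/3001 (addresses and ports are illustrative, and keep in mind nginx hashes whatever client address our proxy hands it). This goes inside the http context of nginx.conf:

upstream meteor {
    ip_hash;                                  # pin each client IP to one backend
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
}

server {
    listen 8080;                              # point the app's internal_port here
    location / {
        proxy_pass http://meteor;
        proxy_http_version 1.1;               # WebSocket upgrade support for Meteor
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}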
Single instance per region could work. That means I could have a half-dozen or so servers in the US (where most of my customers are), plus a few more around the globe, which is plenty. And connections will definitely get routed to only one instance for a given user, even if they’re, say, halfway between San Francisco and Chicago? Or is it less deterministic than that? (Unfortunately, Meteor really does need sticky sessions, as it’s a stateful platform.)
Really hoping to not have to muck around with load balancers anymore; I’ve had more problems with them than with my app servers themselves, it seems! Also, I don’t think Meteor works with IPv6 networking, so 6PN (Fly’s IPv6 private networking) is probably not an option.
Side note: If you haven’t seen it already, Caddy’s got some interesting approaches to session affinity. They have the usual cookie-based policy, round-robin, etc., but they also have an ip_hash policy that uses a hash of the client IP, which would probably work in most cases and wouldn’t involve setting a cookie. Although I’d still very much support cookie-based affinity as an opt-in setting.
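In a Caddyfile that’s roughly this (hostnames and ports are made up):

example.com {
    reverse_proxy node1:3000 node2:3000 {
        lb_policy ip_hash        # hash the client IP to pick a backend
        # lb_policy cookie       # the cookie-based alternative
    }
}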
IP hash affinity is my favorite, and it might work pretty well now that more clients are on IPv6.
If someone is halfway between San Francisco and Chicago, their packets will consistently get routed to one or the other. They won’t bounce back and forth.
If that sounds workable, give it a shot and let us know how it goes. When we finish up some other big features we may be able to knock out sticky sessions pretty quickly.
Right now, you’ll need to docker pull <private-image-tag> and then flyctl deploy -i <private-image-tag>. This will pull it down locally, then push it to us. Let me know if that works for you.
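Concretely, assuming you’re already logged in to the private registry locally (the image path here is just a placeholder):

docker pull registry.example.com/acme/meteor-app:1.2.3
flyctl deploy -i registry.example.com/acme/meteor-app:1.2.3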
Yes! flyctl auth docker does the Docker login for our registry, so if you have a build process somewhere that pushes to GCR, you could add a push-to-Fly step. You’d basically:
flyctl auth docker
docker tag <image-ref> registry.fly.io/<fly-app-name>:latest
docker push registry.fly.io/<fly-app-name>:latest
I think that’s the right tag command. I never can remember.
Cool. And how does that first flyctl auth docker work if it’s a headless CI? How do I log in to flyctl and maintain that login? E.g., is there an auth token I can store in an env variable, or something else?
Great, I’ll probably end up doing that. Still trying to get a test deployment working; the server appears to start fine according to the logs, but the health check fails. internal_port is definitely correct. I tried removing the TCP health check and putting in an HTTP one, but it still fails…
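(For context, the relevant part of my fly.toml looks roughly like this; the port and check values are illustrative.)

[[services]]
  internal_port = 3000           # matches the port Meteor listens on
  protocol = "tcp"

  [[services.ports]]
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.tcp_checks]]
    interval = "10s"
    timeout = "2s"

  [[services.http_checks]]
    interval = "10s"
    timeout = "2s"
    method = "get"
    path = "/"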
I removed the health checks and restarted it, and I was able to access it via the fly.dev URL. Then I put the health checks back in (both TCP and HTTP) and it works fine. So… odd, but I’ll take it.
It seems like the app is having a hard time starting up before it gets killed. Maybe there’s a timeout I can increase? Or more likely I just need to go to a dedicated CPU, although I’d also like to be able to run beta/QA instances on shared CPUs…
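(If it does come to a dedicated CPU, I assume that’s just a VM size change per app, something like:

flyctl scale vm dedicated-cpu-1x -a my-prod-app      # production
flyctl scale vm shared-cpu-1x -a my-qa-app           # beta/QA

…the app names are made up, of course.)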
Ok, they were restarting because checks were failing. By default, 5 health check failures trigger a restart. You can override this by adding restart_limit = 0 to the check definition.
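For example, on a TCP check inside [[services]] (intervals are illustrative):

  [[services.tcp_checks]]
    interval = "10s"
    timeout = "2s"
    restart_limit = 0        # 0 disables restarts triggered by this check failing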
It appears the checks are passing now but I’m unclear what changed!
Thanks! You’re earning lots of points for your help here!
Would be nice if there were an “ignore health checks for n seconds” option like there is on other platforms, so the app can start up but still be dealt with quickly if it starts failing.
Probably enough for one day for me, but thanks again, and I look forward to digging in again tomorrow.