Session affinity ("sticky sessions")?

Fly.io looks awesome, and I’m interested in using it to run Meteor-powered web servers for my app. One thing I need from the load-balancer layer, though, is cookie-based session affinity (“sticky sessions”), because Meteor uses WebSockets. I don’t see how to enable that; is it a hidden feature, or just not possible?

I also don’t see how to pull Docker images from a private repo, but I have ways around that. Sticky sessions are a deal-maker/breaker, though.

Thanks!

TL;DR: we don’t have built-in session affinity. WebSockets on Fly work fine, though, so your Meteor app might just work. If session affinity becomes a problem, you can either:

  1. Run a single app instance per region
  2. Run nginx between our router and your app (you can even run it in the same VMs as Meteor).

Running one instance per region means your users will almost always connect to the same instance. If that one dies for some reason, we’ll connect them to the next closest.

The nginx option would work well with our private networking. It’d be pretty simple to adapt this project to load balance across Meteor VMs: fly-examples/nginx-cluster on GitHub, a horizontally scalable NGINX caching cluster.

Incidentally, session affinity is on our to-do list; we’re just really hesitant to add cookies to people’s requests. 😃

Single instance per region could work. Means I could have a half-dozen or so servers in the US (where most of my customers are), plus a few more around the globe, which is plenty. And connections will definitely get routed to only one instance for a given user, even if they’re, say, halfway between San Francisco and Chicago? Or is it less deterministic than that? (Unfortunately, Meteor really does need sticky sessions as it’s a stateful platform.)

Really hoping not to have to muck around with load balancers anymore; it seems I’ve had more problems with them than with my app servers themselves! Also, I don’t think Meteor works with IPv6 networking, so 6PN is probably not an option.

Side note: if you haven’t seen it already, Caddy’s got some interesting approaches to session affinity. They have the usual cookie-based policy, round-robin, etc., but they also have an ip_hash policy that picks the upstream from a hash of the client IP, which probably works in most cases and wouldn’t involve setting a cookie. That said, I’d still very much support cookie-based affinity as an opt-in setting.
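
The idea behind IP-hash affinity fits in a few lines: hash the client address and take it modulo the number of backends, so the same IP keeps landing on the same server as long as the backend list doesn’t change. The following is a minimal sketch in POSIX shell, not Caddy’s code; it uses cksum (a CRC) as a stand-in hash, and the function name pick_backend and the backend names are hypothetical:

```shell
# pick_backend CLIENT_IP BACKEND...
# Prints the backend chosen for CLIENT_IP. The same IP with the same backend
# list always yields the same backend, with no cookie involved.
pick_backend() {
  ip=$1
  shift
  [ "$#" -gt 0 ] || return 1            # need at least one backend
  # cksum prints "<crc> <byte-count>"; keep only the CRC as the hash value.
  hash=$(printf '%s' "$ip" | cksum | cut -d ' ' -f 1)
  # Drop (hash mod N) backends so the chosen one becomes $1.
  shift $(( hash % $# ))
  printf '%s\n' "$1"
}
```

For example, pick_backend 203.0.113.7 app-1 app-2 app-3 prints one of the three names and keeps printing the same one on every call. Two caveats of plain modulo hashing: adding or removing a backend remaps most clients, and every user behind one shared NAT address lands on the same server.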

IP hash affinity is my favorite and might work pretty well now that more clients are IPv6.

If someone is halfway between San Francisco and Chicago, their packets will consistently get routed to one or the other. They won’t bounce back and forth.

If that sounds workable, give it a shot and let us know how it goes. When we finish up some other big features we may be able to knock out sticky sessions pretty quickly.

OK, I’ve got flyctl set up and a fly.toml initialized, but how can I connect to a private container registry (in my case, Google Container Registry)?

Right now, you’ll need to docker pull <private-image-tag> and then flyctl deploy -i <private-image-tag>. This will pull it down locally, then push it to us. Let me know if that works for you.
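
The pull-then-deploy workaround can be wrapped in a small helper. This is a minimal sketch in POSIX shell, assuming Docker on the machine is already authenticated to GCR (for example via gcloud auth configure-docker); the function names run and deploy_private_image, the DRY_RUN switch, and the image path in the usage line are hypothetical:

```shell
# Print commands instead of executing them when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# deploy_private_image IMAGE_REF
# Pulls IMAGE_REF using the local Docker credentials, then has flyctl
# push that local copy to Fly and deploy it.
deploy_private_image() {
  image=${1:-}
  if [ -z "$image" ]; then
    echo "usage: deploy_private_image IMAGE_REF" >&2
    return 1
  fi
  run docker pull "$image" || return 1
  run flyctl deploy -i "$image"
}
```

Usage: deploy_private_image gcr.io/<project>/<image>:<tag>. Prefixing the call with DRY_RUN=1 prints the two commands without running them.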

Seems to be working. Only wish I could get faster upload speed… I suppose future pushes will be faster thanks to layering.

I also saw flyctl auth docker… does that let me push my built images to your registry from my CI?

Yes! flyctl auth docker does the Docker login for our registry, so if you have a build process somewhere that pushes to GCR, you could add a push-to-Fly step. You’d basically run:

  • flyctl auth docker
  • docker tag <image-ref> registry.fly.io/<fly-app-name>:latest
  • docker push registry.fly.io/<fly-app-name>:latest

I think that’s the right tag command. I never can remember.

Then: flyctl deploy -i registry.fly.io/<fly-app-name>:latest
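
Put together as one CI step, the sequence above might look like the following minimal sketch in POSIX shell. It assumes the CI job can already authenticate flyctl and that fly.toml is in the working directory; the function names run and push_to_fly and the DRY_RUN switch are hypothetical:

```shell
# Print commands instead of executing them when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# push_to_fly SOURCE_IMAGE FLY_APP
# Re-tags an image built elsewhere in CI, pushes it to Fly's registry,
# and deploys it. Stops at the first failing command.
push_to_fly() {
  src=${1:-}
  app=${2:-}
  if [ -z "$src" ] || [ -z "$app" ]; then
    echo "usage: push_to_fly SOURCE_IMAGE FLY_APP" >&2
    return 1
  fi
  target="registry.fly.io/$app:latest"
  run flyctl auth docker || return 1          # docker login for registry.fly.io
  run docker tag "$src" "$target" || return 1
  run docker push "$target" || return 1
  run flyctl deploy -i "$target"
}
```

Tagging everything as :latest mirrors the commands above; a per-commit tag would make it easier to see which build is deployed.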

Cool. And how does that first flyctl auth docker step work in a headless CI? How do I log in to flyctl and stay logged in? E.g., is there an auth token I can store in an environment variable, or something else?

Yep, set the FLY_ACCESS_TOKEN environment variable with your access token. You can print that out with flyctl auth token.
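
In a CI script it can help to fail fast, with a clear message, when the secret isn’t exposed to the job, before any flyctl command runs. A minimal sketch in POSIX shell; the function name require_fly_token is hypothetical:

```shell
# require_fly_token
# Returns non-zero with a readable error when FLY_ACCESS_TOKEN is missing
# or empty in the CI environment.
require_fly_token() {
  if [ -z "${FLY_ACCESS_TOKEN:-}" ]; then
    echo "FLY_ACCESS_TOKEN is not set; store the output of 'flyctl auth token' as a CI secret" >&2
    return 1
  fi
}
```

Usage: require_fly_token && flyctl auth docker.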

https://fly.io/docs/flyctl/integrating/#environment-variables

Great, I’ll probably end up doing that. Still trying to get a test deployment working: the server appears to start fine according to the logs, but the health check fails. internal_port is definitely correct. I tried replacing the TCP health check with an HTTP one, but it still fails…

That normally happens when a server process is only listening on 127.0.0.1.

It seems like Meteor might want the port arg to listen on all addresses: --port 0.0.0.0:3000 maybe.
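
The --port flag belongs to meteor run. A production bundle started with node main.js is usually configured through environment variables instead. Below is a sketch of a container entrypoint in POSIX shell, assuming PORT sets the listen port and BIND_IP the listen address (treat BIND_IP as an assumption to verify against your Meteor version). The function name start_meteor, the 3000 default, and the DRY_RUN switch are hypothetical:

```shell
# start_meteor
# Hypothetical entrypoint for a Meteor server bundle (node main.js).
# PORT must match internal_port in fly.toml; BIND_IP=0.0.0.0 asks the
# server to listen on all IPv4 interfaces instead of only 127.0.0.1.
start_meteor() {
  PORT=${PORT:-3000}
  BIND_IP=${BIND_IP:-0.0.0.0}
  export PORT BIND_IP
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ PORT=$PORT BIND_IP=$BIND_IP node main.js"
    return 0
  fi
  exec node main.js   # replace the shell so node receives signals directly
}
```

A Meteor bundle also needs its other settings (such as ROOT_URL and MONGO_URL) in the environment; they’re left out here.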

I removed the health checks, restarted the app, and was able to access it via the fly.dev URL. Then I put the health checks back in (both TCP and HTTP), and it works fine. So… 🤷‍♂️ but I’ll take it.

It seems like the app is having a hard time starting up before it gets killed. Maybe there’s a timeout I can increase? Or more likely I just need to go to a dedicated CPU, although I’d also like to be able to run beta/QA instances on shared CPUs…

Shared CPU should be fine. If it’s getting killed, it’s likely due to health check failures. I can have a look; what’s the app name?

sm-sandbox

Ok, they were restarting because checks were failing. By default, 5 health check failures trigger a restart. You can override this by adding restart_limit = 0 to the check definition.

It appears the checks are passing now but I’m unclear what changed!

Ah, it seems like it might take node main.js quite a while to get going, like close to a minute and a half. 5 check failures is only 50s in this case.
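
The arithmetic behind this: five failures in 50 s implies a 10-second check interval, while startup takes about 90 s. The sketch below is a rough estimate in POSIX shell. It assumes the first check fires at boot and then every interval after that, which may not match Fly’s exact check timing. Both function names are hypothetical:

```shell
# restart_window RESTART_LIMIT INTERVAL_SECONDS
# Seconds of consecutive failing checks before a restart is triggered.
restart_window() {
  echo $(( $1 * $2 ))
}

# min_restart_limit STARTUP_SECONDS INTERVAL_SECONDS
# Rough lower bound on restart_limit for a slow boot: the number of checks
# that fail before startup finishes (rounded up), plus one.
min_restart_limit() {
  echo $(( ($1 + $2 - 1) / $2 + 1 ))
}
```

With these inputs, restart_window 5 10 gives 50 and min_restart_limit 90 10 gives 10. So a limit of roughly 10 or more would let this app boot. restart_limit = 0 avoids the problem entirely, at the cost of never restarting an instance that hangs later.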

I disabled the restart limit, which made things run properly.

Thanks! You’re earning lots of points for your help here!

Would be nice if there were an “ignore health checks for n seconds” option like there is in other platforms, so that it can start up but be dealt with quickly if failing.

Probably enough for one day for me but thanks again and I look forward to digging in again tomorrow.
