Autoscaling: is there a way to see how many instances are running in each region?

Hey, I’m currently running 10 containers using fly autoscale balanced min=7 max=10, with 7 regions set. I’d like to be able to see which regions are running more instances, so I could potentially tweak my region list to better serve the traffic I’m seeing. Is this possible in the tools as they are now?

I’d also love to be able to see which instances are receiving the most traffic, but I’m sure I can achieve this once I’ve sorted out custom metrics.

flyctl status lists VMs and regions. If you want to script it you can pass the --json flag.
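If you want a quick per-region instance count, something like this works (the JSON field names here are from memory, so double-check against your actual --json output first):

flyctl status --json | jq -r '.Allocations[].Region' | sort | uniq -c | sort -rn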

You can use our metrics right now if you set up a dashboard. Here’s a blog post with more info: Hooking Up Fly Metrics · Fly

I was about to contact you.

I randomly noticed your app was hitting some TLS concurrency limits we set on our servers. I increased that limit so connections would stop queueing or being rejected outright (we have a max queue limit).

Once I let more TLS handshakes through, I think your app couldn’t handle the number of requests coming into it. Response times rose sharply.

Looks like you got a handle on what’s happening though! You should be good with @michael’s answer.

Right now it appears most of your traffic is from Europe (FRA, CDG and AMS mostly).

One thing you could do is use “requests” type concurrency. This lets our proxy reuse connections to your app and might work better for your kind of workload.

Example:

[services.concurrency]
hard_limit = 25
soft_limit = 20
type = "requests"

Thanks for looking into it, and for the suggestion. I’m not sure changing to “requests” would help much; the traffic I get tends to be very short-lived. It’s mostly single requests to an API endpoint, rather than repeat traffic asking for lots of assets during the same connection. Unless you see different behaviour from your stats? I’m happy to give it a try.

It’s also interesting that you say traffic is mostly European; from the logs it looks like it is mostly hitting “ord” and “sin” instances.

We can’t tell precisely how much connection reuse there is. Looking into it a bit more, it seems like for the ~50K connections we get from users, we create ~70K connections to your app. So it looks like there is some reuse.

In any case, we’d be the ones reusing connections to your app. Some apps have a hard time accepting as many connections as there are requests, and do better when we reuse connections (not for a long time) for more than one request.

We’re hitting your FRA, EWR and SIN instances the most from my charts. It’s possible European traffic is spilling to SIN if we’re hitting your soft or hard limit. There’s also some element of randomness involved so we don’t keep hitting the same instances. I’m mostly speculating here, I’d have to look more closely into it to figure out where traffic is going.

Ok, I’ve set type = "requests", changed to fly autoscale min=7 max=8, and set the regions to:

Region Pool:
ams
cdg
ewr
fra
sin
sjc
Backup Region:
dfw
hkg
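
For reference, I set this up with roughly these commands (exact syntax may differ slightly between flyctl versions):

fly autoscale balanced min=7 max=8
fly regions set ams cdg ewr fra sin sjc
fly regions backup dfw hkg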

Hopefully this will perform a little better.

BTW, what’s the max ulimit I can set within my containers? I understand this is configured by the Docker host, and I can see from my logs that if I set the number of Cowboy acceptors too high I get errors: Ranch acceptor reducing accept rate: out of file descriptors.

Most of the requests are currently hitting EWR and AMS as far as I can tell.

I don’t think the type = "requests" helped in your case. Seems like it’s not a connection issue. The instances appear to get overwhelmed.

I think your soft and hard limits may be part of the problem. They’re so high right now that our proxy is likely never going to balance between your instances. If your soft limit is never or rarely hit, the proxy just picks the closest instance, so it seems to swarm one instance until it reaches that point and then sends everything to another. Just a theory, maybe something else is at play here. Balancing might also work better if you used fewer regions (we’ll randomly balance between the closest instances, but if you only have 1 in a region, it might get picked a lot more).

Since your requests are very short-lived, it’s a bit hard for our “loads” state to replicate in time. We should probably adjust how we do that to better fit your kind of app.

We’re not running Docker; every app runs in a Firecracker microVM, meaning you have full control over the limits inside your VM.

We currently set the rlimit (what ulimit sets) to 10240 at boot, within the VM. You can change that value with an ENTRYPOINT in your Docker image. The current value was just better than the much lower default value.

I don’t think the type = "requests" helped in your case. Seems like it’s not a connection issue.

Ok, I’ve reverted this to connections.

The instances appear to get overwhelmed.

Tell me about it… :wink:

Balancing might also work better if you used fewer regions (we’ll randomly balance between the closest instances, but if you only have 1 in a region, it might get picked a lot more).

Is there a way I can set things up so the locations of the instances can be selected automatically based on traffic? For example, I want a max of 10 instances, min of 2, and for them to be placed wherever is best for the traffic at the time. Maybe setting the list of regions to be empty would achieve this?

We currently set the rlimit (what ulimit sets) to 10240 at boot, within the VM. You can change that value with an ENTRYPOINT in your Docker image. The current value was just better than the much lower default value.

Do you have, or know of, an example of this?

I’m not sure what base image you’re using, but something like:

# ...
COPY ./entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
# ...

entrypoint.sh:

#!/bin/sh

ulimit -n 65536

exec "$@"

This changes the max open files from 10240 to 65536, assuming that’s the limit you’re hitting.

(don’t forget to chmod +x entrypoint.sh)
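
If you’d rather not rely on local file permissions, you can also make the script executable during the image build, e.g.:

# in your Dockerfile, after the COPY line
RUN chmod +x /entrypoint.sh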

Can you also try more reasonable limits for your app? It looks like it can’t handle 10K concurrent connections / requests.

Possibly something like soft: 4000, hard: 6000?
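
In your fly.toml that would look something like this (treat the numbers as a starting point and tune from there):

[services.concurrency]
hard_limit = 6000
soft_limit = 4000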

Hey again! Your app is struggling, particularly when it scales down to 5 VMs. Will you try a lower concurrency setting? It seems to OOM quite frequently with many connections.

Hey, yep. I’m deploying now.

Ok, those changes have been deployed: I’ve set an autoscale of balanced min=10 max=20, and reduced the soft/hard limits.

Your app is struggling, particularly when it scales down to 5 VMs.

What is the period between connections passing the soft limit and new instances being spun up? I can’t see that info in the docs. This info should help me better choose the soft/hard limits.

It’s metrics based, so it’ll take 15-30 seconds to scale up. The app seems to take 10-30s to start passing health checks after that.

Interesting. So my soft limits are set to 2/3 of the hard limits, and there’s plenty of room to scale, so I would expect to hit my max VMs before seeing the VMs struggle.

Have you benchmarked this on 512MB of RAM and a single CPU? I don’t think it can handle 6k concurrent connections. If you run fly status you’ll see that many VMs are failing health checks at any given time. This is sometimes causing the VMs to restart, but anytime you see a health check failure it means we’re not routing requests to that VM.

The cycle actually seems to be: burst of traffic, VMs go unhealthy, connections retry, scales very high, VMs go healthy again, connections drain, scales back down. This does sort of make it seem like these can’t handle 6k connections. If you set it to, say, 2k connections I think you’ll see a lot better results.

Ok, I’ll try reducing the limits further.

My workload is mostly CPU-bound; it’s all very short API requests. Initially it was running with 1GB memory and wasn’t making use of it, so having more instances with less memory each seems to have helped it cope with the load much better.

This setup has been running well for the last couple of weeks, and from the metrics I can see that very few 500s are being served.

Yesterday I noticed that the VMs had scaled down to 7 and weren’t scaling up when load increased, and a few hours ago I had to restart all the instances and set a minimum of 10 to get the app back online.

I’m deploying with a lower hard limit and 50% soft limit now, to see how that copes.

Not sure if you can see a longer history than me, but that cycle did happen earlier today when I had to restart the app, and it is common during my deploys as new VMs are coming up and traffic is switching. Maybe I don’t have my tcp/http checks configured properly.
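
For context, these checks live in the services section of my fly.toml and look roughly like this (the path and timings below are illustrative rather than my exact values):

[[services.tcp_checks]]
interval = "15s"
timeout = "2s"
grace_period = "5s"

[[services.http_checks]]
interval = "15s"
timeout = "2s"
grace_period = "5s"
method = "get"
path = "/health"
protocol = "http"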

If you can see that cycle happening, say, 7 days ago, then a) I definitely need to tweak my soft/hard limits and b) it would be good to get some visibility on that somewhere (maybe configurable alerts?)