Machine throttle monitoring

Good evening everyone, hope you’re all doing well.

I’m running an application on two shared-2x machines, and they handle about 95% of the traffic just fine. To be honest, most of the time they’re overkill, but we keep them that way “just in case”.

Context:

At certain times of the day we get a burst of webhooks from some clients (expected, but without a predictable schedule). This usually happens late at night when no one on the team is watching, and the application’s performance degrades significantly.

Our first idea was to scale up the machines only during the night, but we don’t want to waste resources.

Looking at Grafana, I’ve noticed that the machine starts getting CPU throttled, and from there the performance becomes heavily degraded. We have an alert for this, and when it fires I wake up, check manually whether throttling is the issue, and scale the machine if needed.

What I’ve tried so far:

When the alert fires, I check whether any machine is stopped and send a signal to start all of them:

machine_ids=$(flyctl machines list --config ./fly.prod.toml --json | jq -r '.[].id')
for id in $machine_ids; do
  flyctl machines start "$id" --config ./fly.prod.toml
done

But as mentioned, the issue isn’t stopped machines… it’s the CPU throttling.

What I’d like to do:

I want to know if there’s a way to detect throttled machines using the flyctl machines list command. When an alert fires, the script could automatically check whether any machine is being throttled and trigger a scale-up to shared-4x if necessary.

Is there a way to do this kind of filtering?

Forgot to mention:

All our traffic goes through another nginx application acting as a reverse proxy via Flycast. So I could create a separate application dedicated to processing the webhook endpoints and reduce the chance of degrading performance for other users. However, that also feels like overkill for now, given that it’s an occasional burst lasting a couple of hours, and it could introduce new issues (forgetting to deploy a feature to all environments, harder rollbacks, etc.).

Thank you for your help.

Hi!

This is likely happening because you haven’t set a hard_limit. This tells the Fly proxy how many connections to send to each of your machines. If you didn’t set this, the proxy will assume each machine can handle as many connections as it can throw at it…

If you set a suitable soft and hard limit, and add enough machines to handle your peak load, load balancing will take care of:

  • Starting stopped machines once all currently running ones have reached their soft_limit (scaling up to handle bursts in demand)
  • Stopping running machines once all machines are under the soft_limit (scaling down to conserve $)
  • Preventing machines from getting more than they can handle (no more than hard_limit connections). This can result in request queueing and possible 503 errors for clients, but it prevents the machines themselves from getting bogged down, so existing requests are dispatched in a timely manner.
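In fly.toml terms, that combination is just the service’s concurrency section; a minimal sketch (the numbers here are illustrative placeholders, not tuned values):

```toml
[http_service.concurrency]
  type = "requests"    # or "connections"
  soft_limit = 20      # proxy prefers machines under this; stopped ones start above it
  hard_limit = 25      # proxy never sends more than this to any one machine
```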

We have a guide here on how to experimentally determine the hard limit via load testing. Since you have periods of low traffic, and assuming you can tolerate some downtime from overwhelmed machines during those times, you can load-test your existing setup during a low-traffic period using one of the tools described there: identify the point at which requests start bogging down or the machines start throttling, then back down a bit from that value for the hard limit.

Note that machines have a CPU burst balance, so you’ll want to keep each load test going for about 5 minutes to make sure you exhaust that balance and measure actual baseline behavior.
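As a sketch of what such a load test could look like (assuming `hey` as the tool; the URL, duration, and concurrency below are placeholders you’d tune per run):

```shell
# Sketch of a load-test invocation; 'hey' is one common tool, others work too.
# The URL, duration, and concurrency are placeholders, not recommendations.
APP_URL="https://your-app.fly.dev/"  # hypothetical app URL
DURATION="5m"                        # long enough to exhaust the CPU burst balance
CONCURRENCY=20                       # ramp this up between runs

CMD="hey -z ${DURATION} -c ${CONCURRENCY} ${APP_URL}"
echo "$CMD"
# eval "$CMD"   # uncomment to actually run it against your app
```

Repeat with increasing CONCURRENCY until latency degrades or throttling starts, then set hard_limit a bit below that point.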

You’re basically implementing the Fly proxy’s scaling behavior here :slight_smile: “start a new machine when existing ones are over their soft limit”.

Not via flyctl. You could do this based on metrics (essentially automating what you said you were doing manually), but by far the best option is: don’t let your machines throttle significantly in the first place, by ensuring they don’t receive more requests than they can handle without throttling. There are two main ways to achieve this:

  • Set a proper hard_limit, below the point at which the machine starts throttling.
  • Use bigger machines.

But bigger machines still have a practical limit to how many requests they can handle - hopefully it’s clear here that the best way to do it is to inform the Fly proxy about that practical hard_limit so it can distribute load to your machines in a way that doesn’t cause them to die.
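If you do want to automate the metrics-based check mentioned above, one hedged option is to query your org’s Prometheus-compatible metrics endpoint and filter the result with jq. Everything Fly-specific here is an assumption to verify against your own Grafana panels: the metric name, the org slug, and the 0.2 threshold are placeholders.

```shell
# Sketch: extract instances whose throttle-related metric exceeds a threshold
# from a Prometheus instant-query JSON response.
throttled_instances() {
  # $1 = numeric threshold; reads the query response JSON on stdin
  jq -r --argjson thr "$1" '
    .data.result[]
    | select((.value[1] | tonumber) > $thr)
    | .metric.instance'
}

# In a real alert-handler script you would first fetch the response, e.g.
# (hypothetical metric name and org slug; check your Grafana queries):
#   curl -s "https://api.fly.io/prometheus/<org>/api/v1/query" \
#     -H "Authorization: Bearer ${FLY_API_TOKEN}" \
#     --data-urlencode 'query=rate(<cpu_throttle_metric>[5m])' \
#   | throttled_instances 0.2
```

Any instance IDs it prints could then feed a scale-up step (e.g. a `flyctl machine update --vm-size …` invocation), again verifying the exact flags against flyctl’s help first.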

Fairly certain nginx can handle the traffic just fine - if you’re using Flycast then you’re golden, because Flycast does go through the Fly proxy so everything I wrote above about setting the limits and auto-scaling up/down based on demand still applies.

(I’m curious as to why you need nginx, but feel free to ignore my curiosity if this is an operational or design requirement for your app :slight_smile: )

Let me know if this helps!


Hey @roadmr, thank you for replying!
I was OOO yesterday, sorry for the late reply.

This is likely happening because you haven’t set a hard_limit […]

In fact, I’m no longer using hard_limit, only soft_limit with a value of 2; when I was using a higher value, I could see the traffic being routed to only a single machine. My current setup is the following:

[http_service.concurrency]
type = "requests"
soft_limit = 2

Maybe "connections" would be more effective, considering that I’m using an nginx reverse proxy?

This will result in request queueing and possible 503 errors for clients

hahah yes! I got a few 503s when I was using hard_limit; that’s why I removed the hard_limit flag.
Also, I’m only using 2 machines because the soft_limit is so low that any extra machines would be powered on all the time.

We have a guide here on how to experimentally determine the hard limit via load testing.

Thank you for the tip! I’ll experiment with it tomorrow (on weekends we have close to 0 users, so I can mess around).

[…] But bigger machines still have a practical limit to how many requests they can handle

Yes, when we get throttled I just scale to shared-4x and it handles the traffic just fine, but it’s overkill 95% of the day.

  • It would be great to have a fly.toml setting that scales the machine itself when the hard_limit is reached :eyes:

I’m curious as to why you need nginx, but feel free to ignore my curiosity if this is an operational or design requirement for your app

Hahaha, nginx is mostly for caching and for rate limiting certain endpoints to prevent brute-force attacks on the login and forgot-password features.

  • I know it’s not 100% effective, because XPTO.fly.dev is wide open, but our CNAME points to the nginx machine.
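For reference, that kind of brute-force protection is typically done with nginx’s limit_req; a minimal sketch where the zone size, rate, paths, and upstream name are all placeholders:

```nginx
# Sketch of per-IP rate limiting on auth endpoints; all values are placeholders.
limit_req_zone $binary_remote_addr zone=auth:10m rate=5r/m;

server {
    location /login {
        limit_req zone=auth burst=3 nodelay;
        proxy_pass http://app_backend;   # hypothetical upstream
    }
}
```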

Thanks for the reply. You helped clarify a few points, and with that understanding I might be able to distribute the load better. I’ve already created a tech-debt card on our board and will experiment with this over the next few days. Thanks again!

Hi! Thanks for these details!

This is correct. It’s entirely fair for the Fly proxy to route traffic to a single machine as long as it’s under its hard limit. soft_limit is used for best-effort load balancing, but it’s not 100% guaranteed. On the other hand, if a machine hits the hard limit, the proxy will route traffic to other machines that are still under theirs. This protects your machines from request floods!

Heheh! let’s put this in electrical terms. “I was getting fuse trips when I had a fuse and wanted to run my space heaters, water heater, electric stove and dryer at the same time - so I removed the fuse”. Guess what happens next :slight_smile:

The hard limit protects your machines from getting overwhelmed individually - which is what you’d been seeing happening. To continue the analogy, don’t remove the fuses for each individual circuit: keep the fuses (hard_limit that applies per machine) and upgrade the power line’s global capacity (add more machines so the aggregate can handle all the load). (I agree it’s not a perfect analogy but hopefully the image of a house burning down is graphical enough :smiley: )

There is - it’s a combination of a suitably-tuned soft limit and having enough machines to handle your expected peak load (and perhaps some headroom).

Creating machines automatically is scary - you see it all the time in the news “AWS Lambda spun up a million machines”. On Fly, instead you manually create them with fly scale count, but your soft and hard limits govern how many machines are actually running based on current demand. Since you created the machines manually, you know what the cost ceiling is, and since you only pay for running machines, your cost will generally be lower and determined by the number of machines needed to serve your average request load, with small peaks when you get request bursts.

Load balancing describes how, when all currently-running machines are at their soft limit, more will be started to serve requests.

So my very strong advice is: set a hard limit, but tune it properly using the guide I sent; don’t just pick a value out of thin air. A too-high hard_limit is as ineffective as an unset one, since you’ll get pileups that bog down your machines. A too-low hard_limit won’t have performance implications, but it means your machines will be underutilized (if a machine can serve 20 concurrent requests without slowing down and your hard_limit is 10, you’re paying for capacity you’re not using: set the hard limit to 20!).

- Daniel

@roadmr

Daniel, I think I’ve found the issue:

I asked our client to send the webhooks directly to xpto.fly.dev to bypass our Nginx and see whether it might be one of the bottlenecks, and also to rule out other possible points of failure.

However, after checking the logs and Grafana metrics, I believe in this case the solution really is to use a more powerful machine. Even though there are many webhooks (around 60k requests) they’re sent one at a time, so concurrency stays low. The load balancer ends up routing the traffic (most of the time) to the same machine, which eventually leads to throttling instead of distributing the load across all active machines.
(In short: even though the soft_limit is 2, since it’s only 1 request at a time it never hits the limit, especially at night when there are no real users, only partner automations.)

So I’m guessing the hard_limit wouldn’t help in this scenario either :confused:
I’ll try to mitigate this and force better rebalancing via Nginx, but that’s a problem for Future Luan hahaha.
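For Future Luan: if nginx-side rebalancing ever happens, one possible (unverified) sketch is addressing the machines individually via Fly’s internal DNS names in a least_conn upstream, so sequential webhooks don’t keep reusing one backend. The machine IDs, app name, and port here are placeholders:

```nginx
# Sketch only: per-machine upstream for webhook traffic; names are placeholders.
upstream webhook_backends {
    least_conn;   # send each request to the backend with the fewest active connections
    server 1234abcd.vm.my-app.internal:8080;   # hypothetical machine addresses
    server 5678efgh.vm.my-app.internal:8080;
}

server {
    location /webhooks/ {
        proxy_pass http://webhook_backends;
    }
}
```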

Thanks for your help!

