Recommended `services.concurrency` settings for Elixir/Phoenix apps?

Hi there!

We have recently started to investigate Fly.io as a potential alternative for some Elixir/Phoenix apps. These are apps which

  • use a lot of Phoenix Channels and some LiveViews.
  • are read-heavy.
  • mostly do work of the following kind: a change made by one user arrives in the app through a websocket and is forwarded directly to all other users, while (sometimes) being persisted to the DB on the side for auditing/statistics.

I’ve been reading up on how autoscaling works (if enabled). By default, the fly.toml contains a hard_limit of 25 and a soft_limit of 20.
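
For reference, this is roughly what the block in question looks like. The two limits are the defaults mentioned above; the type line and the placement under a [[services]] entry are assumptions based on a typical generated fly.toml, so check your own file:

```toml
# Fragment of fly.toml; assumed to sit under the app's [[services]] entry.
[services.concurrency]
  type = "connections"  # assumption: the usual generated default
  hard_limit = 25       # default mentioned above
  soft_limit = 20       # default mentioned above
```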

Correct me if I’m wrong, but I’m pretty sure that these values are rather low for your average mostly-reads mostly-websockets Elixir app.

What kind of values have you ended up using for your Elixir/Phoenix cluster(s) in practice?

Thank you very much!

~Qqwy / Marten

Unfortunately, the answer is: it depends.

A basic Phoenix app that doesn’t hit the database and returns a static string could probably handle tens of thousands of concurrent connections, while an app that does a lot of writes or big operations may only be able to handle 5.

You’re not wrong. The limits are very low for most use cases. You should be able to handle at least a few hundred concurrent connections. You can play with the concurrency settings and see how they affect performance and load balancing.

You can also try type = "requests" under the services.concurrency block in your config. This sets the concurrency limit on the number of concurrent requests being handled (if your service uses the http handler, which it seems to do). This will likely become the default at some point. It allows us to reuse connections and put less stress on your connection accept loop.
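
As a rough sketch of that change: only type = "requests" comes from the suggestion above. The limit values are hypothetical placeholders, not recommendations, and the block is assumed to sit under your existing [[services]] entry.

```toml
# Fragment of fly.toml; assumed to sit under the app's existing [[services]] entry.
[services.concurrency]
  type = "requests"  # limit concurrent requests instead of connections
  soft_limit = 200   # hypothetical value, tune against your own measurements
  hard_limit = 250   # hypothetical value, tune against your own measurements
```

Start from numbers like these, watch how load balancing and latency react, and adjust from there.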

Are there any suggested ways to measure this? Here’s how I would generally proceed - if anyone can help in making the procedure more specific or trustworthy, please do.

  1. Deploy app to org A
    a. scale to single instance of target size (to simplify measurements, not a good assumption for clustering-heavy apps)
    b. set services.concurrency.hard_limit to infinity (e.g. 1000000000)
    c. configure metrics with Prometheus on Fly + Grafana
  2. Deploy client to org B (avoid private networking altogether)
    a. write it using e.g. tsung (but how would that work for LiveView?)
  3. Execute the benchmark
    a. start the client
    b. measure Load Average, Memory Usage and CPU Time in Grafana
    c. find the concurrency and bottleneck
  4. Verify the results
    a. set soft_limit + hard_limit close to the diagnosed max
    b. upscale the app x2
    c. see that it works for concurrency x2
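
Steps 1b and 4a could look roughly like this in fly.toml. The number 1000000000 is the example from step 1b; the values in the commented-out step 4a block are hypothetical and stand in for whatever maximum the benchmark finds. Whether the platform accepts a limit that large is an assumption that needs checking.

```toml
# Step 1b: effectively remove the limit while benchmarking a single instance.
# Assumed to sit under the app's existing [[services]] entry.
[services.concurrency]
  hard_limit = 1000000000  # "infinity" for practical purposes (example from step 1b)

# Step 4a (after the benchmark): replace the block above with limits close to
# the measured maximum. 800 and 1000 are hypothetical placeholders.
# [services.concurrency]
#   soft_limit = 800
#   hard_limit = 1000
```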

What do you think? Is this a valid general approach? Is tsung the easiest way to write load testing scenarios for Phoenix LiveView?

Notes:

If there are multiple app routes of highly varying impact on the system, we could either mix them in a single benchmark or test separately and take the worst concurrency result.

If one would like to optimize, this should work nicely with quick fly deploy iterations. The benchmark could be committed to the repo for subsequent reuse.

This might work, but personally I’ve found that real world usage is very different. When I do this I set the concurrency to a low number and turn autoscaling on. This will almost always result in underutilization (the instances are mostly idle even while new ones are being added by the autoscaler). Then I keep turning up the concurrency limit until CPU or RAM usage on my instances (whichever is worse) is pushing 80%. Then I adjust the CPU-RAM ratio of the instance size so that utilization of the two is about even and either one can serve as a proxy for the other. Then I might turn up the concurrency to hit 90% of my combined CPU-RAM level, but this depends on your appetite for risk.

The database is of course the wildcard in all this, and that’s a different topic, but this assumes you’re scaling the DB up to support whatever load you have. There’s another thread with some discussion on that: "How to setup and use PGBouncer with Fly Postgres" (see reply #2 by sudhir.j).
