I am trying to find the right setup for a Laravel API that has constant/heavy load during daytime hours and constant/light load after hours. I had been running this on a DO droplet with 8 vCPUs, and while I was starting to have some problems, it was doing a decent job. I also had all my jobs running on that same droplet, whereas here on Fly I have them on separate processes, and thus on separate machines. My struggle so far has been trying to predict what would be an adequate setup without breaking the bank. By breaking the bank I mean something like multiple performance-8x machines.
For reference, I started with 3 shared-cpu-2x machines with autoscale rules based on requests; from there I went to 5 shared-cpu-2x, then 5 shared-cpu-4x, then 5 performance-1x, and back to shared-cpu, but now 8x with 7 machines.
The worst so far has been the performance-1x, which I think was due to the PHP-FPM workers competing for just 1 CPU, even after adjusting the various pm values. I also do not think 1 performance CPU would work for this Laravel API, given the need to process parallel requests.
Now, my current setup has given me the best performance, but that is with the current 75% quota. Once it goes to 100%, I do not think this is going to work, and as mentioned in the original post, shared machines are not really designed for constant load. The next closest setup I could go with would be 2 performance-2x machines, but based on the numbers I saw from having 5 performance-1x, I do not think that is going to cut it. Anything higher than that, machine-count wise, would make it harder on the budget side. I realize that is sometimes just part of needing a good, performant infrastructure, but perhaps there is something missing somewhere. As a side note, RAM is not an issue: the machines have 2GB each and are not even using 30% of it.
Just a thought, and fully understanding that we are not comparing apples to apples: what would be the Fly equivalent of a VPS/droplet hosted somewhere else with 8 vCPUs and 16GB RAM?
The closest equivalent would be shared-cpu-8x with 16GB RAM. The only real difference seems to be that the Premium Intel option guarantees a newer CPU with faster memory.
Digital Ocean also notes that these basic droplets are shared CPU and only recommended for bursty loads, not constant loads. Is it possible you’re being throttled on DO but not noticing it?
In terms of size recommendations, you also need to factor in horizontal scaling. Since you have a higher load during the day than at night, it may be better to have more, smaller machines.
For example, instead of having 3 shared-cpu-8x 16GB machines, have 12 shared-cpu-2x 4GB machines, so that during half of the day you use all 12 machines, but at night when traffic is low you're only running 2. You get effectively the same amount of compute, but with a roughly 7.8% cost saving. Since you mentioned RAM is not the issue, you could even lower it to shared-cpu-2x 2GB for a 50.8% cost saving.
If you’re worried about being throttled, the other thing you can look at is throttling yourself: have your app keep track of CPU usage, and when it exceeds 6%, respond with a special error message that tells the Fly proxy to try a different server.
If you respond to an HTTP request with status code 503 and set the header Fly-Reject (the value can be anything really, we set it to a generic message), then the fly proxy will try another machine (including starting machines that aren’t currently running).
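In a Laravel app, that could look something like a middleware along these lines. This is just a sketch: the class name, the use of `sys_getloadavg()` as a stand-in for "CPU usage", and the 0.9 threshold are all illustrative assumptions, not a tested recommendation — you'd track CPU however fits your setup.

```php
<?php

namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;

// Sketch: ask the Fly proxy to retry on another machine when this one is busy.
// The load-average check and 0.9 threshold are illustrative assumptions.
class ShedLoadWhenBusy
{
    public function handle(Request $request, Closure $next)
    {
        $cores = 8; // shared-cpu-8x; adjust to your machine size
        [$load1m] = sys_getloadavg();

        if ($load1m / $cores > 0.9) {
            // 503 + Fly-Reject makes the Fly proxy try a different machine,
            // starting a stopped one if needed. The header value is arbitrary.
            return response('Busy, please retry', 503)
                ->header('Fly-Reject', 'overloaded');
        }

        return $next($request);
    }
}
```

You'd register it on the routes that serve external traffic; health-check routes should probably skip it so the machine isn't marked unhealthy just for being busy.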
With the cost saving from lowering the RAM, you could overprovision your autostart/autostop machines with an extra 8 machines that only get used for a total of 4 hours per day each and that would still result in a 40% cost saving.
Running a small number of large servers for longer is not necessarily cheaper than running a large number of small servers for part of the day, so if you can adapt your app for horizontal scaling, you'll get a higher degree of flexibility with potential cost savings too.
Thanks for the detailed response. The setup you mentioned is one of the main reasons I decided to move to Fly.io. The flexibility it offers when scaling, the cost savings when machines are not running, plus the ability to use headers to our advantage for replays or, in this case, to retry the request on a different machine, are IMHO great features that can handle different types of load.
Having said that, I had the same expectations performance-wise, but somehow I am not seeing it. You might have missed the part in my original post where I currently have 7 shared-cpu-8x 2GB machines, after starting with 3 shared-cpu-2x and then going to 5 shared-cpu-4x. The more I read other posts and analyze what I have, the more I am convinced it is probably something I have not tweaked in my setup: not necessarily machine-wise, but in the PHP-FPM settings themselves (PHP's process manager), so that they can handle the high load with the 8 shared CPUs per machine.
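For anyone following along, a starting point for sizing the FPM pool on a machine like this might look as follows. The path and every number here are assumptions for illustration, not tuned values — the usual rule of thumb is workers roughly equal to vCPU count for CPU-bound requests, somewhat more if requests spend time waiting on the database or external APIs.

```ini
; /etc/php/8.2/fpm/pool.d/www.conf (path varies by image) -- illustrative values
; With 8 shared vCPUs and 2GB RAM mostly unused, a static pool avoids the
; overhead of spawning workers under load.
pm = static
pm.max_children = 16   ; assumed: ~2x vCPUs, since requests mix CPU and I/O wait
pm.max_requests = 500  ; recycle workers periodically to contain memory creep
```

Whatever values you land on, it's worth load-testing one machine in isolation so you can see where throughput flattens out before multiplying by the machine count.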
I am not giving up, I will look into that and see where it takes me.
It looks like your app goes from serving ~400 requests/second at peak to ~30 requests/second during off hours. Shutting down some of your machines during off hours would be a great way to get more performance for the same cost.
Also, at peak hours, it’s taking anywhere from 500ms to 5s to respond to requests and, as you’ve noted, it’s taking quite a bit of CPU as well. Is that about what you’d expect based on what your app is doing? If not, you might also consider doing some profiling to see if there is anything that could be optimized or if better caching strategies could help.
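If you don't already have profiling in place, even a crude timing middleware can show which routes the 500ms–5s responses come from before reaching for a full profiler. A sketch, with an arbitrary 500ms threshold and a hypothetical class name:

```php
<?php

namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Log;

// Sketch: log slow requests so you can see which routes eat the CPU.
class LogSlowRequests
{
    public function handle(Request $request, Closure $next)
    {
        $start = microtime(true);
        $response = $next($request);
        $elapsedMs = (microtime(true) - $start) * 1000;

        if ($elapsedMs > 500) { // arbitrary threshold for "slow"
            Log::warning('Slow request', [
                'path' => $request->path(),
                'ms'   => round($elapsedMs),
            ]);
        }

        return $response;
    }
}
```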
If your app really needs to do up to 5s worth of CPU-intensive work for each request, then scaling your machines during peak hours is going to be necessary.
It might be worth trying out the auto_stop_machines/auto_start_machines settings. You can read about their behavior here. You’ll also probably need to tweak the soft_limit for your service to get the thresholds right for stopping machines during off hours.
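Something along these lines in fly.toml would wire that up. The port and the concurrency numbers are placeholders to illustrate the knobs, not tuned recommendations:

```toml
# fly.toml (excerpt) -- values are illustrative, not tuned
[http_service]
  internal_port = 8080
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 2   # keep a floor for off-hours traffic

  [http_service.concurrency]
    type = "requests"
    soft_limit = 200   # the proxy prefers other machines past this
    hard_limit = 250
```

The soft_limit is the main lever here: set it low enough that off-hours traffic concentrates onto a couple of machines and the rest get stopped, but high enough that peak traffic doesn't wake everything at once.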