Running into capacity issues

I’m in the process of migrating a high-traffic service from AWS Lambda to Fly, which requires me to spin up 500-1000 machines of the shared-cpu-4x type with 2-4 GB of memory.

AWS Lambda uses a 1 request = 1 container model, and the nature of the task means each request consumes a lot of resources, so I have no choice but to keep this model and use a high number of machines.

I’d love to migrate this app off AWS Lambda and onto Fly, but while testing this setup I encountered some issues, and I’d like to ask a few questions:

  1. When scaling VMs in the iad and atl regions, I could not request sufficient capacity. For example, I could only request ~230 of the shared-cpu-4x/4GB machines in atl.

    a. Is there something I need to do to acquire more capacity, such as upgrading to a paid plan?

    b. Alternatively, if it is not possible to acquire, say, 500 machines in a single region, is it recommended to spread the required capacity over multiple regions? In that case, will Fly automatically route requests to a region where capacity is available?

  2. When deploying new versions, I’ve also observed that the update can fail midway, being applied only to some containers even with a rolling strategy. For example, I tried a deployment in the atl region with shared-cpu-4x/2GB memory/440 instances, but it failed midway with the update applied to only ~420 containers.

    I’m concerned that I may not be able to deploy updates once I productionize my application, so what can I do to ensure that I can update it reliably? Should I try fewer machines in each region, as I mentioned in 1.b?
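For what it’s worth, the multi-region split from 1.b can be sketched with flyctl; the regions and counts below are purely illustrative, not a recommendation for your workload:

```shell
# Spread the fleet over several regions instead of requesting
# all capacity in one (counts and regions are hypothetical).
fly scale count 250 --region iad
fly scale count 250 --region atl
fly scale count 250 --region ord

# Roll updates out machine-by-machine rather than all at once.
fly deploy --strategy rolling
```

With counts set per region, a failed deploy in one region at least leaves the other regions serving traffic.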

Thanks in advance for your help.

I have a few questions -

Are these separate apps, i.e. do you have 500-1000 unique Lambdas on AWS?

Or is this a single Lambda that scales in parallel to 500-1000 invocations?

Few thoughts -

  1. Lambda is not designed to be multi-threaded, so I’m curious why this can’t be done with fewer instances, with multiple processes or threads on each machine handling different requests. Go instances I run can handle 10k requests, though those aren’t heavy workloads.

Fly is more of a traditional VM than a container, so you have access to more compute than on Lambda, and I’d be surprised if you need that many machines, unless you are trying to avoid modifying code.

  2. Yes, requests will route across regions. The soft_limit, as well as latency, is used to choose which region to route to. If no machine is available but you still have extra provisioned, Fly will start machines up to the hard_limit. Check out the concurrency settings.
  3. Where are the requests coming from? Fly puts the VMs at the edge of the cloud, so the reason to use multiple regions is to be closer to end users, but also for high availability.
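The soft_limit/hard_limit knobs mentioned in point 2 live in fly.toml. A minimal sketch of the 1 request = 1 machine model — app name, port, and region here are assumptions, not from this thread:

```toml
app = "report-renderer"        # hypothetical app name
primary_region = "iad"

[http_service]
  internal_port = 8080         # assumed port your server listens on
  auto_start_machines = true   # start stopped machines when limits are hit
  auto_stop_machines = true

  [http_service.concurrency]
    type = "requests"
    soft_limit = 1             # proxy prefers an idle machine for each request
    hard_limit = 1             # a busy machine never gets a second request
```

Raising soft_limit/hard_limit above 1 is how you would pack multiple requests per machine instead, if the memory headroom allows it.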

If you don’t mind sharing, what are you doing that requires that type of compute, and that many instances?

It is a single Lambda function that scales up to 500-1000 parallel invocations.

My use case involves running headless browsers to generate reports, plus some image manipulation. While most web pages don’t take anywhere near this amount of memory, some do, which is why I have to provision each VM with enough headroom, and the 1 request = 1 container model is helpful in terms of performance.

I’m afraid routing many requests onto one container would just exhaust its resources even faster.

Interesting issue. That’s a tough problem. How long do the Lambdas take now to process a heavy request? Is it headless Chrome or something more efficient?

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.