For my use case, I need to provision a web application to each of my clients. It’s the same webapp for every client, but with slightly different settings, different CPU/Mem and, of course, different storage volumes.
I need to provision these apps to my client dynamically: create when they sign up, destroy when they resign. Since these provisioned web apps don’t get accessed all the time, they should be auto-started and auto-stopped by the fly proxy.
My question is therefore: Is there a limit to how many apps I can have? Would it cause any trouble if I had hundreds or thousands of apps?
Any answer is greatly appreciated, especially from Fly.io staff.
You might consider going with just one app. While we have a dashboard and, of course, flyctl, you might want to consider dropping down to the API: Working with the Machines API · Fly Docs
Basically, if you can code an application that can issue an HTTP POST, you can create and destroy machines at will. The “hardest” problem you will face is the need to base64 encode any unique settings files, something that can be readily done in most languages.
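To make the "HTTP POST plus base64" idea concrete, here is a minimal sketch in Python that only builds the request and does not send it. The app name, image, guest path, mount path, and volume ID are all hypothetical placeholders, and the `config` field names (`guest`, `files`, `mounts`) reflect my reading of the Machines API docs, so check them against the current reference before relying on them.

```python
import base64
import json

API_BASE = "https://api.machines.dev/v1"  # public Machines API base URL


def build_machine_request(app_name, image, settings_text, cpus, memory_mb, volume_id):
    """Return (url, json_body) for a 'create machine' POST. No network I/O."""
    # Settings files are passed base64 encoded, as described above.
    encoded = base64.b64encode(settings_text.encode("utf-8")).decode("ascii")
    body = {
        "config": {
            "image": image,
            # Per-client CPU/memory sizing.
            "guest": {"cpu_kind": "shared", "cpus": cpus, "memory_mb": memory_mb},
            # Hypothetical path where the app expects its settings file.
            "files": [{"guest_path": "/app/settings.json", "raw_value": encoded}],
            # Hypothetical mount point for the client's own volume.
            "mounts": [{"volume": volume_id, "path": "/data"}],
        }
    }
    return f"{API_BASE}/apps/{app_name}/machines", json.dumps(body)


url, body = build_machine_request(
    "my-app", "registry.fly.io/my-app:latest", '{"client": "acme"}', 1, 256, "vol_123"
)
print(url)
```

To actually create the machine, POST `body` to `url` with a `Content-Type: application/json` header and an `Authorization: Bearer <token>` header carrying your Fly API token.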
I have tried going with just one app and using many machines inside that app already. Doing the HTTP POST requests is no big deal and I set environment variables instead of base64 encoding the settings, but that would probably work fine, too.
My problem with that approach is the auto-starting and auto-stopping of these machines. I need to route to them using subdomains of my apex domain. I can have nginx route the subdomain requests to the machines’ private IPv6 addresses, and that works just fine, but that way the fly-proxy does not auto-start or auto-stop the machines. (Is that because they are set to listen on port 8000 instead of 80? If I set them to listen on 80, requesting the apex domain randomly shows nginx or a provisioned machine…)
I also don’t see auto_stop_machines, auto_start_machines, or min_machines_running documented as a part of the API. I’m assuming that is an oversight, and have inquired internally.
You probably want to either go to two apps if you have an apex domain, or provision nginx on each machine and have it process the routes that are for the current machine and replay the rest.
It looks like using the Fly Replay Proxy is working fine, except that returning 409 does not make the Fly Proxy redirect. It does work with 200, and I have set it to 303 “See Other” for now, which also works. Which HTTP response code would be most appropriate? Does it matter at all?
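For anyone following along, the replay decision can be sketched as a small function that maps the request's Host header to a `fly-replay` response header. This is only an illustration: the subdomain-to-machine table, the apex domain, and the machine IDs are hypothetical, and the status code to send alongside the header is left to the caller, since that is the open question here.

```python
def replay_header_for(host, apex, machines_by_subdomain):
    """Return a ('fly-replay', value) header tuple, or None to serve locally.

    host: raw Host header, possibly with a port (e.g. 'acme.example.com:443').
    apex: the apex domain (e.g. 'example.com').
    machines_by_subdomain: hypothetical lookup table {subdomain: machine_id}.
    """
    hostname = host.split(":")[0].lower()
    suffix = "." + apex.lower()
    if not hostname.endswith(suffix):
        # The apex domain itself or an unrelated host: do not replay.
        return None
    subdomain = hostname[: -len(suffix)]
    machine_id = machines_by_subdomain.get(subdomain)
    if machine_id is None:
        return None
    # 'instance=<machine id>' asks the proxy to replay the request on that machine.
    return ("fly-replay", f"instance={machine_id}")


table = {"acme": "abc123"}  # hypothetical client -> machine ID mapping
print(replay_header_for("acme.example.com", "example.com", table))
```

Returning `None` for the apex domain keeps requests to it on the router itself instead of bouncing them to a client machine.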
Auto-stopping the machine created using the machines API (HTTP POST) does not work though. If I understand you correctly, it is because the API does not support the auto-starting and auto-stopping fields in the request. Maybe it is not documented because it is not (yet?) supported?
That’s odd, it works on my app. But it doesn’t matter - use what works for you.
I caught one of the people maintaining the API, and was told that these fields not being included in the documentation is an oversight that will be corrected this week. My guess is that the structure in the toml closely matches the structure in the API: Fly Launch configuration (fly.toml) · Fly Docs. Try setting those fields in the services section where you specify your internal port.
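As a rough sketch of what "setting those fields in the services section" could look like in a Machines API request body, the helper below adds a services entry to an existing machine config. The key names `autostart`, `autostop`, and `min_machines_running` are my assumption about how the fly.toml settings `auto_start_machines`, `auto_stop_machines`, and `min_machines_running` map onto the API; they were undocumented at the time of this thread, so verify them before use. Port 8000 is taken from the setup described earlier.

```python
def with_autostart_autostop(config, internal_port=8000):
    """Return a copy of a machine config with one HTTP service that asks the
    proxy to start and stop the machine on demand.

    ASSUMPTION: 'autostart', 'autostop' and 'min_machines_running' are guesses
    at the API equivalents of the fly.toml settings; confirm against the docs.
    """
    service = {
        "protocol": "tcp",
        "internal_port": internal_port,
        "autostart": True,
        "autostop": True,
        "min_machines_running": 0,
        "ports": [
            {"port": 80, "handlers": ["http"]},
            {"port": 443, "handlers": ["tls", "http"]},
        ],
    }
    new_config = dict(config)  # shallow copy so the caller's dict is untouched
    new_config["services"] = [service]
    return new_config


base = {"image": "registry.fly.io/my-app:latest"}  # hypothetical image name
print(with_autostart_autostop(base)["services"][0]["internal_port"])
```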
Thank you for your answer.
You are right, and I can confirm that it works as you describe. My test machine does start automatically when stopped and hit by a request. Unfortunately, it does not auto-stop.
if the Machine has no traffic (a load of 0), then the proxy stops the Machine
Auto-stopping the machine if it has no traffic is exactly what I need, but it is not the same as a load of 0. Grafana shows a CPU utilization between 0.1 and 0.4 when my app is idle. This is probably because it continuously checks redis for tasks. This functionality needs to be kept in my application, but it is ok for the VM to be stopped and restarted on request. The due tasks would simply execute then and that is fine.
Is there a way to make the fly proxy stop the machines with no traffic instead of a load of exactly zero?
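One workaround I am considering, independent of how the proxy measures load, is to have the app track the time of its last HTTP request and exit on its own once it has been idle long enough. This is only a sketch under assumptions: the 300 second timeout is arbitrary, and it assumes that a machine whose main process exits cleanly stays stopped until the proxy auto-starts it again, which I have not verified.

```python
import time


class IdleTracker:
    """Tracks time since the last HTTP request, ignoring background work
    such as polling redis for tasks."""

    def __init__(self, idle_timeout_s, clock=time.monotonic):
        self._timeout = idle_timeout_s
        self._clock = clock  # injectable so the logic can be tested
        self._last_request = clock()

    def record_request(self):
        # Call this from the web framework's request hook.
        self._last_request = self._clock()

    def should_stop(self):
        # True once no request has arrived for at least the timeout.
        return self._clock() - self._last_request >= self._timeout


tracker = IdleTracker(idle_timeout_s=300)
tracker.record_request()
print(tracker.should_stop())
```

A background thread would check `should_stop()` periodically and, when it returns true, shut the process down gracefully. Only real requests reset the timer, so the redis polling does not keep the machine alive.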
Auto-starting seems a bit buggy: With my test machine stopped, I sent a request to wake it up and then the logs were getting flooded with these messages:
... ams [info] Starting machine
... ams [error] machines API returned an error: "could not reserve resource for machine: insufficient memory available to fulfill request"
It took exactly 5 minutes, with many retries and a few “rate limit exceeded” messages, before the machine successfully started. During that time, I could stop the machine with neither flyctl machines stop ... nor flyctl machines stop -s SIGKILL ....
Unfortunately, I think you may be hitting up against the (current/prior to deploying $X0M of new hardware) frailties of Fly’s platform. If there isn’t enough available resource (on the host your machine lives on) for Fly to start your machine - it isn’t going to start.
As of today I’m not sure you can reasonably assume that at any given moment in time, at least for specific regions (TBC) and/or hosts, it will be possible to start (automatically or otherwise) a stopped machine*.
Does Fly publish a list of under-resourced regions, or resource allocation-failure counts per region? The absence (TBC) of such a list prevents you from planning where you should deploy to (or move existing machines to).
In theory you could create your own Fly-Machines Orchestrator to overcome these sorts of problems (maybe call it Momad?) but I suspect that isn’t really feasible.
Hopefully there will come a point soon when these sorts of resource allocation issues become a rare event.
*Note: I do not work for Fly and have no visibility into the incidence of failures due to resource allocation. It’s just my reading of the situation based on forum posts (e.g. a list of forum posts related to “reserve resource for machine”).
I might look into creating a custom Fly Machines orchestrator because I can’t use Fly’s proxy (I need to POST more than 1 MB) and can’t use Fly’s auto-stop mechanism (see above). I’d need to evaluate how feasible it is for my use case. This is kind of a deal breaker though…
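The part of such an orchestrator that deals with failed starts could be sketched as a retry loop with exponential backoff, so that it does not run into rate limits the way the retries in the logs above did. The `start_fn` callback is hypothetical: it would wrap whatever call starts the machine (for example a POST to the Machines API start endpoint) and return True on success. The attempt count and delays are arbitrary.

```python
import time


def start_with_backoff(start_fn, max_attempts=6, base_delay_s=1.0,
                       max_delay_s=60.0, sleep=time.sleep):
    """Call start_fn() until it returns True, doubling the wait between tries.

    Returns the attempt number that succeeded, or None if every attempt failed.
    start_fn is a hypothetical callable wrapping the actual 'start machine' call.
    """
    delay = base_delay_s
    for attempt in range(1, max_attempts + 1):
        if start_fn():
            return attempt
        if attempt < max_attempts:
            sleep(delay)  # back off before the next try
            delay = min(delay * 2, max_delay_s)
    return None


# Usage with a stand-in start function that succeeds on the third call.
outcomes = iter([False, False, True])
print(start_with_backoff(lambda: next(outcomes), sleep=lambda s: None))
```

With the defaults, a machine that never starts is given up on after six attempts and 31 seconds of waiting in total, at which point the orchestrator could try another host or region.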
While I do work for fly, this is not my area of expertise; but from what I have seen there are plenty of resources in NRT, but some software error prevented (and still prevents?) new VMs from being allocated: fly deploy - failed to launch VM: no capacity available in nrt
I’d suggest reposting this specific question with a subject line that matches for visibility.
I have similar requirements, and have used nginx to proxy requests to the correct machine.