I originally asked this question on the Terraform provider repository, but my question is really more general than that. The Terraform provider is currently unmaintained and in a broken state where volumes don't work; see the end of that thread for context. I'm looking for a new way to manage my infrastructure.
I currently run about 15 machines that all work together. They need to be scaled up and down, and I'll be adding more machines, which is too much for me to manage by hand.
Is there another recommended way to automate Fly infrastructure?
For anyone who’s automating many servers on Fly, how are you doing it?
We use Fly's autostart/autostop for our dynamic services, such as APIs and other HTTP services. This lets us provision a large number of machines while paying only for the ones that are actually running. For non-dynamic services, scaling is manual, but they rarely need it.
As for management, we use GitHub Actions pipelines that automatically deploy updates for apps that don't use volumes. For machines that need volumes this is still a manual process, but luckily, for our requirements, these machines don't change very often.
I'm building FlyCD (https://flycd.dev) to automate deployments and releases for people who deploy Fly apps and need automation for preview apps and preview databases. FlyCD doesn't do GitOps-style infrastructure-as-code automation, and I don't know of any tool that does. I think you'd have to pick a CI/CD system and work out how to automate your machines yourself, either through Fly's Machines API or through the CLI.
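As a starting point for the "use the Machines API yourself" route, here is a minimal Python sketch using only the standard library. It assumes the public Machines API base URL `https://api.machines.dev/v1`, its `GET /apps/{app}/machines` listing endpoint, and a Bearer token; check Fly's current API docs before relying on any of these. The function names are illustrative, not part of any Fly SDK.

```python
import json
import urllib.request
from collections import Counter

# Assumed public Machines API base URL; verify against Fly's current docs.
API_BASE = "https://api.machines.dev/v1"


def build_list_request(app_name: str, token: str) -> urllib.request.Request:
    """Build a GET request that lists all Machines in one Fly app."""
    req = urllib.request.Request(f"{API_BASE}/apps/{app_name}/machines")
    # The API authenticates with a Bearer token (e.g. from `fly auth token`).
    req.add_header("Authorization", f"Bearer {token}")
    return req


def list_machines(app_name: str, token: str) -> list[dict]:
    """Fetch the Machines for an app and decode the JSON list in the response."""
    with urllib.request.urlopen(build_list_request(app_name, token)) as resp:
        return json.load(resp)


def count_by_state(machines: list[dict]) -> Counter:
    """Tally Machines by their reported state, e.g. 'started' or 'stopped'."""
    return Counter(m.get("state", "unknown") for m in machines)
```

A CI job could then call `count_by_state(list_machines("my-api", token))` for each app (the app name here is hypothetical) and decide whether to start or stop machines, which keeps the scaling logic in version control rather than in someone's shell history.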
I think the person who maintains the Terraform provider now works at Fly; I'm surprised to hear it's unmaintained.
Thank you for the feedback. I've seen some really cool GitHub Actions setups, so I'll dig into that a bit, but almost all of my services use volumes, so this may not be viable. It's also good to know that you've had a good experience scaling up and down manually with the CLI, combined with autostart/autostop. I already use quite a few bash scripts that run multiple CLI commands, so maybe I'll just keep going in that direction.
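If the bash scripts keep growing, one option is to move the loops into a small Python wrapper that shells out to the CLI, so a plan can be reviewed before anything runs. This is a sketch under assumptions: it uses `fly machine start|stop <id> --app <app>`, and the app name and Machine IDs in the usage line are hypothetical.

```python
import shlex
import subprocess


def machine_commands(app: str, machine_ids: list[str], action: str) -> list[list[str]]:
    """Build one `fly machine <action>` command per Machine ID."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action}")
    return [["fly", "machine", action, mid, "--app", app] for mid in machine_ids]


def run_all(commands: list[list[str]], dry_run: bool = True) -> list[str]:
    """Run the commands in order and return them as shell-quoted strings.

    With dry_run=True nothing is executed, so the plan can be reviewed first.
    Otherwise check=True raises CalledProcessError on the first command that
    exits non-zero, which stops the loop before the remaining commands run.
    """
    planned = []
    for cmd in commands:
        planned.append(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return planned
```

For example, `run_all(machine_commands("my-search", ["abc123", "def456"], "stop"))` only prints the plan; passing `dry_run=False` actually executes it.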
My services include a database, a search engine, an API, Node.js apps, object storage, a reverse proxy, and so on.
I may have worded that paragraph a bit ambiguously. It's not that we can't use GitHub Actions to deploy updates to machines with volumes; it's that we can't use autostart/autostop for machines with volumes because of the nature of the apps we run.
Our main app with volumes is our CockroachDB cluster. We currently deploy updates to it manually to ensure that only one machine at a time is offline, which minimizes the risk of losing quorum, especially since deployments are orchestrated client-side by the Fly CLI rather than by a server-side service.
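The one-at-a-time rule follows from Raft quorum arithmetic. CockroachDB replicates each range three times by default, and a range stays available only while a majority of its replicas are up: with 3 replicas that is 2, so at most 1 can be down. Because any node may hold a replica of any range, two nodes offline at once can already break quorum for some range. The sketch below encodes that arithmetic plus a strictly sequential update loop; `update` and `wait_healthy` are hypothetical callbacks you would wire to the CLI or Machines API and to a health check of your choice.

```python
from typing import Callable


def majority(replication_factor: int) -> int:
    """Number of replicas needed for a Raft majority (quorum)."""
    return replication_factor // 2 + 1


def max_safe_offline(replication_factor: int) -> int:
    """Replicas of a single range that can be down while it keeps quorum."""
    return replication_factor - majority(replication_factor)


def rolling_update(
    machine_ids: list[str],
    update: Callable[[str], None],
    wait_healthy: Callable[[str], bool],
) -> list[str]:
    """Update machines strictly one at a time.

    The next machine is touched only after the previous one reports healthy,
    so at most one node is offline at any moment. Returns the IDs that were
    updated and came back healthy; stops at the first one that does not.
    """
    done = []
    for mid in machine_ids:
        update(mid)
        if not wait_healthy(mid):
            break  # leave the remaining nodes alone so the cluster keeps quorum
        done.append(mid)
    return done
```

With the default replication factor of 3, `max_safe_offline(3)` is 1, which matches the manual practice described above; a factor of 5 would tolerate 2, though updating one node at a time remains the conservative choice.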
I don't have insight into how your apps work, so this may be misguided, but I would think the API, Node.js, and reverse proxy services don't need volumes and would be prime candidates for autostart/autostop with over-provisioning.