Autoscaling worker VMs with a custom external metric

I’d like to run Buildkite CI runners in Fly, and have the cluster scale up and down depending on CI job queue size.

For now, I plan to run a simple worker that queries Buildkite’s API and scales the app based on job count, using either flyctl or the GraphQL API.
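To make the plan concrete, here is a rough Python sketch of the scaling decision. The function names, the one-job-per-worker default, and the min/max bounds are all made up for illustration. `fetch_job_counts` stands in for whatever Buildkite API query ends up being used (it is not a real Buildkite client call), and the command list assumes `flyctl scale count <n> --app <name>` is the way to set the VM count.

```python
import math


def desired_worker_count(queued_jobs, running_jobs, jobs_per_worker=1,
                         min_workers=0, max_workers=10):
    """Return how many worker VMs to run for the current CI load.

    All parameters are illustrative assumptions, not Fly or Buildkite settings.
    """
    if jobs_per_worker < 1:
        raise ValueError("jobs_per_worker must be at least 1")
    # Count running jobs too, so busy workers are not scaled away
    # just because the queue itself is empty.
    needed = math.ceil((queued_jobs + running_jobs) / jobs_per_worker)
    # Clamp to the configured bounds.
    return max(min_workers, min(max_workers, needed))


def scale_command(app_name, count):
    """Build the flyctl invocation (assumed form) that sets the VM count."""
    return ["flyctl", "scale", "count", str(count), "--app", app_name]


def reconcile(fetch_job_counts, run_command, app_name, current_count):
    """Run one pass of the loop: read the queue, scale only if the target changed.

    fetch_job_counts: hypothetical callable returning (queued, running).
    run_command: callable that executes a command list, e.g. subprocess.run.
    """
    queued, running = fetch_job_counts()
    target = desired_worker_count(queued, running)
    if target != current_count:
        run_command(scale_command(app_name, target))
    return target
```

The worker would call `reconcile` on a timer, passing `subprocess.run` as `run_command` and remembering the returned count for the next pass. Keeping the decision in a pure function makes it easy to test without touching either API.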

Would love to know if there’s a better way, or if this is an interesting topic for a feature request!

Joshua

That’s the best way right now. The only thing to look out for is scaling down – it stops the most recent VMs first regardless of what’s busy. You’ll want to make sure you’re shutting down gracefully and letting the worker finish a build if possible.

OK, would that be best achieved using the customizable stop signal?
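For what it's worth, here is a minimal Python sketch of what graceful shutdown could look like on the worker side. It assumes the stop signal arrives as SIGINT or SIGTERM (as far as I know, `kill_signal` and `kill_timeout` in fly.toml control which signal is sent and how long the VM gets before it is killed, so the timeout would need to cover a build). `GracefulWorker`, `next_job`, and `run_job` are hypothetical names, not part of any Buildkite or Fly library.

```python
import signal


class GracefulWorker:
    """Runs jobs one at a time and stops between jobs once a stop is requested."""

    def __init__(self, next_job, run_job):
        self.next_job = next_job  # hypothetical: returns a job, or None if the queue is empty
        self.run_job = run_job    # hypothetical: runs one build to completion
        self.stopping = False

    def request_stop(self, signum=None, frame=None):
        # Signal handler: only set a flag, so the build in progress is not interrupted.
        self.stopping = True

    def install_handlers(self, signals=(signal.SIGINT, signal.SIGTERM)):
        # Must be called from the main thread, as signal.signal requires.
        for sig in signals:
            signal.signal(sig, self.request_stop)

    def run(self):
        completed = 0
        # The flag is checked only between jobs, so a build that has
        # already started always runs to the end.
        while not self.stopping:
            job = self.next_job()
            if job is None:
                break  # a long-lived worker would sleep and poll again here
            self.run_job(job)
            completed += 1
        return completed
```

A worker would call `install_handlers()` once at startup and then `run()`. The important part is that the handler never kills the build itself. It only prevents the next one from starting.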