Scale to zero possible?

Any ETAs?

What happens today though? If my scale min is set to 1, is it the case that 1 VM is always running (or worse: even if it isn’t, I am billed as if it were)?

On the “what happens today”: yes, the minimum is 1, so 1 VM is always running, and you are billed for it.

If you don’t want to be billed, you have to set the count to 0. But it won’t auto-wake itself: you’ll need to manually set it back to 1 to serve requests again. So that would be the case, e.g., for a dev/staging type of app that doesn’t need to be running unless you need it.

Auto-scaling from 0 (i.e. waking up when a request comes in) would still be neat. The ETA question on that is one for Fly.



> Auto-scaling from 0 (i.e. waking up when a request comes in) would still be neat. The ETA question on that is one for Fly.

Fly used to be a lot like Lambda, but now keeps a minimum number of instances around for whatever reason.

I am firmly in the camp that server under-utilization is a hard problem for the end customer and I’d rather someone else solve that for me.

Let’s see if Fly comes back into the serverless world, but with WASM instead, which looks more and more promising (Fly did start off as a JavaScript-at-the-edge platform, so there’s some hope?).


I see two options for getting Knative-like scale-to-zero and back up automagically.

  1. Do it yourself, with some sort of queue and metrics system that then calls the Fly GraphQL API.

The guts of the design involve running a gateway that listens for routes and checks whether the Fly app that the route maps to is up. If the Fly app is not running, it starts it, and then lets the request through. Because Fly is so fast, you might not even need a queue broker to hold the request.

Then, for the metrics, you need that same gateway to listen for Fly metrics and scale the instances of the Fly app up or down based on them. Of course, you also need to drain a Fly app instance before killing it.
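A minimal sketch of that gateway logic, with the route table, app names, and the two Fly API helpers as hypothetical placeholders (in practice you’d back them with the real GraphQL or Machines API):

```javascript
// Hypothetical lookup table: route prefix -> Fly app name
const routes = { '/images': 'my-image-app', '/reports': 'my-report-app' };

// Map an incoming path to the Fly app that should serve it (null if no match).
function appForPath(path) {
  const prefix = Object.keys(routes).find((p) => path.startsWith(p));
  return prefix ? routes[prefix] : null;
}

// Placeholder stubs -- swap in real Fly API calls here.
async function isAppRunning(app) { /* query the Fly API */ return false; }
async function startApp(app) { /* ask the Fly API to start the app */ }

// The gateway's request handler: find the app, wake it if needed, forward.
async function handle(req, res) {
  const app = appForPath(req.url);
  if (!app) { res.statusCode = 404; return res.end('no route\n'); }
  if (!(await isAppRunning(app))) await startApp(app); // wake on demand
  // ...then proxy the request through to the app (not shown).
  res.end(`forwarded to ${app}\n`);
}
```

Because starting a Machine is fast, the wake-up can happen inline in the handler, which is why a queue broker may be unnecessary.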

  2. The Fly team integrates it.

Just wanted to pop into the thread to say that this is the container host of my dreams, way better than the Google Cloud and AWS container services I’ve used.

Scale to zero is probably the only feature I use that’s available elsewhere (Google Cloud Run) and not yet on Fly, so when it’s in I’d love to migrate over a bunch of containers.


Just discovered Fly. Simply magnificent. Congratulations!

I can’t wait for this feature (which should perhaps have a higher priority, if only to save A LOT of resources on your servers, team).

Thanks again!

@frederikhors You’re in luck, Fly launched “serverless” (scale-to-zero-at-will; 300ms-wake-up-on-demand) just today: We're launching Fly machines today


If and when you start charging for IPs, would this only be for apps exposed to the outside world?

Say in a project you have many apps (e.g. 50), some of which may only run for several seconds, several times a month. Instead of exposing them to the internet and incurring the overhead of $1 each, you’d only expose one app, which would serve as a gateway/orchestrator.


Is there an update on this one?

Sounds like this might be a good use for flycast: Private Networking · Fly Docs

We do have a way to achieve “scale to zero” now: Scale the Number of Machines · Fly Docs

And it also works with Postgres! Scale to zero Postgres for hobby projects

The “time-to-scale-down” is up to the app (scale-to-zero works by having your app decide to exit after some period of inactivity).

(PS: It’s not multi-region though.)


@guillaume Will there be a way to scale to zero without the app having to shut itself down when there are no requests from outside?

I have an image proxy app which is a ready-to-use Dockerfile. It’s used a few times a day; it would be great to have it stopped during inactivity.

@Elder The issue, generally speaking, is that you can have anything running inside your container, and from our side, we don’t care what it’s doing. So we wouldn’t want to kill it while it runs and is potentially doing useful work.

That’s why only you can kill it: either by using our API or CLI, or by making it exit from the inside. And this you can do from your app code, or by adding another proxy to your image, like this one (kudos to @ties-v for forking a demo we did).


So, when using Machines, to get scale to zero I just need to end the process from inside the app, and Fly will restart it when a new request comes in?

In Node, would that simply be doing process.exit()?

Also I guess I’d need to change the restart policy from always to something else.

Is there any guide about this?

Would be great if it could work like Lambda out of the box:
request > wake > response > stop

@pier You got it exactly! In Node, our docs actually link to this repo for an example in Node/TS/Remix, where this file indeed calls process.exit(0).

And yes, the restart policy should be on-failure (the default for new V2 apps), as explained here:


@Elder you’ve actually found the culprit here:

> response > stop

→ To determine that the response has been handled and the process can indeed be stopped, AWS has to introduce a Lambda-specific API for this. (In Node/Python/… it’s their “handler” concept, and for the generic “any-language/docker-image” case, it’s their Runtime API.)

We keep it simple and leverage primitives that everyone knows, like processes that exit with a code. We don’t introduce such a (complex, some would argue) API/abstraction for you to learn.


I just run my own proxy so I know when a machine has no activity.

Or you can just embed an agent in your machine which tells your provisioner what’s happening.

You can also use the metrics but it’s less precise.

Then you need to run a provisioner that automates bringing your machines up or down, or scaling to zero.

The agents feed basic metrics into NATS. The provisioner feeds off NATS.

NATS does all that quite well, and it’s what Fly uses internally, as we all know.
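A sketch of the agent side of that design. The subject name and payload shape here are invented for illustration, and the actual publish (which uses the nats.js client) is left as a comment so the snippet stays self-contained:

```javascript
// Build a tiny activity sample for the provisioner to consume. The field
// names and the 'metrics.<app>' subject are made up for this example.
function buildSample(appName, lastRequestMs, nowMs) {
  return JSON.stringify({ app: appName, idle_ms: nowMs - lastRequestMs, at: nowMs });
}

// With the real client (npm i nats), publishing could look like:
//   const { connect, StringCodec } = require('nats');
//   const nc = await connect({ servers: 'nats://my-nats.internal:4222' });
//   const sc = StringCodec();
//   setInterval(
//     () => nc.publish('metrics.my-app',
//                      sc.encode(buildSample('my-app', lastRequest, Date.now()))),
//     10_000,
//   );
```

The provisioner then subscribes to the same subjects and decides, from idle_ms, when to stop or start Machines.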

Is this the Golang example?

os.Exit in this case

Fly released Automatically starting/stopping Apps v2 instances last month, so it’s now possible to scale to zero without needing the container’s entrypoint to exit.

They also released Setting a minimum number of instances to keep running when using auto start/stop a few days ago if you want to automatically scale down but to a number larger than 0.
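For reference, the relevant fly.toml section for an Apps v2 app with an http_service looks roughly like this (key names as documented for Apps v2 at the time; port and values illustrative):

```toml
[http_service]
  internal_port = 8080
  auto_stop_machines = true    # stop Machines when traffic drops off
  auto_start_machines = true   # start one again on an incoming request
  min_machines_running = 0     # allow scaling all the way to zero
```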
