Here's how my setup works: I have an app with multiple machines created beforehand, enough to handle peak demand. Asleep machines are very cheap (I can't find the pricing right now), so it's fine to just manually create as many as you think you'll need.
In the future I will create machines on demand but I haven’t had the time to play with the machines API yet. And this setup is working great so far.
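For what it's worth, creating a machine on demand through the Machines API should look roughly like this. This is just a sketch I haven't run myself; the app name, image and token are placeholders:

```ts
// Rough sketch of creating a Machine via Fly's Machines API.
// App name, image and FLY_API_TOKEN are placeholders, not my real setup.
const FLY_API = "https://api.machines.dev/v1";

async function createEncodingMachine(): Promise<void> {
  const res = await fetch(`${FLY_API}/apps/my-encoder-app/machines`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.FLY_API_TOKEN}`,
      "content-type": "application/json",
    },
    body: JSON.stringify({
      config: {
        image: "registry.fly.io/my-encoder-app:latest",
      },
    }),
  });
  if (!res.ok) throw new Error(`Machines API responded with ${res.status}`);
}
```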
I explained a scale-to-zero approach in this post:
I also have another app, let's call it the job worker, which is permanently awake and orchestrates the whole thing. It responds to a trigger on the Jobs table in Postgres via LISTEN/NOTIFY. It could use some pub/sub service instead too.
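For reference, a minimal sketch of what the listening side of the job worker can look like with node-postgres (`pg`). The channel name and payload are made up for the example; the real trigger on the Jobs table is whatever you define in your database:

```ts
import { Client } from "pg";

// Hypothetical channel; a trigger on the Jobs table would fire
// pg_notify('new_job', <job id>) on insert.
const JOB_CHANNEL = "new_job";

async function main() {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();

  // React to notifications coming from the trigger.
  client.on("notification", (msg) => {
    if (msg.channel !== JOB_CHANNEL) return;
    const jobId = msg.payload; // e.g. the id of the new Jobs row
    console.log("new job", jobId);
    // kick off the encoding request here (see the next snippet)
  });

  await client.query(`LISTEN ${JOB_CHANNEL}`);
}

main().catch(console.error);
```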
So the worker sends the HTTP requests to the encoding app to trigger the audio encoding. Fly's routing layer then wakes the machines as needed when new requests hit the encoding app. With the autoscaling settings (see the post), Fly will try to route a single request per machine as long as there are machines available to wake up.
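The dispatch itself is just a plain HTTP request from the worker to the encoding app, something along these lines (the hostname and endpoint are placeholders):

```ts
// Any request that goes through Fly's proxy to the encoding app
// will wake a stopped machine if needed.
const ENCODER_URL = "https://my-encoder-app.fly.dev/encode"; // placeholder

async function dispatchToEncoder(jobId: string): Promise<void> {
  const res = await fetch(ENCODER_URL, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ jobId }),
  });
  if (!res.ok) {
    // The worker can mark the job as failed here, or retry later.
    throw new Error(`Encoder responded with ${res.status}`);
  }
}
```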
One thing to take into consideration is that requests will time out after 60 seconds if no bytes go through Fly's routing layer. So either complete the job before 60s, stream some bytes so that Fly doesn't time out the request, or just respond to the request ASAP and then do the CPU work.
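The "respond ASAP" option can be as simple as acknowledging the request and doing the encoding afterwards. A rough sketch with Express (the route and `encodeAudio` are placeholders, not my actual code):

```ts
import express from "express";

const app = express();
app.use(express.json());

app.post("/encode", (req, res) => {
  const { jobId } = req.body;

  // Acknowledge immediately so Fly's proxy never hits the 60s idle timeout.
  res.status(202).json({ accepted: true, jobId });

  // Then do the CPU-heavy work; progress/state goes to Postgres, not the response.
  encodeAudio(jobId).catch((err) => console.error("encoding failed", err));
});

// Placeholder for the actual encoding work.
async function encodeAudio(jobId: string): Promise<void> {
  // ... run the encoder, update the AudioEncoding row when done
}

app.listen(8080);
```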
I don't really track it, but if I needed to I could just do something like this:
select * from "AudioEncoding" where "state" = 'ENCODING'
Like I said, all the encoding machines shut down eventually. The job worker that orchestrates all this is always awake.
Like I said, to solve this you could stream some bytes before ending the response. Streaming is a bit of a headache though, because once you start writing the body you can't change the HTTP status anymore, and there are other considerations.
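A sketch of the streaming workaround, again assuming Express: commit the status, write a filler byte every few seconds so the proxy keeps seeing traffic, and end the response when the job finishes. This is where the status limitation bites, because the 200 is already sent by the time you know whether the job failed:

```ts
import express from "express";

const app = express();
app.use(express.json());

app.post("/encode", async (req, res) => {
  const { jobId } = req.body;

  // Set status/headers now; they get committed by the first write below
  // and can't be changed afterwards.
  res.status(200).setHeader("content-type", "text/plain");
  res.write("processing\n");

  // Keep some bytes flowing so Fly's proxy doesn't hit the 60s idle timeout.
  const heartbeat = setInterval(() => res.write("\n"), 10_000);

  try {
    await encodeAudio(jobId);
    res.end("done\n");
  } catch (err) {
    // Too late for a 500 at this point; signal failure in the body (or the DB).
    res.end("failed\n");
  } finally {
    clearInterval(heartbeat);
  }
});

// Placeholder for the actual encoding work.
async function encodeAudio(jobId: string): Promise<void> {}

app.listen(8080);
```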
If you don't have control over the app making the request, you could ask them to give you a webhook URL you can call when the long-running task is complete.
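That flow could look roughly like this: the caller passes a callback URL, you acknowledge right away, and you POST the result to that URL when the job finishes (names are illustrative):

```ts
import express from "express";

const app = express();
app.use(express.json());

app.post("/encode", (req, res) => {
  const { jobId, callbackUrl } = req.body;

  // Acknowledge immediately; the result is delivered via the webhook instead.
  res.status(202).json({ accepted: true });

  encodeAudio(jobId)
    .then(() =>
      fetch(callbackUrl, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify({ jobId, state: "DONE" }),
      })
    )
    .catch((err) => console.error("encoding or callback failed", err));
});

// Placeholder for the actual encoding work.
async function encodeAudio(jobId: string): Promise<void> {}

app.listen(8080);
```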