Yup, probably. Just found out about the replay headers. I think I need to handle the load balancing on my own, the normal load balancer was not meant for this kind of thing.
I need to look up the machine ids beforehand from the API, pick a machine whose state is stopped since it's idle, and if none are available, pick the oldest machine. Then force routing to that machine with fly-force-instance-id: <machine-id>, I think. And have the queue always retry against the same machine.
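The selection logic described above could look roughly like this in Node. The `api.machines.dev` endpoint and the fly-force-instance-id header come from the Fly docs; the field names (`state`, `created_at`), the `/job` path, and the function names are illustrative assumptions:

```javascript
// Sketch of the machine-picking logic: prefer a stopped (idle) machine,
// otherwise fall back to the oldest one. Machine shape assumed from the
// Fly Machines API: { id, state, created_at }.
function pickMachine(machines) {
  const stopped = machines.find((m) => m.state === "stopped");
  if (stopped) return stopped.id;
  // No idle machine: pick the oldest by creation time.
  const oldest = [...machines].sort(
    (a, b) => new Date(a.created_at) - new Date(b.created_at)
  )[0];
  return oldest.id;
}

// Usage sketch: list machines for the app, then target one explicitly.
async function sendToIdleMachine(appName, apiToken, body) {
  const res = await fetch(
    `https://api.machines.dev/v1/apps/${appName}/machines`,
    { headers: { Authorization: `Bearer ${apiToken}` } }
  );
  const machines = await res.json();
  return fetch(`https://${appName}.fly.dev/job`, {
    method: "POST",
    headers: {
      // Route this request to the specific machine we picked.
      "fly-force-instance-id": pickMachine(machines),
      "content-type": "application/json",
    },
    body: JSON.stringify(body),
  });
}
```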
It’s also possible/recommended to offload long-running requests to a worker. Here’s an example with Django and Celery, but there are worker/queue components for every framework (Celery for Python, Sidekiq for Ruby, Oban for Elixir/Phoenix, BullMQ for Node, Laravel queues, etc.). That way your web server doesn’t get held up processing the slow job; it sends it to the worker and is then immediately available to serve requests again.
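The enqueue-and-return-immediately pattern above, sketched in Node. A real setup would use BullMQ with Redis; here the queue is a plain in-memory array (and the names `handleRequest`/`workOnce` are made up) so the flow is visible end to end without external services:

```javascript
// Stand-in for a real queue (BullMQ + Redis in production).
const queue = [];

// Web handler: instead of doing the slow work inline, push a job
// descriptor and respond right away.
function handleRequest(payload) {
  queue.push({ name: "render-pdf", payload });
  return { status: 202, body: "job accepted" }; // returns immediately
}

// Worker side: pulls one job off the queue and does the slow work,
// without ever blocking the web handler.
async function workOnce() {
  const job = queue.shift();
  if (!job) return null;
  // ...slow work happens here (headless Chrome, PDF render, etc.)...
  return `${job.name} done for ${job.payload.id}`;
}
```

With a real queue library the shape is the same: the handler enqueues, a separate worker process consumes, and retries/backoff come for free.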
Yeah, I’ve got it separated; it’s a different app from the application server. It’s running Node and I’m using tinypool to chug two Chromes in worker threads.