I’ve recently deployed a Go application that sends an HTTP request to an external resource each time a specific endpoint is called. After the app sits idle for about 10 minutes, the next HTTP request always takes about 6s and then times out. Every subsequent request completes in under 1s, but when I leave the app idle for another 10 minutes the problem repeats itself.
I’ve deployed the same app to other cloud providers (namely AWS and Hetzner) and the problem doesn’t occur there, so I’m assuming it has to do with the Fly.io network.
I don’t think that’s the case here. Auto-stop is disabled by default in the config, and the machine shows as running in the dashboard. Also, the error I get is a specific application error, which means the request was accepted by the application and the timeout happens after that.
2024-07-27T15:33:51.625 proxy[178190eb263108] ams [info] App fly-io-timeout has excess capacity, autosuspending machine 178190eb263108. 0 out of 1 machines left running (region=ams, process group=app)
2024-07-27T15:33:52.426 app[178190eb263108] ams [info] Virtual machine has been suspended
and automatically starts after a request is made (in about 3s), and there is no timeout when sending a request to the external resource.
However, this behavior is completely different from how it was before. With the default values (auto_stop_machines = “off”) there is no log message about stopping the machine and the machine icon stays green, but when a request is made the network call still takes about 6s to complete. It looks like the app was running but its network stack/proxy was down.
I don’t want to suspend the machine, I’d like to keep it running all the time but still get that instant response when needed.
Correction: I’ve tried a few times and managed to make the request time out when the app woke up from suspension. That is, the app was suspended, I sent a request, and after 6s got
Interesting, there’s definitely something off. It shouldn’t take 3 seconds to resume from suspend; that’s normally pretty quick (100-200ms). Try deploying to another region? Maybe there are infra issues in ams.