Requests to a suspended machine are taking a long time

When my machine is suspended due to inactivity, the first request that triggers the wake-up takes a long time to return to the user (or never returns).

The logs actually show the machine waking up correctly:

2025-07-25 10:15:50.980 machine became reachable in 6.717712ms
2025-07-25 10:15:50.974 machine started in 223.935944ms
2025-07-25 10:15:50.972 Machine started in 218ms
2025-07-25 10:15:50.839 2025-07-24T22:15:50.839963167 [01K0Z6JEQ0E2TWFP5712F38WPH:fc_api] The API server received a Put request on "/logger" with body "{"log_path":"logs.fifo","level":"info"}".
2025-07-25 10:15:50.839 2025-07-24T22:15:50.839889975 [01K0Z6JEQ0E2TWFP5712F38WPH:fc_api] API server started.
2025-07-25 10:15:50.839 2025-07-24T22:15:50.839571880 [01K0Z6JEQ0E2TWFP5712F38WPH:main] Listening on API socket ("/fc.sock").
2025-07-25 10:15:50.839 2025-07-24T22:15:50.839442498 [01K0Z6JEQ0E2TWFP5712F38WPH:main] Running Firecracker v1.12.1
2025-07-25 10:15:50.750 Starting machine
2025-07-25 10:02:13.631 Virtual machine has been suspended

But a response takes over 30 seconds, so it appears to the user that it hangs. Refreshing the page resolves the issue as the machine is awake for the next request.

Here is the response time from Postman:

When I do it from the browser, it seems to never return (stays in the “pending” status indefinitely).

When I manually suspend the machine (as opposed to waiting for it to suspend itself), it seems to wake up and respond correctly.

I have also seen it wake up correctly, but the majority of the time, the response is never returned.

I have tried using auto_stop_machines = true rather than 'suspend', and the machine reliably responds when waking from the stopped state. It would obviously be better to have the same behaviour from the suspended state.

Hey @paulactually

Could you set the flyio-debug: doit header on this request and post the fly-request-id value from the response here, please?
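
For anyone following along, here is a minimal Python sketch of sending that header and reading the ID back, using only the standard library. The URL in the usage note is a placeholder and the helper names are made up for illustration; only the flyio-debug and fly-request-id header names come from this thread.

```python
import urllib.request
from typing import Mapping, Optional


def build_debug_request(url: str) -> urllib.request.Request:
    """Build a GET request that carries the flyio-debug: doit header."""
    req = urllib.request.Request(url, method="GET")
    req.add_header("flyio-debug", "doit")
    return req


def extract_request_id(headers: Mapping[str, str]) -> Optional[str]:
    """Return the fly-request-id response header, or None if it is absent."""
    return headers.get("fly-request-id")


def fetch_request_id(url: str, timeout: float = 60.0) -> Optional[str]:
    """Send the request and return the fly-request-id from the response.

    The generous timeout is an assumption, chosen because the first
    request after a suspend was reported to take over 30 seconds.
    """
    with urllib.request.urlopen(build_debug_request(url), timeout=timeout) as resp:
        return extract_request_id(resp.headers)
```

Calling fetch_request_id("https://your-app.example/") against your own app URL should give you the value to paste here.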

Hi @pavel.

I made a request with that header this morning. The fly-request-id was: 01K16V7AZ4ZJ1VRBT2NDS0AD7T-syd. The response also included the flyio-debug header with the value:

{"n":"edge-cf-syd1-777c","nr":"syd","ra":"125.236.220.56","rf":"Verbatim","sr":"syd","sdc":"syd1","sid":"0801693a191618","st":0,"nrtt":1,"bn":"worker-cf-syd1-519a","mhn":null,"mrtt":null}

Here are the full request and response headers:

Also, the logs for that request:

2025-07-28 09:04:37.576 machine became reachable in 10.463414ms
2025-07-28 09:04:37.566 machine started in 212.047468ms
2025-07-28 09:04:37.564 Machine started in 206ms
2025-07-28 09:04:37.447 2025-07-27T21:04:37.447628939 [01K16SX00PT0HVST9PY8HNS3YG:fc_api] The API server received a Put request on "/logger" with body "{\"log_path\":\"logs.fifo\",\"level\":\"info\"}".
2025-07-28 09:04:37.445 2025-07-27T21:04:37.445903630 [01K16SX00PT0HVST9PY8HNS3YG:fc_api] API server started.
2025-07-28 09:04:37.445 2025-07-27T21:04:37.445614085 [01K16SX00PT0HVST9PY8HNS3YG:main] Listening on API socket ("/fc.sock").
2025-07-28 09:04:37.445 2025-07-27T21:04:37.445436032 [01K16SX00PT0HVST9PY8HNS3YG:main] Running Firecracker v1.12.1
2025-07-28 09:04:37.353 Starting machine
2025-07-28 08:48:09.628 Virtual machine has been suspended

Thanks. Paul.

Hmm, I don’t see anything wrong in our logs.

When you made the request, the proxy woke up the machine and established a new connection to it. It took the app ~30s to respond:

21:04:37.576626000: backhaul -> backend: Request { method: GET, ... }
21:05:08.654473000: backhaul <- backend: Response { status: 200, ... }
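
To make the "~30s" concrete, the gap between those two proxy log lines can be worked out with a few lines of Python. This is only a sketch: the parsing helper is made up for illustration, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime


def parse_proxy_time(stamp: str) -> datetime:
    """Parse HH:MM:SS.nnnnnnnnn, truncating nanoseconds to microseconds."""
    clock, _, frac = stamp.partition(".")
    micros = (frac + "000000")[:6]  # strptime's %f accepts at most 6 digits
    return datetime.strptime(f"{clock}.{micros}", "%H:%M:%S.%f")


request_sent = parse_proxy_time("21:04:37.576626000")
response_received = parse_proxy_time("21:05:08.654473000")

# Time the backend took between receiving the request and responding.
elapsed = (response_received - request_sent).total_seconds()
print(f"{elapsed:.3f}s")  # prints "31.078s"
```

So roughly 31 seconds passed between the proxy handing the request to the app and the app answering.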

Does your app need to talk to some external resource (e.g. a database) to serve such a request? If so, it could be that some connections to the external resource in the pool are already dead (because the machine was suspended), but it takes a while for the TCP/IP stack or client libraries to realize that once the machine is resumed.
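
If stale pooled connections do turn out to be the cause, one common mitigation is to check a connection before handing it out and replace it if the check fails. Below is a minimal sketch of that idea in Python. ValidatingPool and FakeConn are hypothetical names invented for this example and do not come from any particular library; with a real database client the liveness check would be something like a cheap query with a short timeout.

```python
from collections import deque
from typing import Callable, Deque, Generic, TypeVar

Conn = TypeVar("Conn")


class ValidatingPool(Generic[Conn]):
    """Hypothetical pool that validates idle connections on checkout."""

    def __init__(self, connect: Callable[[], Conn],
                 is_alive: Callable[[Conn], bool]) -> None:
        self._connect = connect
        self._is_alive = is_alive
        self._idle: Deque[Conn] = deque()

    def acquire(self) -> Conn:
        # Hand out the first idle connection that passes the liveness
        # check; connections that fail it are simply dropped.
        while self._idle:
            conn = self._idle.popleft()
            if self._is_alive(conn):
                return conn
        return self._connect()

    def release(self, conn: Conn) -> None:
        self._idle.append(conn)


class FakeConn:
    """Stand-in for a real database connection (illustration only)."""

    def __init__(self) -> None:
        self.alive = True


pool = ValidatingPool(connect=FakeConn, is_alive=lambda c: c.alive)
first = pool.acquire()
pool.release(first)
first.alive = False  # simulate the peer dropping it during a suspend
second = pool.acquire()
print(second is first)  # prints "False": the dead connection was replaced
```

Many pooling libraries ship an equivalent option (SQLAlchemy's pool_pre_ping=True, for example), so it is worth checking your client's settings before writing anything by hand.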

Could you add some logs to make it easier to understand where the app spends the time while serving the request?
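
As a starting point for that kind of logging, here is a small Python sketch that wraps each step of a request in a timer and logs how long it took. The step names and the sleeps are placeholders for whatever the real app does (database query, template rendering, and so on); adapt the idea to your own language and framework.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("request-timing")

# Last measured duration per step, in milliseconds.
timings = {}


@contextmanager
def timed(step: str):
    """Log how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        timings[step] = elapsed_ms
        log.info("%s took %.1f ms", step, elapsed_ms)


def handle_request() -> str:
    with timed("total"):
        with timed("db_query"):
            time.sleep(0.05)  # placeholder for the real database call
        with timed("render"):
            time.sleep(0.01)  # placeholder for building the response
    return "ok"


result = handle_request()
```

If one step accounts for nearly all of the ~30 seconds on the first request after a resume, that is the place to look for a stale connection or a long timeout.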