Machine creation takes only ~5 seconds for us. However, when the Fly Proxy restarts an existing stopped machine, it takes upwards of 30 seconds. We're seeing lots of logs like this in Grafana (newest entries first):
2025-09-02 14:00:40.086
Machine started in 34.603s
2025-09-02 14:00:26.810
[PM01] machines API returned an error: "machine still attempting to start"
2025-09-02 14:00:26.808
Starting machine
2025-09-02 14:00:24.732
[PM01] machines API returned an error: "machine still attempting to start"
2025-09-02 14:00:24.730
Starting machine
2025-09-02 14:00:23.656
[PM01] machines API returned an error: "machine still attempting to start"
2025-09-02 14:00:23.656
[PM01] machines API returned an error: "machine still attempting to start"
2025-09-02 14:00:23.654
Starting machine
2025-09-02 14:00:23.654
Starting machine
2025-09-02 14:00:22.608
[PR03] could not find a good candidate within 1 attempts at load balancing. last error: [PM01] machines API returned an error: "machine still attempting to start"
2025-09-02 14:00:22.608
[PR03] could not find a good candidate within 1 attempts at load balancing. last error: [PM01] machines API returned an error: "machine still attempting to start"
2025-09-02 14:00:22.579
Starting machine
2025-09-02 14:00:22.579
Starting machine
2025-09-02 14:00:21.511
[PM01] machines API returned an error: "machine still attempting to start"
2025-09-02 14:00:21.511
[PM01] machines API returned an error: "machine still attempting to start"
2025-09-02 14:00:21.508
Starting machine
2025-09-02 14:00:21.508
Starting machine
2025-09-02 14:00:19.517
[PM01] machines API returned an error: "machine still attempting to start"
2025-09-02 14:00:19.515
Starting machine
2025-09-02 14:00:18.444
[PM01] machines API returned an error: "machine still attempting to start"
2025-09-02 14:00:18.441
Starting machine
2025-09-02 14:00:17.367
[PM01] machines API returned an error: "machine still attempting to start"
2025-09-02 14:00:17.365
Starting machine
2025-09-02 14:00:16.317
[PR03] could not find a good candidate within 1 attempts at load balancing. last error: [PM01] machines API returned an error: "machine still attempting to start"
2025-09-02 14:00:16.289
Starting machine
2025-09-02 14:00:14.242
[PR03] could not find a good candidate within 1 attempts at load balancing. last error: [PM01] machines API returned an error: "machine still attempting to start"
2025-09-02 14:00:14.213
Starting machine
2025-09-02 14:00:13.219
[PM01] machines API returned an error: "machine still attempting to start"
2025-09-02 14:00:13.216
Starting machine
2025-09-02 14:00:12.133
[PM01] machines API returned an error: "machine still attempting to start"
2025-09-02 14:00:12.131
Starting machine
2025-09-02 14:00:11.494
[PM01] machines API returned an error: "rate limit exceeded"
2025-09-02 14:00:11.493
Starting machine
2025-09-02 14:00:10.922
[PR03] could not find a good candidate within 1 attempts at load balancing. last error: [PM01] machines API returned an error: "machine still attempting to start"
2025-09-02 14:00:10.898
Starting machine
2025-09-02 14:00:10.681
[PM01] machines API returned an error: "machine still attempting to start"
2025-09-02 14:00:10.679
Starting machine
2025-09-02 14:00:10.577
[PM01] machines API returned an error: "machine still attempting to start"
2025-09-02 14:00:10.575
Starting machine
2025-09-02 14:00:10.477
[PM03] could not wake up machine due to a timeout requesting from the machines API
2025-09-02 14:00:05.615
2025-09-02T21:00:05.615887739 [01K3YVTDQ28NQ63F8932MHGJCE:fc_api] The API server received a Put request on "/logger" with body "{\"log_path\":\"logs.fifo\",\"level\":\"info\"}".
2025-09-02 14:00:05.615
2025-09-02T21:00:05.615694847 [01K3YVTDQ28NQ63F8932MHGJCE:fc_api] API server started.
2025-09-02 14:00:05.614
2025-09-02T21:00:05.614453697 [01K3YVTDQ28NQ63F8932MHGJCE:main] Listening on API socket ("/fc.sock").
2025-09-02 14:00:05.614
2025-09-02T21:00:05.614294456 [01K3YVTDQ28NQ63F8932MHGJCE:main] Running Firecracker v1.12.1
2025-09-02 14:00:05.476
Starting machine
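As a sanity check on the numbers, the wall-clock gap between the first "Starting machine" entry (14:00:05.476) and the final "Machine started in 34.603s" entry (14:00:40.086) can be computed directly from the timestamps in the logs. This is only a quick sketch in Python; the variable names are made up for illustration and it just does the subtraction:

```python
from datetime import datetime

# Timestamps copied from the log excerpt: the earliest "Starting machine"
# entry and the final "Machine started in 34.603s" entry.
FMT = "%Y-%m-%d %H:%M:%S.%f"
first_start = datetime.strptime("2025-09-02 14:00:05.476", FMT)
machine_started = datetime.strptime("2025-09-02 14:00:40.086", FMT)

# Elapsed wall-clock time between the two log entries, in seconds.
elapsed = (machine_started - first_start).total_seconds()
print(f"{elapsed:.3f}s")
```

The result is within a few milliseconds of the 34.603s the proxy reports, so the whole window of retries and "machine still attempting to start" errors appears to fall inside that single start attempt.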
Any ideas what could be going on? Thanks!