Machine unable to start again after "could not reserve resource for machine" failure

I noticed an issue this morning where my shared-1x-cpu@512MB container in the lhr region failed its scheduled invocation. This job is a health check that runs every 30 minutes, scaling the service from zero to one instance and running a job.

The last logs that my service returned are below:

2025-03-26 10:38:27.952	[PR04] could not find a good candidate within 20 attempts at load balancing
2025-03-26 10:38:27.951	[PM01] machines API returned an error: "could not reserve resource for machine: insufficient memory available to fulfill request"
2025-03-26 10:38:27.907	Starting machine
2025-03-26 10:38:23.955	[PM01] machines API returned an error: "could not reserve resource for machine: insufficient memory available to fulfill request"
2025-03-26 10:38:23.914	Starting machine
2025-03-26 10:38:19.909	[PM01] machines API returned an error: "could not reserve resource for machine: insufficient memory available to fulfill request"
2025-03-26 10:38:19.867	Starting machine
2025-03-26 10:38:15.891	[PM01] machines API returned an error: "could not reserve resource for machine: insufficient memory available to fulfill request"
2025-03-26 10:38:15.846	Starting machine
2025-03-26 10:38:13.063	[PM01] machines API returned an error: "could not reserve resource for machine: insufficient memory available to fulfill request"
2025-03-26 10:38:13.020	Starting machine
2025-03-26 10:38:12.734	[PM01] machines API returned an error: "could not reserve resource for machine: insufficient memory available to fulfill request"
2025-03-26 10:38:12.676	Starting machine

From this point onward, any attempt to invoke the service returned a 503, and there are no further logs available for what was happening behind the scenes.
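Judging by the timestamps, the platform gave up after a handful of back-to-back start attempts within about 15 seconds. One possible client-side workaround while capacity is tight is to retry the start call yourself with exponential backoff instead of failing the health check outright. A minimal sketch, assuming your scheduler can drive the start call directly; `start_machine` is a hypothetical wrapper around whatever start API you use, returning False on a transient capacity error:

```python
import time

def start_with_backoff(start_machine, attempts=5, base_delay=2.0):
    """Retry a machine start with exponential backoff.

    start_machine: callable returning True on success, False on a
    transient capacity error such as "could not reserve resource".
    """
    for attempt in range(attempts):
        if start_machine():
            return True
        # Back off before the next try so the scheduler has time to
        # free capacity, rather than hammering it every few seconds.
        time.sleep(base_delay * (2 ** attempt))
    return False
```

This doesn't fix the underlying stuck state described below, but it can ride out short capacity blips without the job being marked failed.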

I assumed this could be related to the FRA incident that was ongoing at the time (despite this being a different region), but even after that incident was resolved the container was still unable to start.

Has anyone seen similar behaviour, or does anyone have ideas why my containers remained unavailable (with no logs) until I released a completely new deployment?

Thanks


Same thing here, in IAD. I'm unsure of the exact time period, but it seems to have begun today. My last deployment was 3 hours ago, and the problem is ongoing as I write this.

Will attempt another deployment to see if this unsticks it. It would be nice to have a hard reset button in the Dashboard for such occasions. (Edit: manually retriggering the last image build and deploy in CI worked fine.)

I use Suspend with a single instance, btw, if that's relevant. Perhaps there's a pattern.


Not sure if it's related, but my instance failed to come back from Suspended today with no logs at all. Another forced rebuild and deploy "unstuck" it.