Machine unable to start again after "could not reserve resource for machine" failure

I noticed an issue this morning where my shared-1x-cpu@512MB container in the lhr region failed its scheduled invocation. This job is a health check that runs every 30 minutes, scaling the service from zero to one instance and running a job.

The last logs that my service returned are below:

2025-03-26 10:38:27.952	[PR04] could not find a good candidate within 20 attempts at load balancing
2025-03-26 10:38:27.951	[PM01] machines API returned an error: "could not reserve resource for machine: insufficient memory available to fulfill request"
2025-03-26 10:38:27.907	Starting machine
2025-03-26 10:38:23.955	[PM01] machines API returned an error: "could not reserve resource for machine: insufficient memory available to fulfill request"
2025-03-26 10:38:23.914	Starting machine
2025-03-26 10:38:19.909	[PM01] machines API returned an error: "could not reserve resource for machine: insufficient memory available to fulfill request"
2025-03-26 10:38:19.867	Starting machine
2025-03-26 10:38:15.891	[PM01] machines API returned an error: "could not reserve resource for machine: insufficient memory available to fulfill request"
2025-03-26 10:38:15.846	Starting machine
2025-03-26 10:38:13.063	[PM01] machines API returned an error: "could not reserve resource for machine: insufficient memory available to fulfill request"
2025-03-26 10:38:13.020	Starting machine
2025-03-26 10:38:12.734	[PM01] machines API returned an error: "could not reserve resource for machine: insufficient memory available to fulfill request"
2025-03-26 10:38:12.676	Starting machine

From this point onward, any attempt to invoke the service returned a 503, and there are no further logs available for what was happening behind the scenes.
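Judging by the timestamps, the platform gave up after a handful of back-to-back start attempts within about 15 seconds. One possible client-side workaround while capacity is tight is to retry the start call yourself with exponential backoff instead of failing the health check outright. A minimal sketch, assuming your scheduler can drive the start call directly; `start_machine` is a hypothetical wrapper around whatever start API you use, returning False on a transient capacity error:

```python
import time

def start_with_backoff(start_machine, attempts=5, base_delay=2.0):
    """Retry a machine start with exponential backoff.

    start_machine: callable returning True on success, False on a
    transient capacity error such as "could not reserve resource".
    """
    for attempt in range(attempts):
        if start_machine():
            return True
        # Back off before the next try so the scheduler has time to
        # free capacity, rather than hammering it every few seconds.
        time.sleep(base_delay * (2 ** attempt))
    return False
```

This doesn't fix the underlying stuck state described below, but it can ride out short capacity blips without the job being marked failed.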

I assumed this could be related to the FRA incident that was ongoing at the time (despite this being a different region), but even after that incident was resolved the container was still unable to start.

Has anyone seen similar behaviour, or does anyone have ideas why my containers remained unavailable (with no logs) until I released a completely new deployment?

Thanks


Same thing here, in IAD. I'm unsure of the exact time period, but it seems to have begun today. My last deployment was 3 hours ago, and the problem is ongoing as I write this.

Will attempt another deployment to see if this unsticks it. It would be nice to have a hard reset button in the Dashboard for such occasions. (Edit: manually retriggering the last image build and deploy in CI worked fine.)

I use Suspend with a single instance, btw, if that's relevant. Perhaps there's a pattern.


Not sure if it's related, but my instance failed to come back from Suspended today with no logs at all. Another forced rebuild and deploy "unstuck" it.