machine became unresponsive, now impossible to stop/start/kill

peralta · February 2, 2025, 6:50pm

At 12:40pm CET I started getting alerts that my Fly machine was down (from my external prober, which tests my own API endpoints).
I tried to redeploy, which has fixed the issue in the past. No luck.
I then tried the following with the CLI:
stop, start, restart, kill. No luck.
Now the status in fly machines list shows replacing:

ID            	NAME                 	STATE    	CHECKS	REGION	ROLE	IMAGE                                                	IP ADDRESS                     	VOLUME              	CREATED             	LAST UPDATED        	PROCESS GROUP	SIZE                
080e693b674778	restless-feather-2130	replacing	      	iad   	    	late-glade-7454:deployment-01HTG9M5HNKA69ZR3PN0FFMHV3	fdaa:2:5e4d:a7b:107:3599:e667:2	vol_8l524yjg75347zmp	2023-06-16T21:07:03Z	2025-02-02T18:23:38Z	app          	shared-cpu-1x:256MB

The web logs for the machine show the following:

replacing	update	user	February 2, 2025 6:23PM
starting	start	flyd	February 2, 2025 1:19PM
stopped	exit	flyd	February 2, 2025 1:19PM	exit_code=0,oom_killed=false,requested_stop=false


Any tips on how to stop an unstoppable machine?
Thank you

mabis · February 2, 2025, 7:07pm

If getting the system up is the most important thing, I would spin up a new machine before spending time on the old one.

In fact, I’d try to scale the current app up, and if that works I’m good and the pressure is off, e.g. fly scale count 2

Then, once that’s done and there is no pressure, I’d wait and then try to kill the old machine again, keep an eye on it to see whether it gets unstuck, and eventually contact Fly.
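The order of operations suggested above could be sketched roughly as follows. This is only an illustration, not an official recovery procedure: the machine ID is taken from the listing in the first post, the app name late-glade-7454 is an assumption inferred from the image name, and the run and recover helpers are hypothetical. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. APP is an assumption (inferred from the image name in the
# listing); OLD_MACHINE is the machine ID from the first post.
APP="${APP:-late-glade-7454}"
OLD_MACHINE="${OLD_MACHINE:-080e693b674778}"
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands instead of running them

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

recover() {
  # 1. Scale up first so the app is served by a second machine.
  run fly scale count 2 --app "$APP"
  # 2. Later, with the pressure off, look at the stuck machine again...
  run fly machine status "$OLD_MACHINE" --app "$APP"
  # 3. ...and retry killing it.
  run fly machine kill "$OLD_MACHINE" --app "$APP"
}

recover
```

Printing the commands by default makes it possible to review them before anything touches the app. If the old machine still does not respond after the retry, contacting Fly remains the fallback.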

peralta · February 2, 2025, 10:34pm

Thank you. Creating a new volume from the most recent snapshot and attaching it to a new machine allowed me to get the service back on track. The old machine is still stuck in “replacing” though.
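For anyone landing here later, the volume part of this could look roughly like the sketch below. The exact commands used were not posted, so this is a guess at the general shape: the old volume ID comes from the listing in the first post, while the app name, the snapshot ID vs_REPLACE_ME, the new volume name data_restored, and the run and restore_volume helpers are all placeholders or assumptions. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. OLD_VOLUME is from the listing in the first post; everything
# else below is a placeholder or an assumption.
APP="${APP:-late-glade-7454}"                     # assumed app name
OLD_VOLUME="${OLD_VOLUME:-vol_8l524yjg75347zmp}"
SNAPSHOT_ID="${SNAPSHOT_ID:-vs_REPLACE_ME}"       # placeholder: pick one from the list
NEW_VOLUME_NAME="${NEW_VOLUME_NAME:-data_restored}"  # placeholder name
REGION="${REGION:-iad}"
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands instead of running them

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

restore_volume() {
  # 1. List the snapshots of the old volume and pick the most recent one.
  run fly volumes snapshots list "$OLD_VOLUME"
  # 2. Create a new volume in the same region from that snapshot.
  run fly volumes create "$NEW_VOLUME_NAME" --snapshot-id "$SNAPSHOT_ID" \
    --region "$REGION" --app "$APP"
  # 3. Confirm the new volume exists before attaching it to a new machine.
  run fly volumes list --app "$APP"
}

restore_volume
```

How the new volume gets attached to a new machine depends on how the app is configured (for example, a mounts section in fly.toml), so that step is left out here.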

system · February 9, 2025, 10:35pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
