A zombie machine appears

ignoramous · October 25, 2022, 2:17pm

I deploy 40+ machines in a single app (udns). Today, I noticed that one of them has gone missing (without my intervention). I don't know exactly when, but it is worrying that active machines can go missing like this.

# the zombie machine in question:
24d891dec4d687	udns-yyz 	stopped	   :	1970-01-01T00:00:00Z	0001-01-01T00:00:00Z
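A minimal sketch of how a row like this could be flagged automatically. It assumes the `fly machines list` text output is tab-separated with ID, name and state as the first three columns. The column count differs between this row (6 fields) and a healthy one (9 fields), so timestamps are matched by pattern rather than by position. Treating the Unix epoch and Go's zero time as "record lost" markers is a heuristic drawn only from the row above; `parse_row` and `looks_like_zombie` are hypothetical helpers, not part of flyctl.

```python
import re
from dataclasses import dataclass
from typing import List

# Placeholder timestamps seen on the zombie row: the Unix epoch and Go's
# zero time.Time value. Treating them as "unset" is an assumption.
PLACEHOLDER_TIMES = {"1970-01-01T00:00:00Z", "0001-01-01T00:00:00Z"}
TIMESTAMP_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")


@dataclass
class MachineRow:
    machine_id: str
    name: str
    state: str
    timestamps: List[str]


def parse_row(line: str) -> MachineRow:
    """Parse one tab-separated row of machine-list output (hypothetical helper).

    Only ID, name and state are assumed to sit in fixed positions;
    timestamps are picked out by pattern because the column count varies.
    """
    fields = [f.strip() for f in line.split("\t")]
    stamps = [f for f in fields if TIMESTAMP_RE.match(f)]
    return MachineRow(fields[0], fields[1], fields[2], stamps)


def looks_like_zombie(row: MachineRow) -> bool:
    """Heuristic: placeholder timestamps suggest the machine record is gone."""
    return any(ts in PLACEHOLDER_TIMES for ts in row.timestamps)


if __name__ == "__main__":
    zombie = "24d891dec4d687\tudns-yyz \tstopped\t   :\t1970-01-01T00:00:00Z\t0001-01-01T00:00:00Z"
    row = parse_row(zombie)
    print(row.machine_id, looks_like_zombie(row))  # 24d891dec4d687 True
```

Matching by pattern keeps the check working even when empty columns (region, image, volume) collapse or shift, at the cost of relying on the exact placeholder strings.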

This is the second time this has happened (previously it was in vin, which is not explicitly supported for machines, so that was okay). I thought I'd let Fly engineers know that there's a latent bug lurking which could have dire consequences, especially for Fly-automated Postgres v2.

cc: @JP_Phillips

JP_Phillips · October 25, 2022, 2:47pm

The underlying host that machine was on has been decommissioned. We’ll get the machine record updated to reflect this.

ignoramous · October 25, 2022, 3:09pm

Thanks. So could future decommissions leave zombies behind, or is the root cause being addressed? I ask because I'd want to factor this in before I begin moving all our prod traffic to Fly.

I don't know if it's related, but I can no longer deploy a newer image to any machine (except the ones in maa). I suspect some lease or other is blocking the deploy to udns. If that's the case, how can I make Fly relinquish those leases?

ignoramous · October 25, 2022, 3:17pm

I can see that the newer machines I clone in udns are also stuck in the created state and never transition to started/stopped.

e148e452addd89	udns-yyz3	created	yyz   	udns:deployment-01GE3134FAY3X1AS8F76DZBZ16	fdaa:0:35f3:a7b:88dc:eba1:c3e1:2	      	2022-10-25T15:14:52Z	2022-10-25T15:14:52Z
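A minimal sketch of a check for machines stuck in the created state, assuming the state and the created timestamp have already been pulled out of a row like the one above (state is the third column; created is the second-to-last). The 10-minute threshold `STUCK_AFTER` is a hypothetical value, not a Fly default; tune it to your typical image pull time.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: how long a machine may sit in "created"
# before it is treated as stuck. Not a Fly-defined value.
STUCK_AFTER = timedelta(minutes=10)


def parse_ts(ts: str) -> datetime:
    """Parse a UTC timestamp as shown in the list, e.g. 2022-10-25T15:14:52Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_stuck_in_created(state: str, created_at: str, now: datetime) -> bool:
    """True if the machine is still 'created' longer than STUCK_AFTER after creation."""
    return state == "created" and now - parse_ts(created_at) > STUCK_AFTER


if __name__ == "__main__":
    now = datetime(2022, 10, 25, 15, 30, tzinfo=timezone.utc)
    print(is_stuck_in_created("created", "2022-10-25T15:14:52Z", now))  # True
```

Passing `now` in explicitly keeps the check deterministic and easy to test; a monitoring loop would pass `datetime.now(timezone.utc)`.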

JP_Phillips · October 26, 2022, 4:08pm

Sorry it took a bit to get things cleaned up, but you should no longer see machine 24d891dec4d687. As for machine e148e452addd89, it did eventually start after we resolved an issue with our registry in yyz (incident).

ignoramous · November 6, 2022, 5:24pm

Oh wow, a machine in jnb (73d8d1d7a9d891) that went full zombie a few days ago (presumably due to some incident or other) has automagically recovered! So I needn't monitor for zombies anymore?
