Hello,
I just noticed that my “worker” machine went into a “zombie” state 3 days ago, and I wasn’t aware of it.
When I run fly machine list I see the machine with ID e784362a436998 in started state, with no image and with creation date 1970-01-01T00:00:00Z.
I just ran a new fly deploy and a new machine was created. Everything works again, but I can’t remove this old machine… Am I currently being charged for this machine?
I would like to understand why this happened, and how to prevent it from happening again.
I took a look at your account and the server which was hosting that Fly Machine went up in flames and is never coming back. So you’re getting an error because flyctl can’t connect with that server to tell it to destroy your machine.
We’re going to destroy those Fly Machines from the backend in a little while, so you should see it gone soon.
We checked our billing system and confirmed that, from the start of the downtime, the system should NOT have allowed you to accrue any more charges for that Machine. If you get your bill for this month and see that a charge somehow slipped through anyway, please email billing@fly.io to ask for a refund.
Hope this helps, I’ll be around to take any more questions you have.
If you log in to the dashboard on the Fly Web UI, you should see that there’s a banner up at the top which, in this case, reads
A server hosting some of your apps has suffered irreparable hardware damage. Please migrate your Fly Machines to other hosts and restore volumes from any backups.
That’s been there since this server broke. Any issues which don’t affect the Fly Platform as a whole but may impact individual users’ apps will be published in this fashion.
Is that the kind of notification you’re looking for?
I think I might have the same problem. I have this status update:
2024-04-01 19:56:16 UTC A server hosting some of your apps has suffered irreparable hardware damage. Please migrate your Fly Machines to other hosts and restore volumes from any backups.
As it happens the app was fine, but perhaps it was booted automatically elsewhere, and I did not notice downtime.
So I did the scale-to-0 and scale-to-2 trick, and now I have three machines:
flyctl machines list
3 machines have been retrieved from app brumstack.
View them in the UI here (https://fly.io/apps/brumstack/machines/)
brumstack
ID NAME STATE REGION IMAGE IP ADDRESS VOLUME CREATED LAST UPDATED APP PLATFORM PROCESS GROUP SIZE
2871753a0e1228 billowing-sound-326 started lhr brumstack: fdaa:5:f9ca:a7b:19:8885:35cf:2 2024-04-06T15:22:18Z 2024-04-06T15:22:23Z v2 app shared-cpu-1x:256MB
7843d29a23d728 weathered-pine-2382 started lhr brumstack: fdaa:5:f9ca:a7b:19:cee0:72b1:2 2024-04-06T15:22:18Z 2024-04-06T15:22:23Z v2 app shared-cpu-1x:256MB
4d8902da439d18 divine-violet-9658 started : 1970-01-01T00:00:00Z 0001-01-01T00:00:00Z
I can neither stop nor kill the last one. If I try to list machines in the GUI, then I get:
There was an error loading machines
I’d expect things to be rather more robust than this in the case of failure. Now I can destroy the app and recreate it, but I wonder if it is better that I am raising it here, so that the pain point can be identified. Specifically, users need to be able to list or remove machines even if some are dead.
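For anyone who wants to spot these entries from a script, here is a rough sketch of how the ghost row in the listing above could be flagged: it has a creation date at or before 1970-01-01T00:00:00Z and no real image. The dictionary keys (`id`, `image`, `created`) and the helpers `parse_ts` and `is_ghost` are made up for illustration and are not flyctl’s actual output schema; the sample values are copied from the listing.

```python
from datetime import datetime, timezone

# Unix epoch; the ghost machine in the listing reports this as its creation date.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_ts(value):
    """Parse a timestamp like 2024-04-06T15:22:18Z; return None if empty or invalid."""
    if not value:
        return None
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    # Treat timestamps without an offset as UTC so comparisons are safe.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_ghost(machine):
    """Heuristic only: no usable creation date AND no image name."""
    created = parse_ts(machine.get("created"))
    # The listing shows a bare ":" in the IMAGE column for the dead machine.
    has_image = bool((machine.get("image") or "").strip(": "))
    return (created is None or created <= EPOCH) and not has_image


# Hypothetical records mirroring the three rows in the listing above.
machines = [
    {"id": "2871753a0e1228", "image": "brumstack:", "created": "2024-04-06T15:22:18Z"},
    {"id": "7843d29a23d728", "image": "brumstack:", "created": "2024-04-06T15:22:18Z"},
    {"id": "4d8902da439d18", "image": ":", "created": "1970-01-01T00:00:00Z"},
]

ghost_ids = [m["id"] for m in machines if is_ghost(m)]
print(ghost_ids)
```

This only identifies the suspect entry; it does not remove it. As the replies in this thread explain, a machine on a dead host has to be cleaned up from Fly’s side.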
If you had two Fly Machines running before, then they would have been on separate hosts, so when one host server died, the platform would just route all traffic to the remaining Fly Machine. This is one of the central design principles of the Fly Platform, and why we encourage everyone to run multiple smaller Machines instead of one big one.
Yes, that last Machine was the one on the dead host. This is actually the first time this type of hardware failure has happened, so it’s surfacing some bugs that we’re now addressing, and that’s why you can’t destroy this Machine. But we’ve double-checked that you are not being charged for the ghost Machine and have not been since the server failed. (And in the unlikely event that we’ve double-checked wrong and there are extra charges on your statement for this month, please email billing@fly.io and we’ll fix it.)
Correct, at present you cannot, but this was a feature we already had planned, and it’s now been bumped up in priority.