Some Machines take 6 minutes to be destroyed after request to destroy VM

jessie · March 14, 2023, 4:34pm

Some Machines take several minutes after I call destroy VM to receive the signal and actually terminate. Is this expected? Should I be calling the wait API to wait for the machine to actually be destroyed? This is causing issues during my deployments: different boxes end up running different deployed versions of the code.

My app name: 01gvfwwmq3dfyzsfdyc7hy1ker
Machine ID: 148e42ef7d5189

Backend server logs

cakework-controlplane [GIN] 2023/03/14 - 12:25:23 | 200 |   492.13255ms |    3.230.163.83 | POST     "/v1/vm/148e42ef7d5189/stop

Fly machine logs

2023-03-14 03:31:00.092 [fly] info sjc 7f3c 148e42ef7d5189 01gvfwwmq3dfyzsfdyc7hy1ker [ 4946.714940] reboot: Restarting system
2023-03-14 03:31:00.092 [fly] info sjc 7f3c 148e42ef7d5189 01gvfwwmq3dfyzsfdyc7hy1ker Sending signal SIGKILL to main child process w/ PID 513
2023-03-14 03:31:00.092 [fly] info sjc 7f3c 148e42ef7d5189 01gvfwwmq3dfyzsfdyc7hy1ker Starting clean up.

JP_Phillips · March 14, 2023, 4:46pm

The request to destroy the machine doesn’t look to be delayed. Below are the raw events from our system:

    {
      "id": "01GVG1MASGQHDWHZJQQN8XR4ZY",
      "type": "destroy",
      "status": "destroyed",
      "source": "flyd",
      "timestamp": "2023-03-14T12:30:56.816Z",
      "data": {}
    },
    {
      "id": "01GVG1M8N1FZMD8FV87767GZVX",
      "type": "destroy",
      "status": "destroying",
      "source": "user",
      "timestamp": "2023-03-14T12:30:54.625Z",
      "data": {}
    }

And the time between receiving the request to destroy the machine and it being set to destroyed is ~2 seconds. Are the backend server logs also from an App running on Fly.io?
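
For anyone checking the figure, the gap can be computed directly from the two event timestamps quoted above. A minimal Python sketch; the helper name `parse_event_time` is made up for illustration, and the only inputs are the two values from the events:

```python
from datetime import datetime


def parse_event_time(ts: str) -> datetime:
    """Parse an event timestamp such as 2023-03-14T12:30:54.625Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


# Timestamps copied from the "destroying" (source: user) and
# "destroyed" (source: flyd) events above.
requested = parse_event_time("2023-03-14T12:30:54.625Z")
destroyed = parse_event_time("2023-03-14T12:30:56.816Z")

elapsed = (destroyed - requested).total_seconds()
print(f"{elapsed:.3f} seconds")  # 2.191 seconds
```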

jessie · March 14, 2023, 4:56pm

Yup, that’s correct. I’ve verified that it’s not just a clock drift issue in the server timestamps, because after the stop command was issued, the machine continued processing requests for another few minutes.

My guess is that it’s because I delete the Fly app instead of calling destroy on the Machine, and that deleting the app doesn’t destroy the machines until some time later.

Should I be calling destroy on all the machines before I delete the app?

JP_Phillips · March 14, 2023, 5:19pm

Ah, ok, yes this explains it. Destroying the app goes through our central API, which uses async jobs to destroy the resources associated with the App. If you destroy the machines first, the request goes through much quicker.
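
The ordering described here (destroy every machine first, then delete the app) could be sketched roughly as below. This is only an illustration of the sequence, not Fly.io's API: `list_machine_ids`, `destroy_machine`, and `delete_app` are hypothetical callables that you would back with your own API client.

```python
from typing import Callable, Iterable, List


def teardown_app(
    app_name: str,
    list_machine_ids: Callable[[str], Iterable[str]],  # hypothetical helper
    destroy_machine: Callable[[str, str], None],       # hypothetical helper
    delete_app: Callable[[str], None],                 # hypothetical helper
) -> List[str]:
    """Destroy each machine explicitly, then delete the app itself.

    Returns the IDs of the machines for which destroy was requested.
    """
    requested = []
    for machine_id in list_machine_ids(app_name):
        destroy_machine(app_name, machine_id)
        requested.append(machine_id)
    # Only delete the app once destroy has been requested for every machine.
    delete_app(app_name)
    return requested
```

Passing the operations in as callables keeps the ordering logic testable without touching the network.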

jessie · March 14, 2023, 5:22pm

Thanks! I updated my service logic. When I call destroy on a machine and the request returns successfully, does that mean the resource has been destroyed successfully? Or do I need to call wait and block until the state actually changes, similar to what I need to do when I create a new machine?

nolan · March 14, 2023, 5:30pm

Should deleting an app in flyctl destroy associated resources first? Or is the better solution to handle this lower in the stack?

JP_Phillips · March 14, 2023, 5:47pm

You can use the wait endpoint to block until the machine is completely destroyed, since the DELETE /v1/apps/{app_name}/machines/{machine_id} endpoint returns as soon as it starts the process.
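
A rough Python sketch of that destroy-then-wait sequence follows. Treat the details as assumptions to check against the Machines API docs: the base URL `https://api.machines.dev/v1`, the `/apps/` path segment, the `/wait` suffix, and the `state` and `timeout` query parameters are my reading of the API rather than something confirmed in this thread. `machine_url`, `destroy_and_wait`, and `_send` are illustrative names.

```python
import urllib.parse
import urllib.request

# Assumption: base URL for the Machines API; adjust to your setup.
API_BASE = "https://api.machines.dev/v1"


def machine_url(app_name: str, machine_id: str, suffix: str = "") -> str:
    """Build the (assumed) URL for a machine, optionally with a suffix."""
    app = urllib.parse.quote(app_name, safe="")
    mid = urllib.parse.quote(machine_id, safe="")
    return f"{API_BASE}/apps/{app}/machines/{mid}{suffix}"


def _send(method: str, url: str, headers: dict) -> int:
    """Default transport: issue the request and return the HTTP status."""
    req = urllib.request.Request(url, method=method, headers=headers)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return resp.status


def destroy_and_wait(app_name, machine_id, token, timeout_s=60, send=_send):
    """Request destruction, then block on the wait endpoint.

    The DELETE call returns once destruction has started, so the second
    call waits for the machine to reach the "destroyed" state.
    """
    headers = {"Authorization": f"Bearer {token}"}
    send("DELETE", machine_url(app_name, machine_id), headers)
    # Assumption: the wait endpoint accepts `state` and `timeout` parameters.
    query = urllib.parse.urlencode({"state": "destroyed", "timeout": timeout_s})
    send("GET", machine_url(app_name, machine_id, "/wait") + "?" + query, headers)
```

The `send` parameter exists so the sequence can be exercised with a fake transport. In real use you would also want to decide how to treat an error response from the wait call for a machine that is already gone.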

system · March 21, 2023, 5:48pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
