App stuck in pending state after restarts, rescaling

mtlynch · July 12, 2022, 2:11pm

My app is suddenly down today, and I’m not sure why. I didn’t initiate a new deploy or touch anything on the server in the last few days.

It looks like something told my server to shut down at 2022-07-12T12:42:20.824, and it seems to be stuck on the step "Umounting /dev/vdc from /data".

2022-07-12T12:42:20.824 runner[bca5e320] iad [info] Shutting down virtual machine
2022-07-12T12:42:20.829 app[bca5e320] iad [info] Sending signal SIGINT to main child process w/ PID 524
2022-07-12T12:42:20.829 app[bca5e320] iad [info] signal received, litestream shutting down
2022-07-12T12:42:20.830 app[bca5e320] iad [info] sending signal to exec process
2022-07-12T12:42:20.830 app[bca5e320] iad [info] waiting for exec process to close
2022-07-12T12:42:20.831 app[bca5e320] iad [info] litestream shut down
2022-07-12T12:42:21.831 app[bca5e320] iad [info] Main child exited normally with code: 0
2022-07-12T12:42:21.831 app[bca5e320] iad [info] Starting clean up.
2022-07-12T12:42:21.844 app[bca5e320] iad [info] Umounting /dev/vdc from /data

$ fly status --all 
App
  Name     = picoshare          
  Owner    = personal           
  Version  = 273                
  Status   = pending            
  Hostname = picoshare.fly.dev  

Deployment Status
  ID          = 569d557d-cb05-a5f0-1dae-c04189e3f4e2         
  Version     = v273                                         
  Status      = running                                      
  Description = Deployment is running                        
  Instances   = 1 desired, 0 placed, 0 healthy, 0 unhealthy  

Instances
ID              PROCESS VERSION REGION  DESIRED STATUS          HEALTH CHECKS           RESTARTS        CREATED              
bca5e320        app     273     iad     evict   complete        1 total, 1 passing      20              2022-06-12T11:55:10Z
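
The telling detail in the output above is the "Instances" line: one instance is desired but none has been placed. A minimal sketch of checking for that condition from captured `fly status --all` text follows. It assumes the line format shown in this post; the function names `parse_instance_counts` and `looks_unplaced` are made up for illustration and are not part of flyctl.

```python
import re


def parse_instance_counts(status_text):
    """Return the counts from the 'Instances = ...' line as a dict, or None if absent."""
    # [ \t]* keeps the match on one line, so the bare "Instances" table header is skipped.
    match = re.search(r"Instances[ \t]*=[ \t]*(.+)", status_text)
    if match is None:
        return None
    counts = {}
    for part in match.group(1).split(","):
        number, label = part.split()
        counts[label] = int(number)
    return counts


def looks_unplaced(counts):
    """True when instances are desired but none has been placed on a host."""
    return counts["desired"] > 0 and counts["placed"] == 0


# Sample text copied from the status output in this post.
sample = """
Deployment Status
  Status      = running
  Instances   = 1 desired, 0 placed, 0 healthy, 0 unhealthy

Instances
ID              PROCESS VERSION REGION  DESIRED STATUS
"""

counts = parse_instance_counts(sample)
print(counts, looks_unplaced(counts))
```

The deployment itself reports "running", so the app-level "pending" status together with "0 placed" is what suggests the scheduler could not find room for the VM.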

I tried restarting the app: no change.

I tried scaling the app (changing the RAM allocation): no change.

I can see the new releases in the Fly dashboard, but the logs don’t update at all, and fly status --all has the same output.

mtlynch · July 12, 2022, 2:17pm

This seems to be something about my app in particular. I have a different version of the same app running in a different Fly account, and it’s stuck in the same state:

https://tinypilot-pico.fly.dev/

Both instances mount a 3 GB persistent volume in iad, so I’m wondering if there’s some issue with that DC.

mtlynch · July 12, 2022, 4:51pm

Based on this comment, I was able to get up and running again by scaling to a dedicated CPU.

If I scale back down to a shared CPU, I get stuck in pending state again.

jerome · July 12, 2022, 5:26pm

You should be able to scale back down now. We have freed some space on the server where your volume is.

mtlynch · July 12, 2022, 6:33pm

Thanks, confirmed.

Is there anything I can do to avoid this in the future short of always running with a dedicated CPU?

kurt · July 12, 2022, 6:37pm

This is a bug in our infrastructure. Your VM got stopped when a particular host had capacity issues. Since your volume was on that one exact host, you couldn’t boot a new VM.

Switching to a dedicated CPU actually evicts other VMs on that host running shared-cpus. It seemed like a good idea when we initially built this, but now I believe we’re better off making new VMs fail. It’ll take us some time to redo this plumbing, but it’s a high priority.

The “real” answer in our infrastructure is to run >=2 VMs for max redundancy, which obviously doesn’t work (yet) with SQLite.
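
The placement behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Fly's actual scheduler: the `Host` and `VM` classes, the capacity units, and the eviction order are all hypothetical. It models a single host because the volume pins the app to that one host.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    kind: str  # "shared" or "dedicated"
    size: int  # hypothetical capacity units


@dataclass
class Host:
    capacity: int
    vms: list = field(default_factory=list)

    def free(self):
        return self.capacity - sum(vm.size for vm in self.vms)

    def place(self, vm):
        """Return the names of evicted VMs, or None if the VM cannot be placed."""
        free = self.free()
        evicted = []
        if vm.kind == "dedicated":
            # Assumption: a dedicated VM may evict shared VMs until it fits.
            for other in self.vms:
                if free >= vm.size:
                    break
                if other.kind == "shared":
                    evicted.append(other)
                    free += other.size
        if free < vm.size:
            return None  # no room on the host that holds the volume: stays pending
        evicted_ids = {id(v) for v in evicted}
        self.vms = [v for v in self.vms if id(v) not in evicted_ids]
        self.vms.append(vm)
        return [v.name for v in evicted]


# A full host: the only one that holds the app's volume.
host = Host(capacity=4, vms=[VM("neighbor-a", "shared", 2), VM("neighbor-b", "shared", 2)])

shared_result = host.place(VM("picoshare", "shared", 1))
dedicated_result = host.place(VM("picoshare", "dedicated", 1))
print(shared_result, dedicated_result)
```

In this model the shared-CPU request gets nothing back because the host is full, while the dedicated-CPU request succeeds by evicting a neighbor, which matches the workaround reported earlier in the thread.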

mtlynch · July 12, 2022, 7:53pm

Gotcha, thanks, @kurt!

My apologies to whomever I evicted.

I knew this was Ben Johnson’s fault!

kurt · July 12, 2022, 7:54pm

It is, litestream will solve every problem anyone’s ever had.

RyanOfWoods · February 11, 2023, 1:58pm

I have the same problem with my swedishbirds-db database and 0a7e02c3 instance. There is no way to force a restart.
