Application VMs down without any change, can't deploy

My apps have suddenly stopped responding. It looks like they’ve been down for about 4 hours (I haven’t touched them since Friday, when they were working fine). My error monitor shows nothing, and the fly dashboard says they are pending.

If I try to redeploy them I get stuck on Running release task (pending)....

Any direction on what I can try would be massively helpful. I have customers who are unable to use our platform right now. (I have also emailed my dedicated support but have not heard anything.)

Last output of fly logs:

2022-10-03T05:05:34Z app[54e4b028] ewr [info]05:05:34.385 request_id=[REDACTED]  [info] Sent 200 in 919µs
2022-10-03T05:07:42Z runner[54e4b028] ewr [info]Shutting down virtual machine
2022-10-03T05:07:42Z app[54e4b028] ewr [info]Sending signal SIGTERM to main child process w/ PID 516

Umm. What?

Hey there!

Is that log output coming from the release task, or is that the last log output from when the app was shut down (before you attempted to re-deploy)?

Can you run fly status --all from wherever your fly.toml lives?

We’re also unable to deploy atm - our container is stuck in ‘Pending’ when trying to restart/redeploy

I see someone else had a similar issue but it was resolved (pre-release command ran as expected). LMK if you see that on your end as well.

Ok further info:

This is ONLY happening for me in the EWR region. If I deploy to any other region, my app deploys fine. It looks to me like there’s some sort of outage in part of the EWR region?

My database in EWR is doing fine for now, although I’m sweating bullets over here…


We’re EWR also, and seeing:

Yes it was!

I added another region and the app is now live but for posterity here’s the output:

App
  Name     = rsmbl-[redacted]       
  Owner    = ressemble                   
  Version  = 557                         
  Status   = running                     
  Hostname = [url-that-doesn't-need-to-be-public-lol].fly.dev  
  Platform = nomad                       

Deployment Status
  ID          = 4f2603b9-0e08-05cb-f457-f58898cd7f0c         
  Version     = v557                                         
  Status      = successful                                   
  Description = Deployment completed successfully            
  Instances   = 1 desired, 1 placed, 1 healthy, 0 unhealthy  

Instances
ID              PROCESS VERSION REGION  DESIRED STATUS          HEALTH CHECKS           RESTARTS        CREATED    
a4358804        app     557 ⇡   iad     run     running         1 total, 1 passing      0               3m28s ago 
31486f69        app     555     ewr     evict   complete                                0               12m2s ago 
a01680c5        app     554     ewr     evict   complete                                0               9h2m ago  
9eca3da8        app     554     ewr     evict   complete                                0               9h8m ago  
64127271        app     554     ewr     evict   complete        1 total, 1 passing      0               22h37m ago
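For anyone comparing regions in output like the above, here is a minimal Python sketch that tallies the rows of the Instances table by region, desired state, and status. The function name `summarize_instances` is hypothetical, and the parsing assumes the whitespace-separated column layout shown in this thread (including the ⇡ marker that appears after the version in some rows); it is not an official flyctl output format guarantee.

```python
from collections import Counter

def summarize_instances(rows):
    """Count instances per (region, desired, status) from status table rows."""
    counts = Counter()
    for line in rows.strip().splitlines():
        tokens = line.split()
        if len(tokens) < 6 or tokens[0] == "ID":
            continue  # skip the header row and any short or blank lines
        if tokens[3] == "⇡":
            del tokens[3]  # drop the marker that follows the version in some rows
        region, desired, status = tokens[3], tokens[4], tokens[5]
        counts[(region, desired, status)] += 1
    return counts

# Rows copied from the output above (trimmed to three instances).
rows = """
ID        PROCESS VERSION REGION DESIRED STATUS   HEALTH CHECKS       RESTARTS CREATED
a4358804  app     557 ⇡   iad    run     running  1 total, 1 passing  0        3m28s ago
31486f69  app     555     ewr    evict   complete                     0        12m2s ago
a01680c5  app     554     ewr    evict   complete                     0        9h2m ago
"""

for key, n in summarize_instances(rows).items():
    print(key, n)
```

A tally like this makes it easy to see at a glance that the only running instance is in iad while every ewr instance has been evicted.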

Yep - changed region and we’re back in business!

We’re checking out EWR, thanks!

I’m seeing this in my app as well (also in EWR). I can’t redeploy and it’s stuck in pending:

$ fly status --all
App
  Name     = whatgotdone          
  Owner    = personal             
  Version  = 166                  
  Status   = pending              
  Hostname = whatgotdone.fly.dev  

Deployment Status
  ID          = 83c510bf-3b7b-64b4-9eb6-8f5e9a2db012         
  Version     = v166                                         
  Status      = running                                      
  Description = Deployment is running                        
  Instances   = 1 desired, 0 placed, 0 healthy, 0 unhealthy  

Instances
ID       PROCESS VERSION REGION DESIRED STATUS   HEALTH CHECKS      RESTARTS CREATED              
890943ac app     165     ewr    evict   complete 1 total, 1 passing 0        1h40m ago            
b3aa6eb6 app     165     ewr    evict   complete 1 total, 1 passing 0        2022-09-23T02:38:54Z 
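The telling line in that output is `Instances = 1 desired, 0 placed`: the deployment wants an instance but none has been scheduled. As a rough sketch, assuming the status text has been captured as a string, the check could be automated in Python like this. The helper name `placement_counts` is made up for illustration, and the regular expression only matches the line format shown in this thread.

```python
import re

def placement_counts(status_text):
    """Return (desired, placed) from the 'Instances = ...' line, or None if absent."""
    match = re.search(r"Instances\s*=\s*(\d+) desired, (\d+) placed", status_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# Excerpt of the status output above.
sample = """Deployment Status
  Status      = running
  Description = Deployment is running
  Instances   = 1 desired, 0 placed, 0 healthy, 0 unhealthy
"""

desired, placed = placement_counts(sample)
if placed < desired:
    print(f"{desired - placed} of {desired} instance(s) not placed yet")
```

If the placed count stays below the desired count for a long time, as it did here, that points at scheduling or capacity in the region rather than at the app itself.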

We’re seeing this in EWR as well. Might be worth updating the status page?

Any update here? Our primary db is in EWR and still running, but the increased round-trip latency between our app and db is not great right now when running write-heavy loads.

Also, @kurt I love fly, but I’m taking a ton of heat from repeated outages where the status page never gets updated or gets updated very late, and it’s making me look lost. I know it’s probably not malicious on your end, but it feels like a trend now where the status page just isn’t up to speed. It’s been over 12 hours since my apps in EWR started crying for help.

I’m going to update the status page right now. The reason you’ve seen us waffling about this is that EWR isn’t really having an outage so much as that it’s been at/near capacity (we have some customers that really hammer it with hundreds of apps). We’re done waffling, a problem is a problem, we’ll keep you posted.

Ok thank you :pray:

Do you think this is going to be a multi day thing or do you have confidence it should be resolved today?

(The reason I’m asking is if it’s going to be longer I might just need to start migrating our db over to some other region as well ^^)

Thank you!

It should be improving now. We’ve brought new servers online and adjusted some customer load issues.


Looking much better from where I’m sitting