Is EWR down?

heimann · June 8, 2022, 6:26pm

I have two apps in EWR that both are stuck in status pending all of a sudden after working fine just an hour ago.

To be clear: one of them was up and running with no deployment or changes, and now Instances lists no apps and the status is pending. Trying to deploy to either app gets stuck on “Running release task (pending)…”

sethinnyc · June 8, 2022, 6:27pm

Same here. Contacted the support email but was directed back here. I have 3 unrelated apps all stuck in pending now. There needs to be something other than a forum for issues like this.

I understand that the forum helps Fly handle support at scale, but an outage/service issue is not something the community can help with.

cubismod · June 8, 2022, 6:30pm

Yep 2 of mine have been down since 1:46PM EST.

rculver · June 8, 2022, 6:31pm

Mine has been down since about 1:30pm EST

heimann · June 8, 2022, 6:33pm

Looks like the status page just updated:

wjordan · June 8, 2022, 8:45pm

Everything should be resolved now, please let us know if you experience any other issues.

We encountered some disk-capacity issues in EWR, and the work we did to resolve them triggered a few unexpected surprises in our Nomad-based instance scheduler. Some instances were interrupted and remained in a pending state for a while (particularly volume-attached or single-region instances that couldn’t be placed elsewhere). Sorry for the interruptions! We’ll be investigating the surprises we encountered to prevent this kind of issue from occurring again in the future.

sethinnyc · June 9, 2022, 12:02am

Thanks for the details and the update.

Were you all aware of the issues before customers reported them? I checked the status page around 1:30 ET, and all systems were operational. It would be great if you were able to update that page more readily, or provide an alert. I have monitoring in the app that gave me a heads up, but I spent a bunch of time trying to fix it on my end since the cause was unclear.

In addition to in-house monitoring, is there a way to report an outage to you all?

wjordan · June 9, 2022, 1:16am

Yes, we were aware of the issue soon after it began around 1:30 ET, though the ongoing impact it was having on some applications wasn’t fully clear until the first customer reports arrived. Your initial report (around 2:20 ET) helped us confirm the impact was more severe than we initially thought based on our metrics, and we updated the status page 12 minutes later.

We’re working on fixing the bug that caused this unexpected issue, as well as adding more thorough monitoring to more quickly gauge severity for this particular type of incident. Customer reports will always be helpful though, and this forum is usually the quickest way to get our attention.

stephenb · June 9, 2022, 2:36pm

Could this have caused performance issues in DFW as well?

rahmatjunaid · June 9, 2022, 3:01pm

Hi @stephenb, do you have apps deployed in DFW and are you seeing similar issues currently in DFW?

Can you please post the error messages you’re seeing?

stephenb · June 9, 2022, 3:04pm

Well, I’m not totally clear on the impact experienced in EWR… But while this issue was happening yesterday, I was trying to debug unusually slow performance in a PG instance in DFW. The timing just makes me curious.

rahmatjunaid · June 9, 2022, 3:20pm

Ah, I understand, but capacity issues in EWR shouldn’t affect DFW, so the two are not related.
