we’re running an app in ewr and have this info message in our dashboard. both our development and production apps are completely unreachable. just wondering if there’s any word on this or when it may be resolved?
our logs say: "could not find a good candidate within 90 attempts at load balancing. last error: unreachable worker host. the host may be unhealthy. this is a Fly issue."
If I’m reading correctly, it looks like ewr was unreachable for a brief moment around half an hour ago. It should be better now. Please try again, and if there are still issues deploying, I’ll escalate it.
still unreachable
We’re seeing it as well. Postgres on EWR has been down for the past 3 hours.
I was not reading correctly, sorry!
A fix is being worked on, I’ll update when this is resolved
Appreciate the response. Is there an ETA?
Also, https://status.flyio.net/ does not seem to show the outage. Sounds like there’s an issue with that reporting page as well.
We are also experiencing the same on an app with a pg cluster. App has been totally out of action for over 4 hours now.
Can confirm, also hosting something personal on ewr and it’s been down since this morning.
Ops team is looking into it. I sadly don’t have an ETA to share right now.
We typically report issues that affect the service as a whole on the statuspage, and issues that affect only existing apps on the app’s issues tab. Today’s issue is an unusual one because it’s tied to a single host, but it’s having an outsized effect compared to typical single-host issues. There’s an internal conversation going on right now about how we could better communicate situations like this that don’t neatly fit in our incident categories.
If at all possible, please try to move your apps to other regions temporarily. If that isn’t possible (for example, because you use volumes), we’re really trying to get this back up in a timely manner. I’ll update again when there’s more to share.
The issue should be cleared, please let us know if there are any more issues!
It has unfortunately left one of our postgres clusters in a broken state.
Same here. Postgres is in a broken state.
Manually restarted the app with flyctl machine start and it seemed to improve.
@allison The postgres log is full of "error configuring operator user: can't scan into dest[3]: cannot scan null into *string", which was not the case prior to this outage.