[URGENT] Can't connect to Upstash Redis -- App down

Hey! I need urgent help.

My machines can’t connect to my Upstash Redis. I have restarted them, but nothing changed.

The Upstash team isn’t answering.

Please help! My organization is perhaps and my app is workers-api-production.

Joaquín

We have a few apps using Upstash Redis; one of them is currently affected (prereview-coar-notify). It seems to be constantly reconnecting (uses ioredis).

prereview-sandbox was affected, but has recovered.

(Both apps only run in iad.)

For me, it constantly times out each time I try to connect.

I am not sure if it’s related but we are seeing some machines in IAD that are unreachable and un-replaceable. So it might not be isolated to Upstash Redis.

I solved it by re-creating all my Redis instances. The old ones are still timing out :sweat:

Hello everyone, there is a single host server in iad which crashed, and so any Machines which were on this particular host server are inaccessible. This looks like a disk failure, so do not expect these Fly Machines to come back online immediately. If needed, spin up new Machines for the time being. You are not being charged for the Machines now offline.

Looks like our prereview-sandbox app has one machine affected by this (I can see it in our status page).

It doesn’t explain prereview-coar-notify not being able to connect to Redis, unless the Upstash Redis is running on that host? (There are two instances in iad which can’t connect to it, and I can’t proxy into it either.)

When trying to create a new machine with fly scale count 1, I am seeing 500 errors. I assume this is because I also have a volume on the same host?

fly scale count 1 -a foo
App 'foo' is going to be scaled according to this plan:
  +1 machines for group 'app' on region 'iad' of size 'shared-cpu-1x'
   1 unattached volumes to be assigned to group 'app' in region 'iad'
? Scale app foo? Yes
Executing scale plan
! WARNING: There are active host issues affecting your app. Please check `fly incidents hosts list` or visit your app in https://fly.io/dashboard

Error: failed to launch VM: server returned a non-200 status code: 500 (Request ID: 01J1WGFTX4B78YK7RQJQ088505-lhr)

I can recreate the data on the volume easily so it’s more important for me to get a working machine online even if it means a fresh volume too. What options do I have for working around the errors?

Yep. It seems like Upstash machines created in IAD are not working: I managed to create another machine in IAD and it’s not connecting either…

It looks like you have an unattached volume on that host which died. The Machine placement engine knows that the unattached volume exists, and therefore tries to put your new Machine on that host, which, since it’s down, fails. Create a new volume first, then create a new Machine.
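For anyone following along, the order described above could be sketched roughly like this. It is only an illustration: the helper name `new_volume_then_machine`, the app name `foo`, the volume name `data`, and the 1 GB size are all placeholders, so substitute the volume name from the `[mounts]` section of your own fly.toml.

```shell
# Sketch only: "foo" (app), "data" (volume name) and the 1 GB size are
# assumptions for illustration, not values taken from anyone's real app.
new_volume_then_machine() {
  app="$1"; volume="$2"; region="$3"
  # 1. Create a fresh volume first, so there is one on a host that is up.
  fly volumes create "$volume" --region "$region" --size 1 -a "$app" || return 1
  # 2. Then ask for one Machine again.
  fly scale count 1 -a "$app"
}
```

You would call it as `new_volume_then_machine foo data iad`. As the replies further down show, this can still fail while the old unattached volume exists.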

Yes, this is a bad UX. Apologies, we haven’t had time to polish it yet.

Yes, that is what I just tried, following the instructions here: “Troubleshoot apps when a host is unavailable” (Fly Docs)

The new volume was created just fine but the machine creation is still failing with a 500 error.

Error: failed to launch VM: server returned a non-200 status code: 500 (Request ID: 01J1WHW4A711V9F1A42X3YK6V7-lhr)

With some help from support I was able to get things running with:

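# scale the app down to zero Machines
fly scale count 0
# scale back up; the new Machine gets a new volume instead of an existing unattached one
fly scale count 1 --with-new-volumes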
