upstash redis timeouts

I’d assume most people wouldn’t start there, so would it be OK for self-hosted Redis to be the default? Or, if not that, at least a documented way to do it and maybe a pro/contra list. WDYT?

Be aware that there are people actively working to make Upstash Redis a better match for frameworks like Rails that need queuing, so any “contras” are treated as work items. Upstash Redis will continue to improve, and hopefully, when you need more than a single-VM solution can provide, switching should be easy.


That’s great to hear! Let me know if I can help in any way.

I packaged up the Redis patch from our Getting Started · Fly Docs page and released it as the actioncable_redis-reconnect gem.

This means that, from the root of your Rails app, you can add it to your project and it will put the correct patches in place.

$ bundle add actioncable_redis-reconnect

Try that, remove the original patch if you had put it in place, redeploy, and let me know if the issues persist.

I was able to get everything up and running today. Now I just need to do some fine tuning and plan migration steps. Thanks!

Did you accomplish that with the actioncable_redis-reconnect gem? If so and it worked for you I’ll update our Getting Started Rails docs to tell folks to use the gem.

I meant that I was able to set up two different Upstash instances and connect to both. I am not running in production yet and have not encountered disconnects, so I have not tried the reconnect gem. I am currently debating whether to self-host Redis or go with Upstash and hope I don’t encounter these connection issues (or that the gem fixes them).

Digging into this thread more, it sounds like the issues with Upstash and Rails are specific to ActionCable. We use Redis for Sidekiq and caching, not yet for ActionCable, so we’re planning to run with Upstash for now.

In the case of Sidekiq, there are still two issues that you may run into, though neither stops your jobs from processing.

One is that idle connection timeouts will still show up as Connection reset by peer errors. Sidekiq will reconnect.

Another is that Sidekiq 7’s new metrics feature won’t work until the BITFIELD command is added to Upstash.

The good news is that these should be fixed shortly and no patches will be needed. Meanwhile, you can still move ahead with using Sidekiq.
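To illustrate the first issue, here is a minimal Ruby sketch of the "reset, then reconnect and retry" pattern. It is not Sidekiq's actual implementation. The helper name `with_reconnect` and its parameters are hypothetical, and `Errno::ECONNRESET` stands in for whatever error class your Redis client really raises (client libraries usually wrap socket errors in their own classes).

```ruby
# Hypothetical helper, not part of Sidekiq or any Redis gem.
# Runs the block and retries it when one of the listed errors is raised,
# waiting a little longer before each new attempt.
def with_reconnect(attempts: 3, base_delay: 0.1,
                   on: [Errno::ECONNRESET, Errno::EPIPE])
  tries = 0
  begin
    tries += 1
    yield
  rescue *on
    # Give up once the attempt budget is spent and let the error surface.
    raise if tries >= attempts
    sleep(base_delay * (2**(tries - 1))) # exponential backoff
    retry
  end
end

# Usage with a stand-in for a Redis call: the first call fails the way an
# idle connection dropped by the server or a proxy would, the second succeeds.
calls = 0
reply = with_reconnect(base_delay: 0) do
  calls += 1
  raise Errno::ECONNRESET if calls == 1
  "PONG"
end
puts "#{reply} after #{calls} attempts"
```

The error still shows up in logs or error trackers on the first attempt, which matches what is described above: noisy, but the work goes through on the retry.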

Thanks. I have not yet updated to Sidekiq 7 (though running it embedded is a tempting option), and I can live with the timeouts as long as the jobs retry.

We have an update here.

ActionCable was patched to reconnect without crashing. Also, the idle connection timeout to Upstash Redis was raised to 1 hour, so you should see those connection resets far less frequently, if at all, depending on how active Sidekiq gets.


It appears the idle connection timeout was just lowered back down under 10 minutes, based on this graph coming from a single-process single-VM app.

Edit to add: the last deploy for this app was Dec 6, and the last non-dependency-update code change for this app was Sept 21. I’m extremely sure this isn’t the result of something I did.

Thanks for the info. Are these timeouts coming from ActionCable, Sidekiq or something else?

The exceptions are coming from inside ActionCable.

OK, thanks. I haven’t been able to reproduce this just yet. But, for now, would you be able to install this gem? GitHub - anycable/action-cable-redis-backport.

Looks like Upstash fixed whatever was going on? It went away after two days with no action on my part:

This could also have been an issue with our proxy. Upstash actually removed timeouts completely, but our proxy times out idle connections. That’s what was bumped to 1 hour. I haven’t confirmed either way if that’s what happened, yet.

However, I do recommend using the gem, as its behavior is now in Rails. It ensures that subscriptions stay alive while the reconnect to Redis happens.
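If the proxy's idle timeout does turn out to be the cause, one possible workaround, which nobody in this thread has confirmed, is to keep the connection from ever going idle by sending a cheap command at an interval shorter than the timeout. The sketch below is plain Ruby under that assumption. The `KeepAlive` class and its `interval:` argument are hypothetical, and the block is a stand-in for a real call such as a Redis PING on your client.

```ruby
# Hypothetical keepalive helper, not part of Rails, Sidekiq, or any Redis gem.
# Calls the given block right away and then once per interval until stopped.
class KeepAlive
  def initialize(interval:, &ping)
    @interval = interval
    @ping = ping
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @stopped = false
  end

  def start
    @thread = Thread.new do
      @mutex.synchronize do
        until @stopped
          begin
            @ping.call
          rescue StandardError
            # Swallowed in this sketch; a real client would log and reconnect.
          end
          # Sleeps up to @interval seconds, but wakes early when stop signals.
          @cond.wait(@mutex, @interval)
        end
      end
    end
    self
  end

  def stop
    @mutex.synchronize do
      @stopped = true
      @cond.signal
    end
    @thread&.join
  end
end

# Usage: the block stands in for something like a PING on a Redis client.
pings = 0
keepalive = KeepAlive.new(interval: 0.01) { pings += 1 }.start
sleep 0.05
keepalive.stop
puts "sent at least one ping: #{pings >= 1}"
```

In a real app the interval would be minutes rather than milliseconds, comfortably below whatever idle timeout is in effect. This only hides idle disconnects; it does not replace reconnect handling.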

I’ll give it a shot. Thanks!

I’m suddenly experiencing timeouts on all my redis instances across all my Fly orgs.

When connected to any redis instance and checking metrics, I receive Error: Server closed the connection.

Seems to be an Upstash and/or fly proxy issue as mentioned above.

I’ll try working through the fixes listed in this thread. I have already implemented the ActionCable patch shown in the docs and the ActionCable backport.

I hit the same issue yesterday after heavy testing, so I thought it had something to do with the free plan’s limits (it says up to 10k commands daily). But I’m getting timeouts even today with literally zero connections (because nothing can connect, d’oh!).

I deployed Redis on Fly instead, and it does the job and has lower network latency.