Thanks. I have not yet updated to Sidekiq 7 (though running it embedded is a tempting option) and I can live with the timeouts as long as the jobs retry.
We have an update here.
ActionCable was patched to reconnect without crashing. Also, the idle connection timeout to Upstash Redis was raised to 1 hour. So you should see it way less frequently, if at all, depending on how active Sidekiq gets.
It appears the idle connection timeout was just lowered back down under 10 minutes, based on this graph coming from a single-process single-VM app.
Edit to add: the last deploy for this app was Dec 6, and the last non-dependency-update code change for this app was Sept 21. I’m extremely sure this isn’t the result of something I did.
Thanks for the info. Are these timeouts coming from ActionCable, Sidekiq or something else?
The exceptions are coming from inside ActionCable.
OK, thanks. I haven’t been able to reproduce this just yet. But, for now, would you be able to install this gem? GitHub - anycable/action-cable-redis-backport.
Looks like Upstash fixed whatever was going on? It went away after two days with no action on my part:
This could also have been an issue with our proxy. Upstash actually removed timeouts completely, but our proxy times out idle connections. That’s what was bumped to 1 hour. I haven’t confirmed either way if that’s what happened, yet.
However, I do recommend using the gem, as its behavior is now in Rails. It ensures that subscriptions stay alive while the reconnect to Redis happens.
I’ll give it a shot. Thanks!
I’m suddenly experiencing timeouts on all my redis instances across all my Fly orgs.
When connected to any redis instance and checking metrics, I receive
Error: Server closed the connection.
Seems to be an Upstash and/or fly proxy issue as mentioned above.
I’ll try running through the fixes listed in this thread. I have already implemented the ActionCable patch shown in the docs and the ActionCable backport.
I hit the same issue yesterday after heavy testing, so I thought it had something to do with the free plan limits (it says up to 10k commands daily). But I’m getting timeouts even today with literally zero connections (because nothing can connect, doh!)
Deployed redis on fly and it does the job + has lower networking latency.
Hello! Thanks for your help!
I just deployed a new Rails application and started to see the exact same error. I found your post and installed the gem, but it feels like it didn’t help.
Should I create a completely new instance with a dedicated Redis installation, or is there another workaround I can look at?
I also gave up and deployed keydb to fly as a separate app instead of using upstash. Haven’t seen the issue since.
Hey @tello, when did you experience this problem? Can you share your Redis endpoint (without credentials) so we can check the health of your DB on the Upstash side?
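If it helps, here is a minimal sketch of one way to strip the credentials from a Redis URL before pasting it anywhere public. The helper name `redact_redis_url` is made up for illustration, and it assumes the URL has the usual `redis://user:password@host[:port][/db]` shape.

```ruby
require "uri"

# Hypothetical helper: rebuilds the URL from scheme, host, port and path only,
# so the user and password parts never make it into the output.
def redact_redis_url(url)
  uri = URI.parse(url)
  port = uri.port ? ":#{uri.port}" : ""
  "#{uri.scheme}://#{uri.host}#{port}#{uri.path}"
end

puts redact_redis_url(ENV["REDIS_URL"]) if ENV["REDIS_URL"]
```

The result is built from the individual parts rather than by editing the original string, so a token cannot slip through by accident.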
@bi1 I would say since yesterday
Private URL = redis://default:TOKEN@fly-lifi-redis.upstash.io
I will say that I’m not seeing timeouts with a simple application. I’d be curious if you could reproduce my results and if so, identify what might be different about your application.
mkdir demo
cd demo
curl https://fly.io/docs/rails/cookbooks/databases/Dockerfile1 -o Dockerfile
fly launch
Accept all of the defaults (you don’t need a postgres database, and you don’t need an additional redis instance).
fly secrets set REDIS_URL=redis://default:TOKEN@fly-lifi-redis.upstash.io
(with your TOKEN of course)
fly deploy
fly open
fly logs
If you open a second browser window you can see updates being pushed in realtime using web sockets.
I haven’t let it go idle for a full hour, but if there is no activity for an hour the socket will be closed and that will take an unpatched Rails down.
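For anyone wondering what "reconnect instead of crashing" looks like in general, here is a rough sketch of the idea: rescue the dropped connection and retry with a growing delay. This is not the actual Rails patch or the gem's code. `ConnectionClosed`, `with_reconnect`, and the default values are all made up for illustration.

```ruby
# Hypothetical error class standing in for a dropped Redis connection.
class ConnectionClosed < StandardError; end

# Runs the block, retrying with exponential backoff when the connection drops.
# max_attempts and base_delay are illustrative defaults, not tuned values.
# sleeper is injectable so the delay can be skipped or recorded in tests.
def with_reconnect(max_attempts: 5, base_delay: 0.5, sleeper: ->(s) { sleep(s) })
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue ConnectionClosed
    # Give up and re-raise once the attempt budget is spent.
    raise if attempts >= max_attempts
    # Wait base_delay, then 2x, 4x, ... before running the block again.
    sleeper.call(base_delay * (2**(attempts - 1)))
    retry
  end
end
```

In a real app the block would wrap whatever talks to Redis, and the rescued class would be whichever connection error your Redis client actually raises.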
Not sure if this helps, but this timeout thing has improved in the past few days. Still, this is an issue every now and then.
Not sure if I can help in any way to eliminate this completely, or if it is something I need to live with for at least some time. Any advice is appreciated, @bi1.
Do you see any pattern for the connection issues?
Is it happening randomly, or at a regular interval, such as every hour?
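If you have the error timestamps from your logs, a quick way to check is to look at the gaps between consecutive errors. Here is a minimal sketch, assuming ISO 8601 timestamps. The helper names and the 60-second tolerance are arbitrary choices for illustration.

```ruby
require "time"

# Returns the gaps, in whole seconds, between consecutive error timestamps.
def gaps_between(timestamps)
  timestamps.map { |t| Time.iso8601(t) }.sort.each_cons(2).map { |a, b| (b - a).round }
end

# Treats the errors as periodic when every gap is within `tolerance` seconds
# of the median gap. Returns false when there are fewer than two gaps.
def periodic?(timestamps, tolerance: 60)
  gaps = gaps_between(timestamps)
  return false if gaps.size < 2
  median = gaps.sort[gaps.size / 2]
  gaps.all? { |g| (g - median).abs <= tolerance }
end
```

Gaps that cluster around one value (for example roughly 3600 seconds) would point at an idle timeout somewhere, while widely scattered gaps would suggest something else.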
Thanks for your answer @bi1. Seems like it happens randomly. Not sure if this helps in some way, but most of the errors come from background jobs with Sidekiq. I’m executing around 10K jobs daily.
Can you share the latest timestamp at which this happened, so we can check specifically for that date?