I believe it is a connection/latency problem. I’ve just had Redis timeouts (set to 2s) when connecting through the replica in fra. A deployment happened, and the fresh connection worked fine.
Not long beforehand, the other app instance in iad successfully restarted its connection according to my app’s logs.
I thought I’d solved a similar problem in the past, but I’m suddenly seeing it again now as well.
I’m seeing lots of Redis connection timeout errors and my app isn’t successfully getting anything out of Redis. I haven’t changed anything relevant for quite a while, so I don’t believe I’ve broken it.
@kurt I can’t speak for the others, but for me I think it is the latter (i.e. timeouts after the connection has been alive for a while). If this latest instance is the same as before, then it works for a while after a deploy (like it does for @thewilkybarkid) and then starts failing after some period (not sure how long, but hours, not minutes). It had been fine for me after I tweaked my Redis connection settings about 2 weeks ago, but today I’m getting problems again, which I assumed might be the same as the others in this thread.
I understand it might have to reconnect if a connection has been open too long, but it seems like that re-connection might be failing. I’ve just bumped up connect and command timeouts a little and redeployed. It’s working again now so I’ll wait and see what happens.
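For anyone wanting to try the same thing, here is a rough sketch of what bumping the timeouts could look like. connectTimeout, commandTimeout, maxRetriesPerRequest and retryStrategy are ioredis option names; the redisOptions name and every value below are assumptions for illustration, not the settings actually used in this thread.

```typescript
// Hypothetical values; tune them for your own app and region.
const redisOptions = {
  // Milliseconds to wait for the initial connection before giving up.
  connectTimeout: 10000,
  // Milliseconds before a single command is rejected with a timeout error.
  commandTimeout: 5000,
  // How many times a command is retried across reconnects before it fails.
  maxRetriesPerRequest: 3,
  // ioredis calls this with the reconnect attempt number and waits the
  // returned number of milliseconds before trying again.
  retryStrategy: (times: number): number => Math.min(times * 200, 2000),
};

// Usage with ioredis (not executed here):
//   import Redis from "ioredis";
//   const redis = new Redis(process.env.REDIS_URL!, redisOptions);
```

The capped linear backoff is just one possible choice; it keeps early reconnects quick without retrying in a tight loop if the server stays unreachable.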
It’s the latter for me. With half a day’s worth of logs (low traffic though), I can see the connection is closed every so often and ioredis reconnects. Other times, however, the connection remains open but is unusable (i.e. commands time out).
I might add a ping to my app’s health check to see if that helps. (It’ll be called far more often than should be needed, but it would keep it active.)
ioredis might have a ping function. I think the hangs are concerning, but I’m guessing those happen because ioredis hasn’t detected the closed connection. If you can get a command to traverse the connection every ten seconds or so, I think it’ll be fine.
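A minimal sketch of that "send a command every ten seconds or so" idea, assuming a client that exposes ping() returning a promise (ioredis does have a ping command). The Pingable interface and the startKeepAlive helper are made-up names for illustration, and the ten-second default simply mirrors the suggestion above.

```typescript
// Minimal shape of the client this helper needs; an ioredis instance
// should satisfy it, but any object with a ping() method works.
interface Pingable {
  ping(): Promise<string>;
}

// Sends a PING on a fixed interval so the connection never sits idle.
// Returns a function that stops the timer.
function startKeepAlive(
  client: Pingable,
  intervalMs: number = 10000,
  onError: (err: unknown) => void = (err) =>
    console.error("redis keep-alive ping failed", err),
): () => void {
  const timer = setInterval(() => {
    // Errors are reported but never thrown, so one failed ping
    // does not crash the process.
    client.ping().catch(onError);
  }, intervalMs);
  return () => clearInterval(timer);
}

// Usage with ioredis (not executed here):
//   const stop = startKeepAlive(redis);
//   ...later, on shutdown: stop();
```

This only keeps traffic flowing over the connection; it does not detect or repair a connection that is already hung.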
I am also seeing this issue, using ioredis with Fly / Upstash Redis. My ioredis config is dead simple: const redis = new Redis(process.env.REDIS_URL). I did add ?family=6 to the end of the connection string printed to my terminal when I created the Redis instance so that the initial connection would work, but now, after a stretch of no traffic, I cannot connect anymore.
We’ve bumped the Redis idle timeout to a day (instead of one hour) which should help with debugging. If you’re seeing timeouts throughout a single day after a deployment, do report back here.
I’ll need to check; perhaps the error is on the Rails side when sending a job. In either case, I’ve moved to self-hosting, as it was causing too much of an issue.
After 24 hours with the ping in our health check, we had no reconnections or problems. (The idle timeout looks to have been changed a few hours later; it was fine before then, though.)
I’ve opened an ioredis issue since there are a few things there:
I don’t know if there are any details that others could provide; please do if you can!
I tried a variety of things with my ioredis settings but continued to have reconnection problems. This seems to be a recurring theme in ioredis GitHub issues and various other discussions online. Interestingly, I also discovered that BullMQ (I’m using the older Bull) explicitly checks whether you are using Upstash and throws an error immediately if so, as it doesn’t support them. In short… it’s all a bit of a confusing muddle.
So… I have put a ping to my Redis connection into a health check call like @thewilkybarkid did. Since doing that it’s been OK, but it’s only been 24 hours, so fingers crossed still.
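In case it helps anyone else, the health check ping could look roughly like the sketch below. It assumes a client with a promise-returning ping() (as ioredis has); redisHealthy, Pingable and the one-second timeout are hypothetical names and values, not anyone's actual code from this thread. The timeout is there because a hung connection may never answer, and the health check should fail rather than hang with it.

```typescript
// Minimal shape of the client this check needs; an ioredis instance
// should satisfy it.
interface Pingable {
  ping(): Promise<string>;
}

// Resolves to true if Redis answers PONG within timeoutMs, false if the
// ping errors or takes too long. Never rejects.
async function redisHealthy(
  client: Pingable,
  timeoutMs: number = 1000,
): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error("redis ping timed out")),
      timeoutMs,
    );
  });
  try {
    // Whichever settles first wins: the PING reply or the timeout.
    const reply = await Promise.race([client.ping(), timeout]);
    return reply === "PONG";
  } catch {
    return false;
  } finally {
    // Always clear the timer so it does not fire after we are done.
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Usage inside a health check route (framework-specific, not executed here):
//   const ok = await redisHealthy(redis);
//   respond with 200 if ok, otherwise 503.
```

As a side effect, each health check call also sends traffic over the connection, which is what seems to keep it from going idle.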