Our app (‘prereview’) is currently failing on requests that use sessions, which we’ve recently switched to being stored in Upstash Redis.
The requests give a 502 Bad Gateway response, and the logs contain the message ‘could not send HTTP request to instance: connection error: timed out’. I can see in the logs that the request has been sent to the app and that it’s processing it; we don’t have any logging around the Redis connection, though, so I’m wondering if that’s gone (very) slow. I can’t see any useful details on the Upstash dashboard, nor on the Fly/Upstash status pages.
(The logs show issues between 13:10 and 13:36 UTC today, 17 Jan.)
Are there any known issues with Upstash Redis going slow?
What runtime/framework/client do you use? In which region are you seeing timeouts? It seems odd you wouldn’t see any logs from your Redis client, unless you have a Node app, which can sometimes swallow errors.
It’s Node, using Keyv. This uses ioredis under the hood.
The app is deployed in iad and fra; the Redis primary is iad with a replica in fra. We’ve seen these timeouts in both iad and fra.
I’ve pushed a few changes this afternoon that should make sure an error page is quickly shown to the user if something goes wrong with the Redis connection, but it’s not entirely clear what ioredis does if there’s a connection problem (I’m not sure if it times out requests).
I had lots of connection problems initially when using Bull with Node, which also uses ioredis under the hood. This manifested as 502 errors on calls to a URL within my app that depended on Redis behind the scenes. Not always, but only once the app had been running for a while.
I believe Fly’s proxies disconnect idle connections and I think this was the problem. I could recreate it locally by killing my Redis server whilst my app was running. My solution was to add extra config settings to Bull that pass down into ioredis so that it would automatically re-connect on connection failure.
Might not be the same problem you have, but it sounded similar enough to share. Hope you solve it.
Thanks, @Stephen. I do wonder if that’s the case, but it looks like ioredis reconnects by default, and killing my Redis server locally also kills my app anyhow (so I assume Fly would restart the instance).
@kurt I can’t speak for the others but for me I think it is the latter (i.e. timeouts after connection alive for a while). If this latest instance of it is same as before for me then it works for a while after a deploy (like it does for @thewilkybarkid) and then starts failing after some period (not sure how long, but hours not minutes). It had been fine for me after I tweaked my Redis connection settings about 2 weeks ago, but today I’m getting problems again which I assumed might be the same as the others in this thread.
I understand it might have to reconnect if a connection has been open too long, but it seems like that reconnection might be failing. I’ve just bumped up the connect and command timeouts a little and redeployed. It’s working again now, so I’ll wait and see what happens.
It’s the latter for me. With half a day’s worth of logs (low traffic though), I can see the connection is closed every so often and ioredis reconnects. Other times, however, the connection remains open but is unusable (i.e. commands time out).
I might add a ping to my app’s health check to see if that helps. (It’ll be called far more often than should be needed, but it would keep it active.)
ioredis might have a ping function. I think the hangs are concerning, but I’m guessing those happen because ioredis hasn’t detected the closed connection. If you can get a command to traverse the connection every ten seconds or so, I think it’ll be fine.
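Something like this would do it. ioredis clients do expose a ping() method (it resolves to ‘PONG’); the health-check wiring below is just an assumption about how the app might use it:

```javascript
// Sketch: report healthy only if a PING actually makes it across the connection.
// `redis` is assumed to be an ioredis client, whose ping() resolves to 'PONG'.
async function redisIsHealthy(redis) {
  try {
    return (await redis.ping()) === 'PONG';
  } catch (err) {
    // Covers both refused connections and commands that time out.
    return false;
  }
}

// Hypothetical wiring: calling this from the app's health-check route also
// keeps a command flowing over the connection at the check interval.
//   app.get('/health', async (req, res) => {
//     res.status((await redisIsHealthy(redis)) ? 200 : 503).end();
//   });
```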
I am also seeing this issue, using ioredis with Fly / Upstash Redis. My ioredis config is dead simple: const redis = new Redis(process.env.REDIS_URL). I did add ?family=6 to the end of the connection string printed to my terminal after I created the Redis instance in order for the initial connection to work, but now, after a stretch of no traffic, I cannot connect anymore.
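In case it helps anyone else: the ?family=6 query parameter can also be passed as an ioredis option, alongside the reconnect settings discussed above. A sketch with illustrative values:

```javascript
// Sketch: equivalent of appending ?family=6 to the URL (Fly's private-network
// hostnames resolve over IPv6), plus settings that should help after idle
// periods. Values are illustrative, not tested against Fly.
const redisOptions = {
  family: 6,             // resolve the hostname over IPv6
  connectTimeout: 10000, // fail the initial dial instead of hanging indefinitely
  // Keep reconnecting (with capped backoff) after an idle connection is dropped.
  retryStrategy: (times) => Math.min(times * 200, 2000),
};

// Usage (needs the ioredis package):
//   const Redis = require('ioredis');
//   const redis = new Redis(process.env.REDIS_URL, redisOptions);
```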