Upstash Redis and "could not send HTTP request to instance: connection error: timed out" logs

Our app (‘prereview’) is currently failing on requests that use sessions, which we recently switched to storing in Upstash Redis.

The requests give a 502 Bad Gateway response, and the logs contain the message ‘could not send HTTP request to instance: connection error: timed out’. I can see in the logs that the request has been sent to the app and it’s processing it; we don’t have any logging around the Redis connection though so I’m wondering if that’s gone (very) slow. I can’t see any useful details on the Upstash dashboard though. (Nor the Fly/Upstash status pages.)

(The logs show issues between 13:10 and 13:36 UTC today, 17 Jan.)

Are there any known issues with Upstash Redis going slow?

Is this issue still happening?

@jsierles Yeah, just seen it again (at 15:27:14).

What runtime/framework/client do you use? In which region are you seeing timeouts? It seems odd you wouldn’t see any logs from your Redis client, unless you have a Node app, which can sometimes swallow errors.

It’s Node, using Keyv. This uses ioredis under the hood.

The app is deployed in iad and fra; the Redis primary is iad with a replica in fra. We’ve seen these timeouts in both iad and fra.

I’ve pushed a few changes this afternoon that should make sure an error page is quickly shown to the user if something goes wrong with the Redis connection, but it’s not entirely clear what ioredis does if there’s a connection problem (I’m not sure if it times out requests).

I had lots of connection problems initially when using Bull (which also uses ioredis under the hood) with Node. These manifested as 502 errors on calls to a URL within my app that depended on Redis behind the scenes. Not always, but once the app had been running a while.

I believe Fly’s proxies disconnect idle connections, and I think this was the problem. I could recreate it locally by killing my Redis server whilst my app was running. My solution was to add extra config settings to Bull that pass down into ioredis so that it would automatically reconnect on connection failure.
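A sketch of what those pass-through settings might look like — the option names are from the ioredis docs, but the delays and the Bull usage line are illustrative assumptions, not my exact config:

```javascript
// Delay (in ms) before reconnect attempt number `times`; returning a
// number (rather than null) tells ioredis to keep retrying forever.
const retryStrategy = (times) => Math.min(times * 200, 2000);

// Options Bull passes straight down to ioredis via its `redis` setting.
const redisOptions = {
  retryStrategy,
  // Don't abandon queued commands while the connection is re-established.
  maxRetriesPerRequest: null,
  enableReadyCheck: false,
};

// Usage (hypothetical): new Queue('jobs', redisUrl, { redis: redisOptions });
```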

Might not be the same problem you have, but it sounded similar enough to share. Hope you solve it.


Thanks, @Stephen. I do wonder if that’s the case, but it looks like ioredis reconnects by default, and killing my Redis server locally also kills my app anyhow (so I assume Fly would restart the instance).

I’ve made a few changes to improve the end-user experience (well, make it less bad). So the app now serves error pages if Redis commands take longer than 2 seconds, rather than hanging and eventually timing out (prereview.org/index.ts at 49914fe6994bb279116d761603344a16d763bcd9 · PREreview/prereview.org · GitHub).
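The guard is roughly a race between the command and a deadline. A sketch only, not the app’s actual code (the helper name is hypothetical; the real implementation is in the linked index.ts):

```javascript
// Race a Redis command against a deadline so the request fails fast
// instead of hanging until the proxy gives up.
function withTimeout(promise, ms = 2000) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Redis command timed out after ${ms}ms`)),
      ms,
    );
  });
  // Clear the timer either way so it doesn't linger after the race settles.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// Usage (hypothetical): const session = await withTimeout(redis.get(sessionKey));
```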

I’m also now logging the ioredis events, so I can see it connecting/disconnecting/reconnecting etc (prereview.org/index.ts at 49914fe6994bb279116d761603344a16d763bcd9 · PREreview/prereview.org · GitHub).

Hopefully, this should be enough to determine if the issue is the Redis connection.

I believe it is a connection/latency problem. I’ve just had Redis timeouts (set to 2s) when connecting through the replica in fra. A deployment happened, and the fresh connection worked fine.

Not long beforehand, the other app instance in iad successfully restarted its connection according to my app’s logs.

I’m having the same issue. I can’t deploy my applications because of timeouts to Redis.

We also see the same issue with Redis, using Rails/Sidekiq.

Seems to be happening again (‘prereview’ app, connecting to the instance in fra using the Redis replica in fra).

After thinking I’d solved my similar problem in the past, I’m suddenly getting it again now as well.

I’m seeing lots of Redis connection timeout errors, and my app isn’t successfully getting anything out of Redis. I haven’t changed anything relevant for quite a while, so I don’t believe I’ve broken it.

Just to clarify: are you seeing timeouts when you try to connect to Redis, or when a connection has been alive for a while?

It sounds like the former, which shouldn’t be happening. The latter is somewhat normal.

@kurt I can’t speak for the others, but for me I think it’s the latter (i.e. timeouts after the connection has been alive for a while). If this latest instance is the same as before, then it works for a while after a deploy (like it does for @thewilkybarkid) and then starts failing after some period — not sure how long, but hours rather than minutes. It had been fine for me after I tweaked my Redis connection settings about two weeks ago, but today I’m getting problems again, which I assumed might be the same as the others in this thread.

I understand it might have to reconnect if a connection has been open too long, but it seems like that reconnection might be failing. I’ve just bumped up the connect and command timeouts a little and redeployed. It’s working again now, so I’ll wait and see what happens.
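Assuming ioredis, the two knobs are its connectTimeout and commandTimeout options (names from the ioredis docs; the values here are illustrative, not the ones I used):

```javascript
// Illustrative values only; ioredis defaults to connectTimeout: 10000,
// and commandTimeout is unset (commands wait indefinitely).
const redisOptions = {
  connectTimeout: 20000, // ms allowed to establish the connection
  commandTimeout: 5000,  // ms before an in-flight command rejects
};

// Usage (hypothetical): new Redis(redisUrl, redisOptions);
```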

It’s the latter for me. With half a day’s worth of logs (low traffic though), I can see the connection is closed every so often and ioredis reconnects. Other times, however, the connection remains open but is unusable (i.e. commands time out).

I might add a ping to my app’s health check to see if that helps. (It’ll be called far more often than should be needed, but it would keep it active.)

ioredis might have a ping function. I think the hangs are concerning, but I’m guessing those happen because ioredis hasn’t detected the closed connection. If you can get a command to traverse the connection every ten seconds or so, I think it’ll be fine.
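ioredis does expose a ping() command, so a periodic keep-alive along these lines might work (the function name and interval are assumptions, not tested advice):

```javascript
// Send a PING every `intervalMs` so a command traverses the connection
// before the proxy's idle timeout can fire. Works with any client that
// has a promise-returning ping(), e.g. ioredis.
function startKeepAlive(client, intervalMs = 10000) {
  const timer = setInterval(() => {
    client.ping().catch((err) => console.error('redis ping failed', err));
  }, intervalMs);
  timer.unref?.(); // don't let the pings alone keep the process alive
  return () => clearInterval(timer); // call to stop pinging
}

// Usage (hypothetical): const stop = startKeepAlive(redis);
```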

Can you share your ioredis config?

I’m also seeing this issue, using ioredis with Fly/Upstash Redis. My ioredis config is dead simple: const redis = new Redis(process.env.REDIS_URL). I did add ?family=6 to the end of the connection string printed to my terminal after I created the Redis instance in order for the initial connection to work, but now, after a stretch of no traffic, I can’t connect any more.
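A small helper along these lines (the name and approach are my own, not from ioredis) keeps that family=6 requirement from depending on hand-edited URLs; ioredis reads extra options from the URL’s query string:

```javascript
// Ensure the connection string carries family=6 so ioredis dials the
// Fly private network (where the Upstash instance lives) over IPv6.
function withIPv6(url) {
  const u = new URL(url);
  if (!u.searchParams.has('family')) u.searchParams.set('family', '6');
  return u.toString();
}

// Usage (hypothetical): new Redis(withIPv6(process.env.REDIS_URL));
```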

We have the same setup as @joncallahan, except that we also set commandTimeout (and enableAutoPipelining, which had no effect on this).

Regarding pinging Redis… see Upstash - Redis not reachable sometimes - #8 by pontus

Looks like it might have worked for them (albeit with node-redis). Hopefully ioredis has something similar. If so, I’ll try it and see.