Node’s network errors aren’t always very helpful. That says it’s a DNS lookup error, but it could be a network problem connecting, or a Redis that’s not responding. Are you seeing these a lot?
It looks like that Redis instance is running in Frankfurt and that app process is running in Virginia. If you only get errors like that occasionally, it’s worth implementing connection retries (usually a driver setting). If you’re getting them repeatedly, there might be another issue to debug.
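For reference, here is a rough sketch of what a retry setting could look like, assuming the ioredis driver that comes up later in the thread. ioredis accepts a `retryStrategy` function that receives the reconnect attempt number and returns a delay in milliseconds, or a non-number to stop retrying. The helper name `retryDelay` and its default values are made up for illustration, and `family: 6` is an assumption based on Fly private networking being IPv6.

```typescript
// Hypothetical helper: capped exponential backoff for reconnect attempts.
// Returns the delay in ms before the next attempt, or null to give up.
function retryDelay(
  attempt: number,      // 1 for the first reconnect attempt
  baseMs = 100,         // illustrative default, tune for your app
  maxMs = 3000,         // upper bound on the delay
  maxAttempts = 20      // stop retrying after this many attempts
): number | null {
  if (attempt > maxAttempts) return null;
  return Math.min(baseMs * 2 ** (attempt - 1), maxMs);
}

// Possible wiring (not executed here; assumes ioredis is installed):
//   import Redis from "ioredis";
//   const redis = new Redis({
//     host: "there-redis2.internal",
//     family: 6, // assumption: needed for IPv6 private networking
//     retryStrategy: (times) => retryDelay(times),
//   });
```

The cap keeps a long outage from stretching delays indefinitely, and returning null lets the app surface a hard failure instead of retrying forever.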
Incidentally, it’s probably more reliable to connect to there-redis2.internal from another Fly app and bypass our load balancer.
I had tried to use it over internal networking, but it couldn’t connect at that time because of an org change.
So today I replaced the Redis Pub/Sub with a NATS cluster on 6PN, and hopefully it won’t have such issues anymore.
As for that error, I wasn’t seeing it a lot, maybe a couple of times a day. It was also being thrown from all my Node.js nodes in FRA and IAD, so it’s probably something with the load balancer for Redis. It probably wasn’t Redis not responding, because the instance had 1 GB of RAM and low traffic, and I saw no errors in its logs.
Oh well, NATS messaging is going to work a lot better across regions, so that’s a good change!
A couple of those errors per day when working through our load balancer are probably “normal”. The Node Redis driver is pretty brittle by default; if you end up using it again, we can help figure out retries.
I assume the ioredis package does retries. However, I had hooked listeners on every error, since these are WebRTC signalling messages and user state updates, and I wanted to minimize and monitor any loss.
I still use Redis for storing some cross-region state, so I’ll switch it to 6PN (hopefully it works this time), and I’m open to any suggestions.
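To make the monitoring side concrete, here is a minimal sketch of tallying errors by their Node error code, so that DNS failures (for example `ENOTFOUND`) can be told apart from timeouts or resets. The `ErrorTally` class and its method names are hypothetical, and this only counts errors; it does not recover lost messages.

```typescript
// Node network errors usually carry a string `code` such as "ENOTFOUND".
type CodedError = Error & { code?: string };

// Hypothetical helper: counts errors per code, falling back to the error name.
class ErrorTally {
  private counts = new Map<string, number>();

  record(err: CodedError): void {
    const key = err.code ?? err.name;
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
  }

  count(code: string): number {
    return this.counts.get(code) ?? 0;
  }

  // Plain-object copy, convenient for logging or a metrics endpoint.
  snapshot(): Record<string, number> {
    const out: Record<string, number> = {};
    this.counts.forEach((n, code) => { out[code] = n; });
    return out;
  }
}

// Possible wiring (not executed here; assumes an ioredis client `redis`):
//   const tally = new ErrorTally();
//   redis.on("error", (err) => tally.record(err));
```

Logging the snapshot once a minute, split by region, would show whether the errors cluster around one code or one location.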