Upstash Redis timeouts

I’m continuing to work on a potential migration of our app from Heroku. I’m now stuck trying to get the Upstash Redis offering to work with my Rails app; I’m planning to use two Redis instances, one for the Rails cache and one for my Sidekiq queue.
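For context, here’s roughly how I plan to wire the two instances up (a sketch; REDIS_CACHE_URL and REDIS_QUEUE_URL are placeholder secret names, not necessarily what I’ll end up using):

# config/environments/production.rb -- cache store on the cache instance
config.cache_store = :redis_cache_store, { url: ENV["REDIS_CACHE_URL"] }

# config/initializers/sidekiq.rb -- Sidekiq on the queue instance
Sidekiq.configure_server do |config|
  config.redis = { url: ENV["REDIS_QUEUE_URL"] }
end
Sidekiq.configure_client do |config|
  config.redis = { url: ENV["REDIS_QUEUE_URL"] }
end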

I have provisioned two separate Upstash instances with fly redis create (one free, one at the lowest paid tier; just one instance each, no regional replicas; everything in ewr), and whenever my app tries to connect to either instance it times out.

The secrets for the relevant environment variables are set correctly. I can connect to both instances locally with redis-cli via fly redis connect (it takes around 6s for a prompt to appear), and each seems responsive once I’m connected.

When I fly ssh console in, run the Rails console, and try to connect with the Ruby client, I also get timeouts (Redis::TimeoutError (Connection timed out)) when the client attempts to connect. I’ve tried the server domain in the pattern redis://default:PASS@fly-NAME.upstash.io, as output by fly redis create; I was also able to discover the private IPv6 addresses* by watching the fly redis connect dialogue, and using an IP address in place of the Upstash domain name gives identical results.

(I also tried the .internal:6379 domain name instead, which failed immediately with a “can’t connect” error; I’m guessing that’s simply the wrong domain for this use case.)

I can’t figure out whether I’m doing something wrong or whether Upstash is really this slow to establish a connection. If it’s the latter, I guess my next best option is rolling my own Redis?

* Currently, the link on the fly redis reference doc to the private IPv6 address section is broken, FWIW. The only way I could find the internal IPv6 addresses was the output of fly redis connect.

Connecting with fly redis connect means the database itself is OK. Is the Rails app in the same organization as the Redis database? They have to be in the same org.

Yes, I just confirmed the Upstash Redis instances and the app are all in the same organization and all in ewr.

I also tried setting a much higher timeout value in the client (10s) and got an ECONNRESET before the timeout. Here is the console session (token and instance name obscured):

irb(main):005:0> redis_client = Redis.new(url: 'redis://default:TOKEN@fly-NAME.upstash.io', timeout: 10)
=> #<Redis client v4.8.0 for redis://fly-NAME.upstash.io:6379/0>
irb(main):006:0> redis_client.set("testkey","foo")
Traceback (most recent call last):
        1: from (irb):6
Redis::ConnectionError (Connection lost (ECONNRESET))
irb(main):007:0>
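For what it’s worth, a raw TCP check from the same console should separate “can’t reach the host” from “server accepts, then drops the connection”. A rough sketch (same hostname as above):

require "socket"

# Open a raw TCP connection and speak minimal RESP by hand.
Socket.tcp("fly-NAME.upstash.io", 6379, connect_timeout: 10) do |sock|
  sock.write("PING\r\n")  # inline command syntax
  puts sock.gets          # expect "+PONG" (or a -NOAUTH error) if the server answers at all
end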

The only other thing I can think to test: I’m on the latest 4.x branch of the Ruby redis gem due to some compatibility issues I ran into a while ago, and I haven’t upgraded to or tested the 5.x branch. I don’t see 5.x listed as a requirement anywhere in the docs, though.
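For reference, the pin in my Gemfile looks roughly like this (exact constraint from memory):

# Gemfile
gem "redis", "~> 4.8"    # current pin to the 4.x branch
# gem "redis", "~> 5.0"  # the untested 5.x branch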

Can you try connecting with redis-cli -u <url> on the VM?

The VM doesn’t have redis-cli installed out of the box. I will come back around to this when I’m out of meetings later today. I’m not sure how to get the deploy to install redis-cli (I am not a Docker whiz).

If it’s running Debian/Ubuntu, you should be able to run apt-get update && apt-get install redis-tools.
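In Dockerfile terms, that would be something along these lines (a sketch, untested here):

# Dockerfile -- bake redis-cli into the image via the redis-tools package
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y redis-tools && \
    rm -rf /var/lib/apt/lists/*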

I noticed that redis-rb only adds IPv6 URL support in v5 and up. That appears to only affect cases where the IP is used as the hostname, but it might be worth a try too.
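That is, with v5 a bracketed IPv6 literal in the URL should parse (the address here is made up for illustration):

# redis-rb >= 5.x: IPv6 address as the host, in URI bracket syntax
redis = Redis.new(url: "redis://default:TOKEN@[fdaa:0:1234::3]:6379")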

I am also getting a server disconnect running the freshly installed redis-cli:

# redis-cli -u redis://default:TOKEN@fly-NAME.upstash.io
fly-NAME.upstash.io:6379> SET "testkey" "foo"
Error: Server closed the connection
(8.10s)
fly-NAME.upstash.io:6379>

The client version is 5.0.14:

# redis-cli -v
redis-cli 5.0.14

I am connecting with the URL provided by fly redis status:

 $ fly redis status NAME
Redis
  ID             = ID
  Name           = NAME
  Plan           = Free
  Primary Region = ewr
  Read Regions   =
  Private URL    = redis://default:TOKEN@fly-NAME.upstash.io

And I have confirmed these Redis instances are in the same organization as my app server, and in the same region (ewr)…

FWIW, I just stepped through the process using only free services (free app instance, free pg db, free Upstash Redis) with a new Rails app at the same Ruby/Rails versions (2.7.6/6.1.7) I’m running, inside a newly created “organization”, to see whether this was a characteristic of the app instances created when fly launch detects a Rails app… and I am not able to replicate the Redis behavior.

For the organization/app where I am experiencing this problem, I am using upgraded VMs for the app and the pg instance (and, as mentioned, two different Redis instances, though both exhibit the same behavior).

I am going to try the migration from scratch once again, this time deleting the organization and starting over, and will report back on whether I can reproduce this, or catch the moment where I begin running into Redis issues.

I am able to replicate the timeout/disconnect behavior if I deploy my app on an upgraded app instance, an upgraded pg instance, and two different Upstash Redis instances. I’m not sure which of those is the culprit. Next I’m going to try with only one Upstash Redis instance (which isn’t ideal for my use case but might be the cause).

Confirmed. If I delete the second Upstash Redis instance and use a single Redis DB, I do not experience the timeouts, and the single instance is responsive.

Ideally, I’d like to run two instances: one with a key eviction policy for my cache, and one with no key eviction for my background worker jobs.
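For reference, that split maps onto the standard Redis eviction settings below (configured via the provider’s dashboard, or via CONFIG SET where the plan allows it; the URL variables are placeholders):

redis-cli -u "$CACHE_URL" CONFIG SET maxmemory-policy allkeys-lru  # cache: evict stale keys under memory pressure
redis-cli -u "$QUEUE_URL" CONFIG SET maxmemory-policy noeviction   # jobs: refuse writes rather than drop data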

Indeed, I can reproduce as well. Good catch! We’ll look into it.

Hey, just saw your tweet. Can you try this again? This issue was fixed a few days ago. In the end, it had nothing to do with running multiple Redis instances per org; it was a problem on the provider’s side that was more noticeable when provisioning in the same region.

I am traveling this week (got trapped in Florida thanks to the hurricane!) so will be able to do a dry-run migration and provisioning next week. Glad to hear the underlying issue was uncovered and (hopefully) fixed!

Don’t know if this is directly related, but I’m seeing a ton of Redis-related errors.

Here’s the full stack trace of the Redis::ConnectionError, which is the most common one:

Errno::ECONNRESET: Connection reset by peer
  from redis-client (0.11.1) lib/redis_client/ruby_connection/buffered_io.rb:145:in `block in fill_buffer'
  from redis-client (0.11.1) lib/redis_client/ruby_connection/buffered_io.rb:122:in `loop'
  from redis-client (0.11.1) lib/redis_client/ruby_connection/buffered_io.rb:122:in `fill_buffer'
  from redis-client (0.11.1) lib/redis_client/ruby_connection/buffered_io.rb:114:in `ensure_remaining'
  from redis-client (0.11.1) lib/redis_client/ruby_connection/buffered_io.rb:85:in `getbyte'
  from redis-client (0.11.1) lib/redis_client/ruby_connection/resp3.rb:113:in `parse'
  from redis-client (0.11.1) lib/redis_client/ruby_connection/resp3.rb:50:in `load'
  from redis-client (0.11.1) lib/redis_client/ruby_connection.rb:125:in `block in read'
  from redis-client (0.11.1) lib/redis_client/ruby_connection/buffered_io.rb:48:in `with_timeout'
  from redis-client (0.11.1) lib/redis_client/ruby_connection.rb:125:in `read'
  from redis-client (0.11.1) lib/redis_client/connection_mixin.rb:18:in `call'
  from redis-client (0.11.1) lib/redis_client.rb:281:in `block (2 levels) in blocking_call_v'
  from redis-client (0.11.1) lib/redis_client/middlewares.rb:16:in `call'
  from redis-client (0.11.1) lib/redis_client.rb:280:in `block in blocking_call_v'
  from redis-client (0.11.1) lib/redis_client.rb:616:in `ensure_connected'
  from redis-client (0.11.1) lib/redis_client.rb:279:in `blocking_call_v'
  from redis (5.0.5) lib/redis/client.rb:86:in `blocking_call_v'
  from redis (5.0.5) lib/redis.rb:173:in `block in send_blocking_command'
  from redis (5.0.5) lib/redis.rb:172:in `synchronize'
  from redis (5.0.5) lib/redis.rb:172:in `send_blocking_command'
  from redis (5.0.5) lib/redis/commands/lists.rb:266:in `_bpop'
  from redis (5.0.5) lib/redis/commands/lists.rb:167:in `brpop'
  from sidekiq (6.5.5) lib/sidekiq/fetch.rb:49:in `block in retrieve_work'
  from sidekiq (6.5.5) lib/sidekiq.rb:164:in `block in redis'
  from connection_pool (2.3.0) lib/connection_pool.rb:65:in `block (2 levels) in with'
  from connection_pool (2.3.0) lib/connection_pool.rb:64:in `handle_interrupt'
  from connection_pool (2.3.0) lib/connection_pool.rb:64:in `block in with'
  from connection_pool (2.3.0) lib/connection_pool.rb:61:in `handle_interrupt'
  from connection_pool (2.3.0) lib/connection_pool.rb:61:in `with'
  from sidekiq (6.5.5) lib/sidekiq.rb:161:in `redis'
  from sidekiq (6.5.5) lib/sidekiq/component.rb:26:in `redis'
  from sidekiq (6.5.5) lib/sidekiq/fetch.rb:49:in `retrieve_work'
  from sidekiq (6.5.5) lib/sidekiq/processor.rb:83:in `get_one'
  from sidekiq (6.5.5) lib/sidekiq/processor.rb:95:in `fetch'
  from sidekiq (6.5.5) lib/sidekiq/processor.rb:77:in `process_one'
  from sidekiq (6.5.5) lib/sidekiq/processor.rb:68:in `run'
  from sidekiq (6.5.5) lib/sidekiq/component.rb:8:in `watchdog'
  from sidekiq (6.5.5) lib/sidekiq/component.rb:17:in `block in safe_thread'
RedisClient::ConnectionError: Connection reset by peer
  from redis-client (0.11.1) lib/redis_client/ruby_connection.rb:130:in `rescue in read'
  from redis-client (0.11.1) lib/redis_client/ruby_connection.rb:121:in `read'
  from redis-client (0.11.1) lib/redis_client/connection_mixin.rb:18:in `call'
  from redis-client (0.11.1) lib/redis_client.rb:281:in `block (2 levels) in blocking_call_v'
  from redis-client (0.11.1) lib/redis_client/middlewares.rb:16:in `call'
  from redis-client (0.11.1) lib/redis_client.rb:280:in `block in blocking_call_v'
  from redis-client (0.11.1) lib/redis_client.rb:616:in `ensure_connected'
  from redis-client (0.11.1) lib/redis_client.rb:279:in `blocking_call_v'
  from redis (5.0.5) lib/redis/client.rb:86:in `blocking_call_v'
  from redis (5.0.5) lib/redis.rb:173:in `block in send_blocking_command'
  from redis (5.0.5) lib/redis.rb:172:in `synchronize'
  from redis (5.0.5) lib/redis.rb:172:in `send_blocking_command'
  from redis (5.0.5) lib/redis/commands/lists.rb:266:in `_bpop'
  from redis (5.0.5) lib/redis/commands/lists.rb:167:in `brpop'
  from sidekiq (6.5.5) lib/sidekiq/fetch.rb:49:in `block in retrieve_work'
  from sidekiq (6.5.5) lib/sidekiq.rb:164:in `block in redis'
  from connection_pool (2.3.0) lib/connection_pool.rb:65:in `block (2 levels) in with'
  from connection_pool (2.3.0) lib/connection_pool.rb:64:in `handle_interrupt'
  from connection_pool (2.3.0) lib/connection_pool.rb:64:in `block in with'
  from connection_pool (2.3.0) lib/connection_pool.rb:61:in `handle_interrupt'
  from connection_pool (2.3.0) lib/connection_pool.rb:61:in `with'
  from sidekiq (6.5.5) lib/sidekiq.rb:161:in `redis'
  from sidekiq (6.5.5) lib/sidekiq/component.rb:26:in `redis'
  from sidekiq (6.5.5) lib/sidekiq/fetch.rb:49:in `retrieve_work'
  from sidekiq (6.5.5) lib/sidekiq/processor.rb:83:in `get_one'
  from sidekiq (6.5.5) lib/sidekiq/processor.rb:95:in `fetch'
  from sidekiq (6.5.5) lib/sidekiq/processor.rb:77:in `process_one'
  from sidekiq (6.5.5) lib/sidekiq/processor.rb:68:in `run'
  from sidekiq (6.5.5) lib/sidekiq/component.rb:8:in `watchdog'
  from sidekiq (6.5.5) lib/sidekiq/component.rb:17:in `block in safe_thread'

I have Sidekiq concurrency set to 2, and it seems to be processing everything eventually. But there are failures in the app code as well, and that’s not great :grimacing:

I’m on the free Redis plan with the redis 5.0.5 and redis-client 0.11.1 gems. My code is open source if there’s anything of interest.

Yeah, this is not good :see_no_evil:

Any suggestions what I can try? The app code works flawlessly locally and on Heroku.

Can you try fly redis connect? That might help us debug what’s going on here. This is most likely related to the fact that idle connections are timed out.

Also, can you post a stack trace for one of the ShotsController actions? The “reset by peer” errors are normal when there’s a connection pool with idle connections, which is pretty typical with Sidekiq. But the read timeouts are worth looking into.
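(If the resets are purely idle-timeout churn, the client can be told to retry: redis-rb/redis-client accept a reconnect_attempts option. A sketch, assuming the option is forwarded as described in redis-client’s docs:)

# Retry dropped connections instead of surfacing ECONNRESET immediately.
# reconnect_attempts takes an integer, or an array of sleep durations
# used as backoff between attempts.
redis = Redis.new(
  url: ENV["REDIS_URL"],
  reconnect_attempts: [0.1, 0.25, 0.5]  # up to three retries with backoff
)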

It connected, and most of the time it worked, and then it suddenly didn’t, if I remember correctly. I have since switched to self-hosted Redis and have had zero problems since.

As for the ShotsController stack traces, there were several:

https://sentry.io/share/issue/34f99d11e3cc4e9785a178e80f20e86b/
https://sentry.io/share/issue/8570bfac93ec4d18b8711c9d3316e18d/
https://sentry.io/share/issue/db96d05249164b99a6c16086e1e5fe28/

But all seemingly with the same underlying issue.

Anyway, as I said, I’m now self-hosting it and it’s working great. Maybe that should be the default suggestion for Rails apps?

I’m glad that’s working for you, but I’m not sure what the default should be. Having Redis as a separate application would scale better if/when you need multiple VMs.