Different numbers shown in the logs compared to the instance

I am getting the warning below in the logs. Following its recommendations, I reduced the concurrency to 5, but it looks like the region is set to bog instead of fra, which I requested when I created the Redis instance, even though the dashboard correctly shows fra.

When I redeployed, the worker’s region changed again, this time to bos.

2023-04-01T16:13:10.354 app[2b49abc0] bog [info] 2023-04-01T16:13:10.353Z pid=520 tid=t6o WARN: Your Redis network connection is performing extremely poorly.
2023-04-01T16:13:10.354 app[2b49abc0] bog [info] Last RTT readings were [169497, 169902, 183293, 168984, 166387], ideally these should be < 1000.
2023-04-01T16:13:10.354 app[2b49abc0] bog [info] Ensure Redis is running in the same AZ or datacenter as Sidekiq.
2023-04-01T16:13:10.354 app[2b49abc0] bog [info] If these values are close to 100,000, that means your Sidekiq process may be
2023-04-01T16:13:10.354 app[2b49abc0] bog [info] CPU-saturated; reduce your concurrency and/or see https://github.com/mperham/sidekiq/discussions/5039 
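A quick back-of-the-envelope check on those readings, assuming (as the “ideally these should be < 1000” target suggests) that Sidekiq reports RTT in microseconds:

```latex
\begin{aligned}
\bar{t}_{\mathrm{RTT}} &= \frac{169497 + 169902 + 183293 + 168984 + 166387}{5}\ \mu\mathrm{s}
  = \frac{858063}{5}\ \mu\mathrm{s} \approx 171613\ \mu\mathrm{s} \approx 171.6\ \mathrm{ms} \\
t_{\mathrm{target}} &< 1000\ \mu\mathrm{s} = 1\ \mathrm{ms} \\
\frac{\bar{t}_{\mathrm{RTT}}}{1\ \mathrm{ms}} &\approx 172
\end{aligned}
```

So each round trip took roughly 170 ms, about 170 times the target, and all five readings sit well above the ~100,000 µs (100 ms) mark the warning associates with CPU saturation. That is at least consistent with a network-distance problem (a worker running in bog talking to Redis in fra), although on its own it doesn’t rule out CPU contention.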

[Screenshot from 2023-04-01 19-23-27]

I think there is a discrepancy between the figures I see in the Upstash dashboard and the ones in Sidekiq.

[Screenshot from 2023-04-01 19-18-35]

What numbers are striking you as discrepancies? Or are you concerned about some specific part of those numbers? If you’re referring to the request count, I think that regardless of the number of jobs going through Sidekiq, it can be rather chatty with Redis.

For the app you’re seeing show up in bos and bog, could you paste the output of fly regions list?

My main concern here is the warning in the logs: “Your Redis network connection is performing extremely poorly.”
So my question is: is it something on my end (am I doing something wrong), or is there something wrong with that Redis instance?

What strikes me as odd is that the worker process appears in a different region than the web process running the senalador app. Also, the worker process’s region keeps changing every time I deploy.

The Redis instance is deployed in the same region as the main app: fra.

Output from fly regions list --app senalador:

Region Pool: 
fra
Backup Region: 

Perhaps the figures in the Upstash dashboard refer to different metrics than the Sidekiq dashboard, like bandwidth, which I interpreted as memory usage.

It seems to me that the main issue is that one or more of your processes are not getting placed in the same region as your Redis instance. Let’s try to fix that!

Oh! I didn’t realize you had a web and worker process. Could you share your fly.toml?

You’re right that does seem odd!

Do you have autoscaling enabled or anything?

Those metrics don’t seem contradictory to me, though they look at things from different angles.

I read the first screenshot as a month’s worth of request count, total bandwidth over the month, and key-value storage size.

The screenshot at the bottom, from the Sidekiq admin, looks to be literally RAM usage, but it’s unclear to me whether that’s a metric it pulls from Redis, a note of Sidekiq’s own RAM usage for communicating with Redis, or something else. It’s also unclear whether that is current RAM usage or usage over a span of time. I would guess it’s “Redis-reported current RAM usage.”

Here is the fly.toml file:

# fly.toml file generated for senalador on 2023-01-28T07:09:42+02:00

app = "senalador"
kill_signal = "SIGINT"
kill_timeout = 5
[processes]
web = "bin/rails fly:server"
worker = "bundle exec sidekiq"

[build]
  [build.args]
    BUILD_COMMAND = "bin/rails fly:build"
    SERVER_COMMAND = "bin/rails fly:server"

[deploy]
  release_command = "bin/rails fly:release"

[env]
  PORT = "8080"

[experimental]
  auto_rollback = true

[mounts]
  source="senalador_thumbs"
  destination="/app/public/images/thumbs"
  processes = ["web"]

[[services]]
  http_checks = []
  internal_port = 8080
  processes = ["web"]
  protocol = "tcp"
  script_checks = []
  [services.concurrency]
    hard_limit = 100
    soft_limit = 80
    type = "connections"

  [[services.ports]]
    force_https = true
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.tcp_checks]]
    grace_period = "1s"
    interval = "15s"
    restart_limit = 0
    timeout = "2s"

[[statics]]
  guest_path = "/app/public"
  url_prefix = "/"

Exactly! The worker process is not in the same region as the web process and the Redis instance, which in my case is fra.
No autoscaling as far as I know :slight_smile:
I think that’s the current RAM usage from Sidekiq jobs.

So using a volume will pin your web process near the volume you have set up (presumably fra).

I wonder if you should try setting region for the worker process group specifically, instead of relying on what you had set for the app.

Could you try running fly regions set fra --group worker? You might have to do a deploy so it gets scheduled into the region you’ve specified.

Yes, I had to, because I was getting this error: Error not enough volumes named senalador_thumbs (1) to run 2 processes

fly regions set fra --group worker

Error You must have a paid plan to scale up in fra

Oh your volume situation makes sense! I think that’s the main reason your web process is “pinned” to fra.

That error could be clearer. We’ll get that fixed up. Anyway, that error popped up because fra is one of our highest-demand regions.

In the interim, you could at least try getting the worker a little closer, for example in cdg or ams. Latency between those and fra is pretty good, so we could at least see whether your connection to Redis is more reliable.

Hey Jon,
I changed the region for the worker process to ams, and I think the warning is gone now :tada:

However, the selected region is not shown in the allocations, as you can see below :thinking:

:raised_hands: Glad we were able to improve that.

It looks like it was placed in cdg (France). Odd!

Since this is a v1 app, it’s scheduled by Nomad. This might be a quirk in how Nomad schedules it, but the latency should still be better than what you were getting, and more predictable, since it should be landing closer to your Redis instance going forward.

I’m afraid the warning is back after a few jobs ran :frowning:

Should I move the Redis instance to the same region as the worker?

I have a job that creates 50 jobs that each visit a URL and retrieve the status code. Let’s assume they do indeed saturate the CPU. Why do I keep getting the same warning after the jobs have finished? Shouldn’t they release the resources?

The latency one, the CPU one, or both? :smiley:

You could try to get the worker, web, and redis all in one region, but first I wanted to add some notes about the CPU portion since I think those are two separate issues.

To address the CPU issue specifically, it might be worth exploring some other pieces. Do you know what concurrency you’re running for Sidekiq? If you’re using the default, that’s 10 threads per process in Sidekiq 6 (Sidekiq 7 lowered the default to 5); you could consider lowering it to 5 to see if it’s more stable.
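For reference, a minimal sketch of setting the thread count explicitly, assuming you configure it on the command line rather than in a config/sidekiq.yml: Sidekiq’s -c (--concurrency) option can go straight into the worker process command of the fly.toml posted above. Only the [processes] table changes; everything else stays as it is.

```toml
[processes]
web = "bin/rails fly:server"
# -c (--concurrency) sets the number of Sidekiq worker threads in this process
worker = "bundle exec sidekiq -c 5"
```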

I would probably try to wrangle this with concurrency limits first, myself, but another good option would be to scale the worker VM up a bit (available sizes):

fly scale vm dedicated-cpu-1x --group worker

I think the problem here is that the worker process does not stay in the region it was assigned to. Right now it is in bog, so no wonder the RTT (which I assume is what you are referring to as latency) is really high, which is causing the CPU to become saturated.

A few questions that come to mind:

  1. What would a saturated CPU graph look like? Does it have spikes?
  2. Is there a dashboard in Fly to understand the performance of the CPU?
    After looking at the Grafana charts, I can’t spot anything that stands out.

Concurrency is already set to 5.

I don’t think scaling up is going to solve the problem above, unless it is the other way around: the worker CPU keeps the Redis connection open for too long, in which case it is something I could ignore, assuming it doesn’t inflate my bills.

Nevertheless, I would definitely like to know why the worker process doesn’t stay in the assigned region.

RTT: round trip time - yep I was equating that to latency. The VM moving around definitely seems related.

What does your CPU utilization metric look like when you’re seeing those logs?

Also, could you share the output of fly scale show? If you’re running something like micros, you might want to bump up to shared-cpu-1x, since there’s a bit more compute power there.


While I was writing this up I was looking more into the issue with placing your worker VM. I discovered some logic that affects region placement for multi-process apps with volumes specifically. Basically, if you have volumes we allow our volume driver to schedule VMs near your volumes. The other region logic seems to get skipped and we only rely on the volumes.

The problem with your setup is that you have one process with a volume and one without, so the one without a volume doesn’t get pinned to a region near a volume.

So there are a couple options for this.

The first option: you could create another volume and set both processes to mount volumes, so the worker also winds up pinned to a region. This is probably suboptimal for a couple of reasons: you potentially have one volume that isn’t being used, and your web process might sometimes flip to the other volume, which, depending on your logic, might be a problem.
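A sketch of what that first option might look like in the v1 fly.toml above, assuming a second volume with the same name is created in fra first (for example with fly volumes create senalador_thumbs --region fra) so that each of the two VMs gets its own volume; with only one volume you would hit the same “not enough volumes named senalador_thumbs (1) to run 2 processes” error as before.

```toml
[mounts]
  source = "senalador_thumbs"
  destination = "/app/public/images/thumbs"
  # Mount a volume in both process groups so volume-driven placement
  # also applies to the worker (needs one senalador_thumbs volume per VM).
  processes = ["web", "worker"]
```

The worker then gets its own copy of the thumbs directory that it probably never uses, which is the unused-volume downside mentioned above.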

The second option, probably the simplest, would be to create a separate worker app (just yourapp-worker) and deploy it as an app that simply doesn’t set up web services. This way you can use regions the normal way, and if you need volumes on the worker later, you have the flexibility to add them. I do this on my own projects, actually.
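A rough sketch of that second option, assuming the worker app is called senalador-worker (a hypothetical name) and lives in the same repository under its own config file, say fly.worker.toml (also hypothetical). It reuses the build settings from the existing fly.toml and keeps only the Sidekiq process: no [[services]], no [mounts] and no release_command, so placement is decided by the app’s regions alone.

```toml
# fly.worker.toml: hypothetical config for a separate, worker-only app
app = "senalador-worker"
kill_signal = "SIGINT"
kill_timeout = 5

[build]
  [build.args]
    BUILD_COMMAND = "bin/rails fly:build"
    SERVER_COMMAND = "bin/rails fly:server"

# Only Sidekiq runs here; the web app keeps running migrations through
# its own release_command, so none is needed in this app.
[processes]
  worker = "bundle exec sidekiq"
```

You would then create the app (fly apps create senalador-worker), copy over the secrets the web app relies on (the Redis URL, which Sidekiq reads from REDIS_URL by default, the database URL, RAILS_MASTER_KEY and so on) with fly secrets set --app senalador-worker, pick a region with something like fly regions set ams --app senalador-worker, and deploy with fly deploy --config fly.worker.toml.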

A third, wilder option: our new version of Apps. I think you can sidestep some of your issues by moving to our newer Apps platform, which runs on our own platform instead of Nomad and avoids a lot of the weird scheduling issues you’re experiencing right now. There’s a lot of active work going into it, and migration is pretty straightforward.
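For comparison, a sketch of the relevant part of an Apps V2 fly.toml, assuming the app has been migrated. V2 configs have a top-level primary_region key, and as far as I understand, new Machines for every process group are created there by default, so the worker no longer has to follow a volume to end up next to Redis. The rest of the file (build, services, mounts) would need checking against the migration docs.

```toml
app = "senalador"
# Region where new Machines are created by default (V2 / Machines apps)
primary_region = "fra"

[processes]
  web = "bin/rails fly:server"
  worker = "bundle exec sidekiq"
```

If fra is still restricted on your plan, the same idea works with ams or cdg.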

One complexity with migrating your app is that we haven’t fully automated the migration for apps with volumes, so if you’re interested in this option, we can talk about how to handle your volume during the switch. There’s some more information here: Sneak peek: docs for Apps V2 (help us improve them!), and in the links within if you’re curious.

Main docs are up to date! Actually, they’re more up to date than the preview. I’ll take down that preview app.


This is the output of fly scale show:

VM Resources for senalador
          Count: web=1 worker=1 
 Max Per Region: app=0 web=0 worker=0 

Process group app
        VM Size: shared-cpu-1x
      VM Memory: 256 MB
 Max Per Region: 0

Process group web
        VM Size: shared-cpu-1x
      VM Memory: 256 MB
 Max Per Region: 0

Process group worker
        VM Size: shared-cpu-1x
      VM Memory: 256 MB
 Max Per Region: 0

I don’t see any major spikes in CPU usage when the jobs run, and the billing is quite flat as well, so perhaps there is nothing to worry about at this point in time.
I guess I will have to migrate to V2 eventually :slight_smile:
