High Redis Command Count on Machine (Suspected Keepalive/Ping Issue)

brrum · July 3, 2025, 7:22am

Hello Fly.io Community,

Our Redis instance reports an unexpectedly high total_commands_processed value (3.1 million) even though we see very little activity and have few, if any, active users. We’re concerned about what this means for cost and performance.

Context & Problem:

Observation: total_commands_processed on our Redis machine stands at 3.1 million. With minimal active users, we would not expect anywhere near that many Redis operations.
Setup: The application (QReply.ai worker and chat services) connects to this Redis instance. We’ve observed socket_keepalive and socket_timeout configurations in our worker code that might be related.
Diagnostic Attempts:
- We’ve reviewed the socket_keepalive and socket_timeout settings in our worker code (document-processor/worker.py) and made adjustments (e.g., increasing keepalive intervals).
- The Fly.io “Metrics” section provides limited Redis-specific details. The “Live Logs” initially showed “Waiting for logs…”, hindering direct observation of connection activity.
Timeline: This issue has been observed since [PLEASE INSERT DATE OR PERIOD WHEN THE ISSUE STARTED, e.g., ‘yesterday evening’ or ‘since the last deployment’ ].
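
To make the settings mentioned above concrete, here is a minimal sketch of how such options are typically passed to a redis-py client. It assumes the worker uses redis-py; the function name and all values are hypothetical and are not copied from document-processor/worker.py. The TCP_* options configure keepalive probes handled by the operating system, while health_check_interval is the redis-py setting that issues actual PING commands, so both are shown side by side.

```python
import socket


def build_redis_kwargs(idle_s=60, interval_s=30, probes=3, timeout_s=10):
    """Return keyword arguments intended for redis.Redis(**kwargs).

    All default values here are illustrative assumptions.
    """
    options = {}
    # TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT are not defined on every
    # platform, so include only the ones this Python build exposes.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before the socket is dropped
    ):
        const = getattr(socket, name, None)
        if const is not None:
            options[const] = value
    return {
        "socket_timeout": timeout_s,
        "socket_keepalive": True,
        "socket_keepalive_options": options,
        # Application-level setting: when > 0, redis-py issues PING commands
        # to check connections. 0 leaves that check disabled.
        "health_check_interval": 0,
    }
```

The resulting dictionary would be passed as `redis.Redis(host=..., port=..., **build_redis_kwargs())`.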

Specific Questions for Fly.io Support / Community:

Impact of socket_keepalive: Could the socket_keepalive options in our code (like TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT) be causing a high total_commands_processed count in Fly.io metrics, even if they are just pings?
Fly.io Redis Metrics:

Is total_commands_processed the most relevant metric for this issue?
How can we access more detailed Redis metrics in Fly.io (e.g., instantaneous_ops_per_sec, connected_clients, used_memory vs. maxmemory)? Specifically, how do we effectively use the “Metrics” section or the Grafana link for deep-dive analysis?
Are there default Fly.io configurations for Redis that might influence command counts irrespective of our client settings?

Log Analysis: Are there specific log filters or CLI commands (fly logs) that can help us trace Redis communication (pings, connections, errors) more effectively at the network level?
High Command Count without Activity: What are the common reasons for an extremely high total_commands_processed on a Redis instance with seemingly low application traffic?
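
One way we could narrow down the last question ourselves is to break the total down by command type. The sketch below parses the text printed by `INFO commandstats` and ranks commands by call count. It assumes the instance exposes that section, which a managed service may not. The function names are our own, and the sample text is made up purely to show the format.

```python
def parse_commandstats(raw):
    """Map command name -> number of calls from INFO commandstats text."""
    calls = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("cmdstat_"):
            continue  # skip the section header and blank lines
        name, _, fields = line.partition(":")
        stats = dict(f.split("=", 1) for f in fields.split(",") if "=" in f)
        calls[name[len("cmdstat_"):]] = int(stats.get("calls", 0))
    return calls


def top_commands(raw, n=5):
    """Return the n most frequently called commands as (name, calls) pairs."""
    calls = parse_commandstats(raw)
    return sorted(calls.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up sample for illustration only; these are not numbers from our instance.
SAMPLE = """# Commandstats
cmdstat_ping:calls=120,usec=130,usec_per_call=1.08
cmdstat_get:calls=30,usec=120,usec_per_call=4.00
cmdstat_set:calls=10,usec=50,usec_per_call=5.00
"""

if __name__ == "__main__":
    for name, count in top_commands(SAMPLE):
        print(f"{name}: {count}")
```

If one command such as PING or a blocking pop dominated the real output, that would point at a polling or health-check loop rather than user traffic.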

We are seeking to understand if this is a configuration issue on our end, a behavior of Fly.io’s managed Redis, or something else entirely.
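
Because total_commands_processed is a cumulative counter (it only resets when the server restarts or its stats are reset), the raw total says little without a time window. The sketch below shows the arithmetic we have in mind; the helper names are hypothetical and the 30-day window is an assumption, not a measured value.

```python
SECONDS_PER_DAY = 86_400


def ops_per_second(count_start, count_end, elapsed_s):
    """Average command rate between two samples of total_commands_processed."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    if count_end < count_start:
        # The counter went backwards, so the server restarted or stats were reset.
        raise ValueError("counter decreased between samples")
    return (count_end - count_start) / elapsed_s


def commands_per_day(interval_s, clients=1):
    """Commands generated per day by clients that each send one command every interval_s."""
    return clients * SECONDS_PER_DAY / interval_s


if __name__ == "__main__":
    # Assumed 30-day window: 3.1 million commands averages out to a little over 1 op/s.
    print(ops_per_second(0, 3_100_000, 30 * SECONDS_PER_DAY))
    # A single connection issuing one command per second yields 86,400 per day.
    print(commands_per_day(1))
```

At one command per second, a single connection would reach 3.1 million in roughly 36 days, so even a lone polling loop could plausibly account for a total of this size.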

Thank you for your assistance!

Best regards, Michiel
