Hello Fly.io Community,
Our Redis instance reports an exceptionally high total_commands_processed value (3.1 million) even though application activity is low and we have no active users. We are concerned about the impact on cost and performance.
Context & Problem:
- Observation: The total_commands_processed metric on our Redis machine is abnormally high (3.1 million). We have minimal active users and do not expect such heavy Redis operations.
- Setup: The application (QReply.ai worker and chat services) connects to this Redis instance. We’ve observed socket_keepalive and socket_timeout configurations in our worker code that might be related.
- Diagnostic Attempts:
  - We’ve reviewed the socket_keepalive and socket_timeout settings in our worker code (document-processor/worker.py) and made adjustments (e.g., increasing keepalive intervals).
  - The Fly.io “Metrics” section provides limited Redis-specific details. The “Live Logs” initially showed “Waiting for logs…”, hindering direct observation of connection activity.
- Timeline: This issue has been observed since [PLEASE INSERT DATE OR PERIOD WHEN THE ISSUE STARTED, e.g., ‘yesterday evening’ or ‘since the last deployment’ ].
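For reference, the kind of client configuration we are asking about looks roughly like the sketch below. It is not a copy of our worker.py: the helper name build_redis_kwargs and all numeric values are assumptions for illustration. The keyword names are the redis-py connection arguments. As far as we understand, the TCP_KEEP* options are applied to the socket and only produce TCP-level probes handled by the kernel, which is why we are unsure whether they can explain the command count.

```python
import socket


def build_redis_kwargs(idle_s=60, interval_s=15, probes=4, timeout_s=30):
    """Build keyword arguments for redis.Redis(...) (hypothetical helper).

    All values are illustrative assumptions, not our production settings.
    """
    keepalive_options = {}
    # TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT are platform-specific
    # (defined on Linux), so include only the ones this platform provides.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),      # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s), # seconds between probes
        ("TCP_KEEPCNT", probes),       # failed probes before the socket is dropped
    ):
        option = getattr(socket, name, None)
        if option is not None:
            keepalive_options[option] = value

    return {
        "socket_keepalive": True,
        "socket_keepalive_options": keepalive_options,
        "socket_timeout": timeout_s,
        "socket_connect_timeout": 5,
    }


if __name__ == "__main__":
    print(build_redis_kwargs())
```

With redis-py installed, the result would be passed along the lines of redis.Redis.from_url(url, **build_redis_kwargs()), where url is our Redis connection string.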
Specific Questions for Fly.io Support / Community:
- Impact of socket_keepalive: Could the socket_keepalive options in our code (like TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT) be causing a high total_commands_processed count in Fly.io metrics, even if they are just pings?
- Fly.io Redis Metrics:
  - Is total_commands_processed the most relevant metric for this issue?
  - How can we access more detailed Redis metrics in Fly.io (e.g., instantaneous_ops_per_sec, connected_clients, used_memory vs. maxmemory)? Specifically, how do we effectively use the “Metrics” section or the Grafana link for deep-dive analysis?
  - Are there default Fly.io configurations for Redis that might influence command counts irrespective of our client settings?
- Log Analysis: Are there specific log filters or CLI commands (fly logs) that can help us trace Redis communication (pings, connections, errors) more effectively at the network level?
- High Command Count without Activity: What are the common reasons for an extremely high total_commands_processed on a Redis instance with seemingly low application traffic?
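To narrow down the last question on our side, one approach we are considering is to take two snapshots of the output of INFO commandstats and compare which commands grew in between. The sketch below only parses the raw text of that section. The function names and the sample snapshots are made up for illustration, and it assumes the usual cmdstat_<name>:calls=...,usec=... line format.

```python
def parse_commandstats(info_text):
    """Return {command_name: calls} from raw `INFO commandstats` output."""
    calls = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line.startswith("cmdstat_"):
            continue  # skip the "# Commandstats" header and blank lines
        name, _, fields = line[len("cmdstat_"):].partition(":")
        for field in fields.split(","):
            key, _, value = field.partition("=")
            if key == "calls":
                calls[name] = int(value)
    return calls


def top_growth(before, after, limit=5):
    """Rank commands by how many calls were added between two snapshots."""
    growth = {cmd: n - before.get(cmd, 0) for cmd, n in after.items()}
    ranked = sorted(growth.items(), key=lambda item: item[1], reverse=True)
    return [(cmd, delta) for cmd, delta in ranked if delta > 0][:limit]


if __name__ == "__main__":
    # Made-up sample snapshots, not real data from our instance.
    first = (
        "# Commandstats\r\n"
        "cmdstat_ping:calls=100,usec=200,usec_per_call=2.00\r\n"
        "cmdstat_get:calls=10,usec=50,usec_per_call=5.00\r\n"
    )
    second = (
        "# Commandstats\r\n"
        "cmdstat_ping:calls=160,usec=320,usec_per_call=2.00\r\n"
        "cmdstat_get:calls=12,usec=60,usec_per_call=5.00\r\n"
        "cmdstat_brpop:calls=40,usec=400,usec_per_call=10.00\r\n"
    )
    print(top_growth(parse_commandstats(first), parse_commandstats(second)))
```

If one or two commands (for example a polling or blocking-pop command from a worker queue) account for most of the growth, that would point at our own client code rather than at the platform.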
We would like to understand whether this is a configuration issue on our end, a behavior of Fly.io’s managed Redis, or something else entirely.
Thank you for your assistance!
Best regards, Michiel