High Redis Command Count on Machine (Suspected Keepalive/Ping Issue)

Hello Fly.io Community,

We’re seeing an unexpectedly high total_commands_processed value (3.1 million) on our Redis instance, even though the application has few or no active users. We are concerned about what this means for cost and performance.

Context & Problem:

  • Observation: The total_commands_processed metric on our Redis machine reads 3.1 million. With so few active users, we do not expect our application to issue anywhere near that many Redis operations.

  • Setup: The application (QReply.ai worker and chat services) connects to this Redis instance. We’ve observed socket_keepalive and socket_timeout configurations in our worker code that might be related.

  • Diagnostic Attempts:

    • We’ve reviewed the socket_keepalive and socket_timeout settings in our worker code (document-processor/worker.py) and made adjustments (e.g., increasing keepalive intervals).

    • The Fly.io “Metrics” section provides limited Redis-specific details. The “Live Logs” initially showed “Waiting for logs…”, hindering direct observation of connection activity.

  • Timeline: This issue has been observed since [date or period to be confirmed].
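For reference, here is a minimal sketch of the kind of client configuration we mean. It is not our exact worker.py: it assumes the redis-py client, and the helper names (build_keepalive_options, build_redis_kwargs, connect) and all numeric values are illustrative assumptions.

```python
import socket


def build_keepalive_options(idle_s=60, interval_s=15, count=4):
    """Map TCP keepalive socket options to values.

    Options that the current platform does not define are skipped
    (for example, TCP_KEEPIDLE is not available everywhere).
    """
    wanted = {
        "TCP_KEEPIDLE": idle_s,       # idle seconds before the first probe
        "TCP_KEEPINTVL": interval_s,  # seconds between probes
        "TCP_KEEPCNT": count,         # failed probes before the socket is dropped
    }
    return {
        getattr(socket, name): value
        for name, value in wanted.items()
        if hasattr(socket, name)
    }


def build_redis_kwargs(timeout_s=5.0):
    """Keyword arguments for a redis-py client (values are illustrative)."""
    return {
        "socket_keepalive": True,
        "socket_keepalive_options": build_keepalive_options(),
        "socket_timeout": timeout_s,
    }


def connect(url):
    """Create a client; requires the third-party redis-py package."""
    import redis  # imported lazily so the helpers above work without it

    # TCP keepalive probes are sent by the operating system kernel.
    # They are not Redis commands such as PING.
    return redis.Redis.from_url(url, **build_redis_kwargs())
```

A call such as connect("redis://localhost:6379/0") would apply these settings; the URL is a placeholder, not our real endpoint.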

Specific Questions for Fly.io Support / Community:

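  1. Impact of socket_keepalive: Could the socket_keepalive options in our code (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT) contribute to the high total_commands_processed count in Fly.io metrics? Our understanding is that these options only control TCP-level keepalive probes, so we are unsure whether they are counted as Redis commands at all.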

  2. Fly.io Redis Metrics:

    • Is total_commands_processed the most relevant metric for this issue?

    • How can we access more detailed Redis metrics in Fly.io (e.g., instantaneous_ops_per_sec, connected_clients, used_memory vs. maxmemory)? Specifically, how do we effectively use the “Metrics” section or the Grafana link for deep-dive analysis?

    • Are there default Fly.io configurations for Redis that might influence command counts irrespective of our client settings?

  3. Log Analysis: Are there specific log filters or CLI commands (fly logs) that can help us trace Redis communication (pings, connections, errors) more effectively at the network level?

  4. High Command Count without Activity: What are the common reasons for an extremely high total_commands_processed on a Redis instance with seemingly low application traffic?
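To make the metrics questions concrete, this is a rough sketch of how we would try to break the counter down from the client side. It assumes the redis-py client and that the instance answers INFO with the stats, server, clients, and commandstats sections, which a managed offering may restrict. The function names are our own and purely illustrative.

```python
def top_commands(commandstats, limit=5):
    """Return (command, calls) pairs sorted by call count, highest first.

    Expects the shape redis-py returns for INFO commandstats, e.g.
    {"cmdstat_ping": {"calls": 10, ...}, ...}.
    """
    ranked = []
    for key, stats in commandstats.items():
        name = key[len("cmdstat_"):] if key.startswith("cmdstat_") else key
        ranked.append((name, int(stats["calls"])))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:limit]


def average_ops_per_sec(total_commands, uptime_seconds):
    """Average command rate since the server started.

    total_commands_processed is cumulative since server start, so it
    only means something relative to the uptime.
    """
    if uptime_seconds <= 0:
        return 0.0
    return total_commands / uptime_seconds


def fetch_report(url):
    """Collect a small report; requires the third-party redis-py package."""
    import redis  # imported lazily so the helpers above work without it

    client = redis.Redis.from_url(url)
    stats = client.info("stats")
    server = client.info("server")
    clients = client.info("clients")
    total = stats["total_commands_processed"]
    return {
        "total_commands_processed": total,
        "instantaneous_ops_per_sec": stats["instantaneous_ops_per_sec"],
        "connected_clients": clients["connected_clients"],
        "average_ops_per_sec": average_ops_per_sec(total, server["uptime_in_seconds"]),
        "top_commands": top_commands(client.info("commandstats")),
    }
```

As a purely hypothetical illustration, if the 3.1 million commands had accumulated over 30 days of uptime, average_ops_per_sec(3_100_000, 30 * 86400) would come to roughly 1.2 commands per second. We do not yet know the real uptime of our instance.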

We are seeking to understand if this is a configuration issue on our end, a behavior of Fly.io’s managed Redis, or something else entirely.

Thank you for your assistance!

Best regards, Michiel