WebSocket scaling configuration issues

Hi everyone,

I’m having trouble with the scaling configuration of my Django Channels WebSocket application. I’ve set up connection-based concurrency with a soft_limit of 20 and a hard_limit of 25 connections. However, the limits don’t seem to be enforced as expected, and I’m seeing some strange behavior.

The main issues I’m experiencing are:

  1. Connections are being dropped from one machine and redirected to another periodically, even when the limits haven’t been reached
  2. One of the machines appears to shut down unexpectedly
  3. The soft and hard limits don’t seem to be enforced properly

Here are some metrics from Grafana:

Here is my fly.toml file:

primary_region = 'gru'
console_command = 'python manage.py shell'

[build]
  dockerfile = "./infra/Dockerfile"

[deploy]
  strategy = "bluegreen"

[[services]]
  internal_port = 8000
  protocol = "tcp"
  auto_stop_machines = "stop"
  auto_start_machines = true
  min_machines_running = 0

  [services.concurrency]
    type = "connections"
    hard_limit = 25
    soft_limit = 20

  [[services.ports]]
    handlers = ["http"]
    port = "80"

  [[services.ports]]
    handlers = ["tls", "http"]
    port = "443"

  [[services.tcp_checks]]
    interval = 10000
    timeout = 2000

[[vm]]
  memory = '2gb'
  cpu_kind = 'shared'
  cpus = 2

[[statics]]
  guest_path = '/src/output/staticfiles'
  url_prefix = '/static/'

I also have a separate monitoring interface, and it showed 60 open WebSocket connections at that time.
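
To make the mismatch concrete, here is a rough back-of-the-envelope check in Python. It assumes my understanding is correct that a machine stops receiving new connections once it reaches hard_limit. The helper name min_machines is just something I made up for this post; it is not part of any Fly.io tooling.

```python
import math


def min_machines(connections: int, per_machine_limit: int) -> int:
    """Smallest number of machines that keeps each one at or below the limit."""
    if per_machine_limit <= 0:
        raise ValueError("per_machine_limit must be positive")
    return math.ceil(connections / per_machine_limit)


SOFT_LIMIT = 20  # soft_limit from [services.concurrency] in my fly.toml
HARD_LIMIT = 25  # hard_limit from the same section
observed = 60    # connections reported by my own monitoring

print(min_machines(observed, HARD_LIMIT))  # 3
print(min_machines(observed, SOFT_LIMIT))  # 3
```

So if the limits were really enforced per machine, 60 simultaneous connections would have to be spread across at least 3 machines, which is what I would expect the metrics to reflect.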

Adding some more information on this case: over the last few minutes there were 12-15 people using the WebSocket, and suddenly the “Concurrency” metric dropped to 0. We know the machine is still running from the other metrics, and we know the WebSocket users are still connected because we track them with our own monitoring system. So what is going on?