Problem with machine metrics affecting the autoscaling

Hi everyone,

I’m having some issues with the scaling configuration of my Django Channels WebSocket application. I’ve set up connection-based scaling with a soft_limit of 20 and a hard_limit of 25 connections. However, the limits don’t seem to be working as expected, and I’m noticing some strange behavior.

The main issues I’m experiencing are:

  1. Connections are being dropped from one machine and redirected to another periodically, even when the limits haven’t been reached
  2. One of the machines appears to shut down unexpectedly
  3. The soft and hard limits don’t seem to be enforced properly

Here are some metrics from Grafana:

Here is my fly.toml file:

primary_region = 'gru'
console_command = 'python manage.py shell'

[build]
  dockerfile = "./infra/Dockerfile"

[deploy]
  strategy = "bluegreen"

[[services]]
  internal_port = 8000
  protocol = "tcp"
  auto_stop_machines = "stop"
  auto_start_machines = true
  min_machines_running = 0

  [services.concurrency]
    type = "connections"
    hard_limit = 25
    soft_limit = 20

  [[services.ports]]
    handlers = ["http"]
    port = "80"

  [[services.ports]]
    handlers = ["tls", "http"]
    port = "443"

  [[services.tcp_checks]]
    interval = 10000
    timeout = 2000

[[vm]]
  memory = '2gb'
  cpu_kind = 'shared'
  cpus = 2

[[statics]]
  guest_path = '/src/output/staticfiles'
  url_prefix = '/static/'
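
To narrow down issue 2, one variant I could test is below. It is only a sketch, under the assumption that the unexpected shutdown comes from autostop (auto_stop_machines = "stop" together with min_machines_running = 0) rather than from a crash; I have not confirmed that this is the cause. Only the two marked values differ from my current config, and the rest of the [[services]] block (concurrency, ports, checks) would stay unchanged:

```toml
[[services]]
  internal_port = 8000
  protocol = "tcp"
  # Assumption: turning autostop off for a test run would show whether
  # the proxy is the one stopping the machine.
  auto_stop_machines = "off"
  auto_start_machines = true
  # Assumption: keep at least one machine running in the primary region (gru).
  min_machines_running = 1
```

If a machine still shuts down or connections still move between machines with these settings, autostop could be ruled out as the cause.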

I also have a separate monitoring interface, and it showed 60 WebSocket connections at that time.

Here is another example:


For instance, 12-15 people had been using the WebSocket over the previous few minutes when the “Concurrency” metric suddenly dropped to 0. We know the machine was still running from the other metrics, and we know WebSocket users were still connected because we track them with our own monitoring system. So what is going on?

This looks like an issue on Fly’s end. I don’t know whether it’s on the metrics side or the auto-balancing side, but it definitely seems to be in the infrastructure management.
