Problem with machine metrics affecting the autoscaling

giulianopenido · November 25, 2024, 8:35pm

Hi everyone,

I’m having some issues with the scaling configuration of my Django Channels WebSocket application. I’ve set up connection-based scaling with a soft_limit of 20 and a hard_limit of 25 connections. However, the limits don’t seem to work as expected, and I’m noticing some strange behavior.

The main issues I’m experiencing are:

- Connections are periodically dropped from one machine and redirected to another, even when the limits haven’t been reached.
- One of the machines appears to shut down unexpectedly.
- The soft and hard limits don’t seem to be enforced properly.

Here are some metrics from Grafana:

Here is my fly.toml file:

primary_region = 'gru'
console_command = 'python manage.py shell'

[build]
  dockerfile = "./infra/Dockerfile"

[deploy]
  strategy = "bluegreen"

[[services]]
  internal_port = 8000
  protocol = "tcp"
  auto_stop_machines = "stop"
  auto_start_machines = true
  min_machines_running = 0

  [services.concurrency]
    type = "connections"
    hard_limit = 25
    soft_limit = 20

  [[services.ports]]
    handlers = ["http"]
    port = "80"

  [[services.ports]]
    handlers = ["tls", "http"]
    port = "443"

  [[services.tcp_checks]]
    interval = 10000
    timeout = 2000

[[vm]]
  memory = '2gb'
  cpu_kind = 'shared'
  cpus = 2

[[statics]]
  guest_path = '/src/output/staticfiles'
  url_prefix = '/static/'

I have another monitoring interface, and it showed that there were 60 WebSocket connections at this time.
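As a rough sanity check on those numbers: assuming the limits are counted per machine and that connections would be spread evenly across machines (both are assumptions, not something confirmed in this thread), 60 connections could not fit on one or two machines without exceeding the configured limits. The helper `min_machines` below is a hypothetical name used only for illustration.

```python
import math

SOFT_LIMIT = 20  # soft_limit from the fly.toml above
HARD_LIMIT = 25  # hard_limit from the fly.toml above


def min_machines(connections: int, per_machine_limit: int) -> int:
    """Smallest machine count that keeps every machine at or below the limit.

    Assumes an even spread of connections, which a real proxy may not achieve.
    """
    if connections <= 0:
        return 0
    return math.ceil(connections / per_machine_limit)


observed = 60  # connections reported by the separate monitoring interface

print(min_machines(observed, HARD_LIMIT))  # 3
print(min_machines(observed, SOFT_LIMIT))  # 3
```

Under these assumptions, 60 connections on fewer than three machines would mean at least one machine is above its hard limit of 25.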

Look at this other example:

For instance, 12-15 people have been using the WebSocket over the last few minutes, and suddenly the “Concurrency” metric drops to 0. We know the machine is still running from the other metrics, and we know that WebSocket users are still connected because we track them with our own monitoring system. So what is going on?

This looks like an issue on Fly’s end. I don’t know whether it’s on the metrics side or on the auto-balancing side, but it is definitely on the infrastructure management side.

system · December 2, 2024, 8:51pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
