Hi everyone,
I’m having some issues with the scaling configuration of my Django Channels WebSocket application. I’ve set up connection-based concurrency with a soft_limit of 20 and a hard_limit of 25 connections. However, the limits don’t seem to be enforced as expected, and I’m noticing some strange behavior.
The main issues I’m experiencing are:
- Connections are being dropped from one machine and redirected to another periodically, even when the limits haven’t been reached
- One of the machines appears to shut down unexpectedly
- The soft and hard limits don’t seem to be enforced properly
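For context, the cross-check on our side boils down to a per-process counter of live WebSocket connections, along these lines (a simplified sketch, not our exact code; names are illustrative):

```python
import threading

class ConnectionGauge:
    """Thread-safe counter of live WebSocket connections in this process."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def connect(self):
        # Called when a WebSocket is accepted.
        with self._lock:
            self._count += 1
            return self._count

    def disconnect(self):
        # Called when a WebSocket closes, cleanly or not.
        with self._lock:
            self._count -= 1
            return self._count

    @property
    def count(self):
        with self._lock:
            return self._count

# In a Channels consumer this would be driven from connect()/disconnect(),
# and the value compared against Fly's "Concurrency" metric for the machine.
gauge = ConnectionGauge()
```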
Here are some metrics from Grafana:
Here is my fly.toml file:
```toml
primary_region = 'gru'
console_command = 'python manage.py shell'

[build]
  dockerfile = "./infra/Dockerfile"

[deploy]
  strategy = "bluegreen"

[[services]]
  internal_port = 8000
  protocol = "tcp"
  auto_stop_machines = "stop"
  auto_start_machines = true
  min_machines_running = 0

  [services.concurrency]
    type = "connections"
    hard_limit = 25
    soft_limit = 20

  [[services.ports]]
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.tcp_checks]]
    interval = 10000
    timeout = 2000

[[vm]]
  memory = '2gb'
  cpu_kind = 'shared'
  cpus = 2

[[statics]]
  guest_path = '/src/output/staticfiles'
  url_prefix = '/static/'
```
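For reference, my understanding of the concurrency settings, taken from Fly's docs (so correct me if I'm misreading them), is:

```toml
[services.concurrency]
  type = "connections"   # count concurrent TCP connections per machine
  hard_limit = 25        # proxy should stop routing new connections to a machine at 25
  soft_limit = 20        # proxy should prefer other machines once one reaches 20
```

Neither of those behaviors matches what the metrics below show.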
I have another monitoring interface, and it showed that at this time there were 60 WebSocket connections.
Here is another example:
For instance, 12-15 people have been using the WebSocket over the last few minutes when suddenly the “Concurrency” metric drops to 0. We know the machine is still running from the other metrics, and we also know WebSocket users are still connected because we track them with our own monitoring system. So what is going on?
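To make the mismatch concrete, here is a small sketch that flags the two anomalies described above: Fly's concurrency metric exceeding hard_limit, and the metric dropping to 0 while our own monitoring still reports connected users. The helper and the sample numbers are illustrative, not exported from Grafana:

```python
def find_anomalies(fly_samples, own_samples, hard_limit):
    """fly_samples / own_samples: lists of (timestamp, connection_count)
    taken at the same timestamps from Fly's metric and our own monitoring.
    Returns a list of (timestamp, description) anomalies."""
    anomalies = []
    for (ts, fly), (_, ours) in zip(fly_samples, own_samples):
        if fly > hard_limit:
            anomalies.append((ts, f"concurrency {fly} exceeds hard_limit {hard_limit}"))
        if fly == 0 and ours > 0:
            anomalies.append((ts, f"concurrency reports 0 while {ours} clients are connected"))
    return anomalies

# Illustrative numbers shaped like what we observed:
fly_metric = [(0, 22), (1, 60), (2, 0)]
own_metric = [(0, 22), (1, 60), (2, 14)]
for ts, msg in find_anomalies(fly_metric, own_metric, hard_limit=25):
    print(ts, msg)
```

Both sample runs above trip an anomaly: the 60-connection reading should be impossible under a hard_limit of 25, and the drop to 0 contradicts our own gauge.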
This looks like an issue on Fly's end. I don't know whether it's on the metrics side or in the auto-balancing, but it is definitely somewhere in the infrastructure management layer.