I’ve been trying for a while to load test websockets (Phoenix Channels) to see how much load my code can handle. The problem is that I stop being able to connect after a certain number of websockets (around 16,000). There are no errors on the server; I just start getting `nxdomain` errors locally. I’ve tried a number of things that I’ll describe below, but I wanted to check first whether there is some sort of hard-coded limit on the fly.io machines / load balancer.
I’ve set up the server locally running `MIX_ENV=prod` and was able to get above 22k connections before I shut it down.
My setup:

- fly.io app which has a Phoenix channel (`shared-cpu-2x`, 4096 MB memory)
- local app which connects using the `slipstream` library (which uses `mint`) to create client channel connections. I first make an HTTP request (using HTTPoison) to create a “driver” record, and then I set up the channel for that driver ID. Interestingly, these HTTP requests seem to be what always return the `nxdomain` errors. (A rough sketch of this client is below.)
- connecting to the server through WireGuard (the HTTP server isn’t exposed to the public internet)
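For reference, here is roughly what each simulated driver looks like on the client side. This is a sketch, not my exact code: the URLs, topic, and payload are placeholders (and I’m assuming Jason for JSON), but the flow is the same: one HTTPoison request to create the driver, then a Slipstream connection that joins that driver’s channel.

```elixir
defmodule LoadTest.DriverClient do
  # One simulated driver: create the record over HTTP, then hold a channel open.
  use Slipstream

  # Placeholder URLs and topic, not the real endpoints.
  @create_url "http://my-app.internal:4000/api/drivers"
  @socket_uri "ws://my-app.internal:4000/socket/websocket"

  def start_link(opts \\ []) do
    Slipstream.start_link(__MODULE__, opts)
  end

  @impl Slipstream
  def init(_opts) do
    # This HTTP request is the call that starts returning nxdomain at ~16k clients.
    {:ok, %{body: body}} =
      HTTPoison.post(@create_url, Jason.encode!(%{}), [{"content-type", "application/json"}])

    %{"id" => driver_id} = Jason.decode!(body)

    socket =
      connect!(uri: @socket_uri)
      |> Slipstream.Socket.assign(:driver_id, driver_id)

    {:ok, socket}
  end

  @impl Slipstream
  def handle_connect(socket) do
    # Join the per-driver topic once the websocket is up.
    {:ok, join(socket, "driver:#{socket.assigns.driver_id}")}
  end
end
```

One of these processes gets started per simulated driver.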
For all of the things below, keep in mind that this is an app for testing a concept and I’m not planning on running it in real production, so I think it should be OK to have these ridiculous limits in this case.
I’ve set high `ulimit` values (I got some feedback in this thread on how to set them for my user). This is my server script file, which runs `ulimit -aS` / `-aH` to verify that the values are set:
```bash
#!/bin/bash
# Raise limits for this process (and everything it execs) before starting the release
ulimit -n 900000   # open file descriptors
ulimit -i 500000   # pending signals
ulimit -u 500000   # max user processes
ulimit -s 16384    # stack size (KB)
# Print the hard and soft limits so they show up in the logs
ulimit -aH
ulimit -aS
cd -P -- "$(dirname -- "$0")"
PHX_SERVER=true exec ./my_app start
```
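As a sanity check that the release actually inherits those values (and because the BEAM has its own ceilings that are separate from the OS ulimits), the limits can also be read from inside the running node. This is just a sketch of the kind of check I mean, run from a remote console on the machine:

```elixir
# e.g. `fly ssh console`, then `./my_app remote` to get an IEx session on the node

# The limits the running BEAM actually inherited (Linux only):
IO.puts(File.read!("/proc/self/limits"))

# The BEAM's own ceilings, independent of ulimit; each TCP connection
# typically consumes an Erlang port, so port_limit matters for socket counts.
IO.inspect(:erlang.system_info(:process_limit), label: "process_limit")
IO.inspect(:erlang.system_info(:port_limit), label: "port_limit")
```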
I have the following in my `fly.toml`:
```toml
[services.concurrency]
  hard_limit = 100000
  soft_limit = 100000

[http_service.concurrency]
  hard_limit = 100000
  soft_limit = 100000
```
I’ve also set high `ulimit` values locally for the application that connects with `slipstream`:
```
(base) ➜ my_app git:(main) ✗ ulimit -aH
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             65520
-c: core file size (blocks)         unlimited
-v: address space (kbytes)          unlimited
-l: locked-in-memory size (kbytes)  unlimited
-u: processes                       5333
-n: file descriptors                unlimited
(base) ➜ my_app git:(main) ✗ ulimit -aS
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8176
-c: core file size (blocks)         0
-v: address space (kbytes)          unlimited
-l: locked-in-memory size (kbytes)  unlimited
-u: processes                       5333
-n: file descriptors                1000000
```
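To rule out the local BEAM’s own port ceiling, I can also watch how many sockets the load-test node has open while ramping up. A minimal sketch using standard `:erlang` calls (each outbound TCP connection typically consumes an Erlang port):

```elixir
IO.inspect(:erlang.system_info(:port_count), label: "ports in use")
IO.inspect(:erlang.system_info(:port_limit), label: "port limit")
```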
I’ve tried both `cowboy` and `bandit` (currently using `bandit`) and both fail around 16k connections (interestingly, `cowboy` seems to fail a few hundred below 16k and `bandit` seems to fail a couple of hundred above 16k).
I’ve looked at configuration options for cowboy, bandit, slipstream, mint, and Phoenix and tried various things that looked like they might work, but no luck.
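For illustration, these are the kinds of endpoint options I mean. The module name and values here are just examples (assuming the standard Plug.Cowboy and Bandit adapter option names), not settings I’m recommending:

```elixir
import Config

# Cowboy adapter: connection caps live in ranch's transport options.
config :my_app, MyAppWeb.Endpoint,
  http: [
    port: 4000,
    transport_options: [num_acceptors: 100, max_connections: :infinity]
  ]

# Bandit adapter: the equivalent knobs are passed through to Thousand Island.
config :my_app, MyAppWeb.Endpoint,
  adapter: Bandit.PhoenixAdapter,
  http: [
    port: 4000,
    thousand_island_options: [num_acceptors: 100, num_connections: 50_000]
  ]
```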
In LiveDashboard I don’t see any processes with long message queues…
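(The same thing can be spot-checked from a remote console with standard `Process` functions; a quick sketch:)

```elixir
# Top 10 processes by message queue length
Process.list()
|> Enum.map(&{&1, Process.info(&1, :message_queue_len)})
|> Enum.filter(&match?({_pid, {:message_queue_len, _}}, &1))
|> Enum.sort_by(fn {_pid, {:message_queue_len, len}} -> len end, :desc)
|> Enum.take(10)
```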
Before connections start failing, memory usage on the server generally gets up to around 2.4 GB out of the 4 GB I’ve allocated.
Would love any help