Architecture question: hosting and scaling WebRTC rooms/bots

Hey guys,

I’m trying to set up my architecture to be scalable for an iOS app launch. The attached diagram shows the basics.

Currently I spawn a new machine to host each “room” when the user hits connect from the client; a room contains a bot and one user, so it’s essentially like a Zoom call. Each room uses about 20% CPU on a 1 GB machine. Machines are set to auto-destroy on disconnect from the room, and I have min_machines_running=1.
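For context, the spawn-on-connect step looks roughly like this (a minimal Python sketch against the Fly Machines REST API; the app name, image, and token handling are placeholders, and error handling is omitted):

```python
# Minimal sketch of the per-room spawn; app name and image are placeholders.
import os
import requests

API = "https://api.machines.dev/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['FLY_API_TOKEN']}"}

def spawn_room_machine(app_name: str = "my-webrtc-rooms") -> dict:
    """Create one machine per room; auto_destroy tears it down when the room exits."""
    body = {
        "config": {
            "image": "registry.fly.io/my-webrtc-rooms:latest",  # placeholder image
            "guest": {"cpu_kind": "shared", "cpus": 1, "memory_mb": 1024},  # the 1 GB machine
            "auto_destroy": True,  # machine is destroyed when its main process exits
        }
    }
    resp = requests.post(f"{API}/apps/{app_name}/machines", json=body, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()  # includes the new machine's id to hand back to the client
```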

My question is:

  1. What is the best way to ensure that the user always connects to a “started” machine, to avoid cold starts and long waits?

  2. How do I ensure a 1:1 mapping where each user gets their own VM? (There should never be two users on one machine.)

  3. How do I make sure the architecture scales up and down appropriately?

(I built a complex warm pool yesterday in Cursor, but it got stuck in a loop and spawned something like 50 machines :frowning: )

Thanks


I have something similar. Here’s my arch: How to prevent contention between background jobs in a multi-machine set-up?

I am building an API in my middle app, Distributor. It receives requests from Web that write to the database. I also run several Supervisord processes in the background that watch for database changes via simple polling loops:

  • If a record’s status is requested and it has not been processed yet, clone a template (stopped) machine, start it, and change the status to starting
  • If a status is starting and the target machine can receive requests, send the request and change the status to started
  • If there are too many machines running, add a minute to the request start time and change the status to delayed
  • There’s a process to put delayed requests back to requested if the running machine count drops
  • Every time a request is rejected, its retry count is bumped, and there’s a process to permanently reject requests that get rejected too many times (e.g. an invalid request)

This may seem like a lot of work, but each process is just a bit of SQL run against a managed database, plus a little application logic to move records between statuses. Some queries have to be protected against race conditions, bearing in mind that these processes run as redundant copies.
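For instance, claiming a requested record can be made safe against redundant pollers with SKIP LOCKED. A sketch, assuming Postgres via psycopg2 and a hypothetical requests table with id and status columns:

```python
# Sketch of one race-safe polling pass; table and column names are hypothetical.
import psycopg2

def claim_one_requested(conn) -> int | None:
    """Atomically move one 'requested' record to 'starting'.

    FOR UPDATE SKIP LOCKED lets redundant copies of this poller run
    concurrently without ever claiming the same record twice.
    """
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            UPDATE requests
               SET status = 'starting'
             WHERE id = (
                     SELECT id
                       FROM requests
                      WHERE status = 'requested'
                      ORDER BY id
                      LIMIT 1
                        FOR UPDATE SKIP LOCKED
                   )
            RETURNING id
            """
        )
        row = cur.fetchone()
        return row[0] if row else None  # None when nothing is waiting
```

The same pattern works for the starting → started and delayed → requested transitions.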

Your situation will be a little different in that rather than starting machines on demand, you keep a pool of running ones. That’s just another process in the process manager that starts extra machines beyond the number occupied by real users. You might also want a process that destroys machines which no longer have a user, so the free pool stays at a consistent size.
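A sketch of that reconciler (the three helper callables are hypothetical stand-ins for however you list, start, and destroy machines):

```python
# Warm-pool reconciler sketch; the helper callables are hypothetical.
import time

TARGET_FREE = 3  # desired number of running-but-unoccupied machines

def reconcile_pool(list_free, start_one, destroy_one) -> None:
    """One pass: top up or trim the free pool toward TARGET_FREE."""
    free = list_free()  # machines that are running but not claimed by a user
    while len(free) < TARGET_FREE:
        free.append(start_one())   # start another template clone
    for machine in free[TARGET_FREE:]:
        destroy_one(machine)       # trim the surplus so the pool stays bounded

def run_forever(list_free, start_one, destroy_one, interval_s: float = 10.0) -> None:
    """Run this under the process manager (e.g. Supervisord), like the other pollers."""
    while True:
        reconcile_pool(list_free, start_one, destroy_one)
        time.sleep(interval_s)
```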

Try using suspend. Even with a new machine, it should take a couple of seconds at most. Not terrible UX if the user is connecting to a live chat service.
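Roughly like this (a sketch assuming the Machines API’s suspend endpoint; app name and token handling are placeholders, as above):

```python
# Sketch of suspend-on-idle / resume-on-connect; app name and token are placeholders.
import os
import requests

API = "https://api.machines.dev/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['FLY_API_TOKEN']}"}

def suspend(app: str, machine_id: str) -> None:
    """Snapshot the machine's memory so a later start resumes in seconds."""
    requests.post(f"{API}/apps/{app}/machines/{machine_id}/suspend",
                  headers=HEADERS).raise_for_status()

def resume(app: str, machine_id: str) -> None:
    """Starting a suspended machine resumes it from the snapshot."""
    requests.post(f"{API}/apps/{app}/machines/{machine_id}/start",
                  headers=HEADERS).raise_for_status()
```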

Do you really need 1:1? It seems like a lot of wasted resources if each room only uses 20% CPU (including overhead). You could probably fit 2-3 more users in there.

If your WebSocket server is scaled up, you have to handle the scenario where a request gets routed to a different machine, since there are no sticky sessions… you’ll want to look into fly-replay, but some users have reported that it doesn’t work (Fly issue or user skill issue), so your mileage may vary.
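If you do go that route, the idea is to answer the initial upgrade request with a fly-replay header pointing at the machine that owns the room. A sketch with Flask (the room-ownership lookup and the actual upgrade handling are hypothetical stubs):

```python
# Sketch of fly-replay routing for room affinity; lookup and upgrade are stubs.
import os
from flask import Flask, Response

app = Flask(__name__)
ROOM_OWNERS: dict[str, str] = {}  # hypothetical room_id -> machine_id mapping

def room_owner(room_id: str) -> str:
    """Hypothetical lookup (in practice, a DB query) of which machine hosts the room."""
    return ROOM_OWNERS[room_id]

@app.route("/rooms/<room_id>/ws")
def join_room(room_id: str):
    owner = room_owner(room_id)
    if owner != os.environ.get("FLY_MACHINE_ID"):  # set by the Fly runtime
        # Fly's proxy intercepts any response carrying fly-replay and replays
        # the request on the named machine; this has to happen on the initial
        # HTTP request, before the WebSocket upgrade completes.
        return Response(status=307, headers={"fly-replay": f"instance={owner}"})
    # ...otherwise handle the upgrade on this machine (not shown)
    return Response(status=200)
```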
