Architecture question: hosting and scaling WebRTC rooms/bots

Hey guys,

I’m trying to set up my architecture to be scalable for an iOS app launch. The attached diagram shows the basics.

Currently I spawn a new machine to host each “room” when the user hits connect from the client; a room contains a bot and one user, so it’s essentially like a Zoom call. Each room uses about 20% CPU on a 1 GB machine. Machines are set to auto-destroy on disconnect from the room, and I have min_machines_running=1.
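For context, the spawn-on-connect step looks roughly like this (a minimal Python sketch against the Fly Machines REST API; the app name, image, and token handling are placeholders, and error handling is omitted):

```python
# Minimal sketch of the per-room spawn; app name and image are placeholders.
import os
import requests

API = "https://api.machines.dev/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['FLY_API_TOKEN']}"}

def spawn_room_machine(app_name: str = "my-webrtc-rooms") -> dict:
    """Create one machine per room; auto_destroy tears it down when the room exits."""
    body = {
        "config": {
            "image": "registry.fly.io/my-webrtc-rooms:latest",  # placeholder image
            "guest": {"cpu_kind": "shared", "cpus": 1, "memory_mb": 1024},  # the 1 GB machine
            "auto_destroy": True,  # machine is destroyed when its main process exits
        }
    }
    resp = requests.post(f"{API}/apps/{app_name}/machines", json=body, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()  # includes the new machine's id to hand back to the client
```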

My question is:

  1. What is the best way to ensure that the user always connects to a “started” machine, to avoid cold starts and long waits?

  2. How do I ensure a 1:1 mapping where each user gets their own VM? (There should never be two users on one machine.)

  3. How do I make sure the architecture scales up and down appropriately?

(I built a complex warm pool yesterday in Cursor, but it got stuck in a loop and spawned something like 50 machines :frowning: )

Thanks


I have something similar. Here’s my arch: How to prevent contention between background jobs in a multi-machine set-up?

I am building an API in my middle app, Distributor. It receives requests from Web that write to the database. I also run several Supervisord processes in the background that watch for database changes via simple polling loops:

  • If a record’s status is requested and it has not been processed yet, clone a template (stopped) machine, start it, and change the status to starting
  • If a status is starting and the target machine can receive requests, send the request and change the status to started
  • If there are too many machines running, add a minute to the request start time and change the status to delayed
  • There’s a process to put delayed requests back to requested if the running machine count drops
  • Every time a request is rejected, its retry count is bumped, and there’s a process to permanently reject requests that get rejected too many times (e.g. an invalid request)

This may seem like a lot of work, but each process is just a bit of SQL run against a managed database, plus a little application logic to move records between statuses. Some queries have to be protected against race conditions, bearing in mind that these processes run as redundant copies.
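For instance, claiming a requested record can be made safe against redundant pollers with SKIP LOCKED. A sketch, assuming Postgres via psycopg2 and a hypothetical requests table with id and status columns:

```python
# Sketch of one race-safe polling pass; table and column names are hypothetical.
import psycopg2

def claim_one_requested(conn) -> int | None:
    """Atomically move one 'requested' record to 'starting'.

    FOR UPDATE SKIP LOCKED lets redundant copies of this poller run
    concurrently without ever claiming the same record twice.
    """
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            UPDATE requests
               SET status = 'starting'
             WHERE id = (
                     SELECT id
                       FROM requests
                      WHERE status = 'requested'
                      ORDER BY id
                      LIMIT 1
                        FOR UPDATE SKIP LOCKED
                   )
            RETURNING id
            """
        )
        row = cur.fetchone()
        return row[0] if row else None  # None when nothing is waiting
```

The same pattern works for the starting → started and delayed → requested transitions.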

Your situation will be a little different in that rather than starting machines on demand, you keep a pool of running ones. That’s just another process in the process manager that starts extra machines beyond the number occupied by real users. You might also want a process that destroys machines which no longer have a user, so the free pool stays at a consistent size.
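A sketch of that reconciler (the three helper callables are hypothetical stand-ins for however you list, start, and destroy machines):

```python
# Warm-pool reconciler sketch; the helper callables are hypothetical.
import time

TARGET_FREE = 3  # desired number of running-but-unoccupied machines

def reconcile_pool(list_free, start_one, destroy_one) -> None:
    """One pass: top up or trim the free pool toward TARGET_FREE."""
    free = list_free()  # machines that are running but not claimed by a user
    while len(free) < TARGET_FREE:
        free.append(start_one())   # start another template clone
    for machine in free[TARGET_FREE:]:
        destroy_one(machine)       # trim the surplus so the pool stays bounded

def run_forever(list_free, start_one, destroy_one, interval_s: float = 10.0) -> None:
    """Run this under the process manager (e.g. Supervisord), like the other pollers."""
    while True:
        reconcile_pool(list_free, start_one, destroy_one)
        time.sleep(interval_s)
```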

Try using suspend. Even with a new machine, it should take a couple of seconds at most. Not terrible UX if the user is connecting to a live chat service.
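Roughly like this (a sketch assuming the Machines API’s suspend endpoint; app name and token handling are placeholders, as above):

```python
# Sketch of suspend-on-idle / resume-on-connect; app name and token are placeholders.
import os
import requests

API = "https://api.machines.dev/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['FLY_API_TOKEN']}"}

def suspend(app: str, machine_id: str) -> None:
    """Snapshot the machine's memory so a later start resumes in seconds."""
    requests.post(f"{API}/apps/{app}/machines/{machine_id}/suspend",
                  headers=HEADERS).raise_for_status()

def resume(app: str, machine_id: str) -> None:
    """Starting a suspended machine resumes it from the snapshot."""
    requests.post(f"{API}/apps/{app}/machines/{machine_id}/start",
                  headers=HEADERS).raise_for_status()
```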

Do you really need 1:1? It seems like a lot of wasted resources if each room only uses 20% CPU (including overhead). You could probably fit 2-3 more users in there.

If your WebSocket server is scaled up, you have to handle the scenario where a request gets routed to a different machine, since there are no sticky sessions… you’ll want to look into fly-replay, but some users have reported that it doesn’t work (Fly issue or user skill issue), so your mileage may vary.
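If you do go that route, the idea is to answer the initial upgrade request with a fly-replay header pointing at the machine that owns the room. A sketch with Flask (the room-ownership lookup and the actual upgrade handling are hypothetical stubs):

```python
# Sketch of fly-replay routing for room affinity; lookup and upgrade are stubs.
import os
from flask import Flask, Response

app = Flask(__name__)
ROOM_OWNERS: dict[str, str] = {}  # hypothetical room_id -> machine_id mapping

def room_owner(room_id: str) -> str:
    """Hypothetical lookup (in practice, a DB query) of which machine hosts the room."""
    return ROOM_OWNERS[room_id]

@app.route("/rooms/<room_id>/ws")
def join_room(room_id: str):
    owner = room_owner(room_id)
    if owner != os.environ.get("FLY_MACHINE_ID"):  # set by the Fly runtime
        # Fly's proxy intercepts any response carrying fly-replay and replays
        # the request on the named machine; this has to happen on the initial
        # HTTP request, before the WebSocket upgrade completes.
        return Response(status=307, headers={"fly-replay": f"instance={owner}"})
    # ...otherwise handle the upgrade on this machine (not shown)
    return Response(status=200)
```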
