I have a theoretical question. I think I have an answer for it, but I’d be interested in getting other opinions.
I have an emerging architecture design that looks like this:
```
                   APP                  APP                  APP
             ----------------     ----------------     ----------------
             |     web      |     |  distributor |     |   browser    |
internet --> | (2 machines) | --> | (2 machines) | --> |  (ephemeral  |
             |              |     |              |     |   machines)  |
             ----------------     ----------------     ----------------
```
So internet traffic comes into the `web` machines, and since this is round-robin, either one can receive the traffic. For a particular action, a long-running web-crawling operation is required, so a request is sent to a `distributor`, again round-robin. In both cases, the two machines would be in different regions.

The job of the `distributor` is to create `browser` machines that run a crawling operation for a few minutes and send the data back to any `web` instance, which writes it to a managed database. The `browser` instances have no redundancy.
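To be concrete about the flow, this is roughly how I picture a `web` instance recording the crawl request in the database. The `crawl_jobs` table, its columns, and the connection string are placeholders I've made up, and I'm assuming the managed database is Postgres:

```python
# Minimal sketch of a `web` instance recording a crawl job.
# Table/column names (`crawl_jobs`, `status`) and the DSN are placeholders.
import psycopg2

def enqueue_crawl_job(target_url: str) -> int:
    conn = psycopg2.connect("postgresql://app:secret@db.internal/app")
    try:
        with conn, conn.cursor() as cur:
            # New jobs start in the 'pending' state for a distributor to pick up.
            cur.execute(
                """
                INSERT INTO crawl_jobs (url, status)
                VALUES (%s, 'pending')
                RETURNING id
                """,
                (target_url,),
            )
            return cur.fetchone()[0]
    finally:
        conn.close()
```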
Now, each `distributor` machine will have a background job that polls the database, and if a job request is in a certain state, it creates a `browser` (roughly like the sketch below). However, since each `distributor` runs its own background job, each one could detect the change independently and create its own `browser`, when I only want one.
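Here's a sketch of that polling loop to show where the race is; Python and Postgres are just for illustration, and `create_browser_machine` and the table/column names are made-up placeholders:

```python
# Naive polling loop running on EVERY distributor instance. Two distributors
# can both read the same 'pending' row and each launch a browser machine.
import time
import psycopg2

def create_browser_machine(job_id: int, url: str) -> None:
    """Placeholder for whatever starts an ephemeral browser machine."""
    ...

def poll_for_jobs():
    conn = psycopg2.connect("postgresql://app:secret@db.internal/app")
    while True:
        with conn, conn.cursor() as cur:
            cur.execute("SELECT id, url FROM crawl_jobs WHERE status = 'pending'")
            for job_id, url in cur.fetchall():
                # Nothing stops the other distributor from doing this too.
                create_browser_machine(job_id, url)
                cur.execute(
                    "UPDATE crawl_jobs SET status = 'running' WHERE id = %s",
                    (job_id,),
                )
        time.sleep(5)
```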
My current thinking is that I need to use the managed database with row locking (sketched at the end of this post). However, since I'm quite new to system design, I'd be interested to hear how others have solved this sort of problem. In my case, I could have a single `distributor` with some stopped hot spares, but is that a design pattern that can easily be achieved on Fly?
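For reference, this is roughly the row-locking approach I have in mind: each `distributor` tries to claim one pending row with `SELECT ... FOR UPDATE SKIP LOCKED`, so only one of them wins a given job. Again, Postgres is assumed and the table/column names are placeholders:

```python
# Sketch of claiming a job with row locking so only one distributor acts on it.
# SKIP LOCKED makes other distributors skip rows that are already being
# claimed instead of blocking on them.
import psycopg2

def claim_next_job(conn):
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT id, url FROM crawl_jobs
            WHERE status = 'pending'
            ORDER BY id
            LIMIT 1
            FOR UPDATE SKIP LOCKED
            """
        )
        row = cur.fetchone()
        if row is None:
            return None  # nothing pending, or another distributor claimed it
        job_id, url = row
        # Mark it running inside the same transaction; the row lock is
        # released when the transaction commits on leaving `with conn`.
        cur.execute(
            "UPDATE crawl_jobs SET status = 'running' WHERE id = %s",
            (job_id,),
        )
        return job_id, url
```

Is this the usual way to do it, or is there a simpler pattern on Fly?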