How to design an app so that multiple instances don't operate simultaneously on items of work?

I have a deployment with 4 machines. The deployment goes well until all four reach a good state, but the issue is, when more than one machine is active I get endless logs and the system is not working. When I go to the dashboard and stop 3 machines, leaving one active, it works fine. I'm just wondering if Fly has a mechanism that handles that, because after some minutes I find another machine has started automatically and the previous one didn't quit… I'm having a hard time.

This is rather hard to follow, unfortunately.

and the system is not working

Would you be more specific here?

and shop 3 machines, leaving one active it works fine

Do you mean stop 3 machines?

after some minutes [I] find another machine have started automatically and previous didn’t quit

How many machines do you actually want running? May we see your TOML config?

Yes, I meant stop: I scaled to 4 machines from my shell and they all start well, but I realized that when more than 1 machine is active my service is broken. To be specific, I'm running an automated customer service bot, and when more than one instance is running the bot won't work, or it will send duplicate responses to every question. So I went to my dashboard and shut down 3 machines, leaving one active, and that worked great, but after some time a machine turns back on automatically. Does Fly have a mechanism to ensure only one machine runs at a time, or does your service run multiple instances at once by default?

Right, you’re fixing it in the wrong way. Keep multiple machines running, and create a system at the application level for an instance of your code to create a “reservation” on a new conversation.

I think the easiest way to do this is to have a relational database that all instances share, and they update a conversation with their hostname, using SQL record locking, so that only one can succeed. This works because it does the status fetch and the hostname update in the same atomic operation.
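As a minimal sketch of that reservation pattern: the status check and the hostname update are combined into one atomic UPDATE, so only one instance's claim can succeed. SQLite is used here purely for illustration (the table and function names are hypothetical); with Postgres or MySQL you'd get the same guarantee, optionally with explicit row locking like SELECT … FOR UPDATE.

```python
# Sketch of a shared "reservation" table using an atomic compare-and-set
# UPDATE. Assumes all instances talk to the same relational database;
# SQLite in-memory here just so the example is self-contained.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conversations (id TEXT PRIMARY KEY, owner TEXT)")
conn.execute("INSERT INTO conversations (id, owner) VALUES ('conv-42', NULL)")
conn.commit()

def claim_conversation(conv_id, hostname):
    """Try to reserve a conversation for this instance.

    The status fetch (owner IS NULL) and the hostname update happen in a
    single UPDATE statement, so when several machines race for the same
    conversation, exactly one wins.
    """
    cur = conn.execute(
        "UPDATE conversations SET owner = ? WHERE id = ? AND owner IS NULL",
        (hostname, conv_id),
    )
    conn.commit()
    return cur.rowcount == 1  # True only for the instance that won the race

# First machine wins the reservation; the second sees it's taken and backs off.
print(claim_conversation("conv-42", "machine-a"))  # True
print(claim_conversation("conv-42", "machine-b"))  # False
```

An instance that gets False simply drops the message instead of replying, which is what prevents the duplicate bot responses.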

You could design it so that each instance reserves a whole conversation (and so machines need to live for the whole of a conversation) or instances bid for messages, so that if a machine fails midway through a conversation, another machine will pick it up.

(Using my edit permissions, I have amended the title of this thread to more accurately reflect the nature of the problem.)


You could scale the application down to a single machine with fly scale count 1. This should destroy the extra machines so that only one is running; there are docs on this here: Scale the Number of Machines · Fly Docs

There is also logic that will automatically bring up machines you stopped when they get traffic; that's documented here: Autostop/autostart Machines · Fly Docs
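If that autostart behavior is what keeps waking the stopped machines, it can be turned off in fly.toml. A sketch, assuming an [http_service] section; the exact option names and accepted values vary by flyctl version, so check the docs linked above:

```toml
[http_service]
  internal_port = 8080
  # Don't wake stopped machines when traffic arrives
  auto_start_machines = false
  # On newer flyctl this takes "off"/"stop"/"suspend"; older versions use a boolean
  auto_stop_machines = "off"
  min_machines_running = 1
```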


True, but there is Fly advice elsewhere that important apps should run at least two machines, so that if the host hardware fails, another instance can pick up the slack. (My prototype app with just one machine has been surprisingly stable, but there are quite a few reports hereabouts from less lucky customers.)


There seems to be a fundamental architectural problem with how OP is processing requests. Perhaps they can provide more details.