After the incident on April 24, I had problems on the 25th with 3 of the 4 machines I had reserved.
These machines began responding to the vast majority of requests with the following message: could not complete HTTP request to instance: connection error: connection reset
In the end I had to create new machines, which did not show the problem.
My questions are: every time there is a service outage, will I have to recreate my machines? Is there any way to detect the problem earlier? Is this normal?
It should be noted that after the incident on April 24, I ran the fly doctor command on the application and everything came back fine. But at the time of the incident, when hundreds of users needed to use my platform, the machines failed and therefore so did my platform.
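
To make the detection question concrete, here is a minimal sketch in Python of the kind of external probe I have in mind. It is only my assumption about how this could be caught earlier, not something Fly.io provides. The URL, the /health path, the 30-second interval and the 50% threshold are placeholders I made up, and probe and should_alert are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Send one GET request and return (ok, detail)."""
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
            return (200 <= status < 300, f"HTTP {status}")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return (False, f"HTTP {exc.code}")
    except OSError as exc:
        # Covers connection resets, refused connections, DNS errors, timeouts.
        return (False, f"{type(exc).__name__}: {exc}")


def should_alert(results, threshold=0.5):
    """True when the share of failed probes reaches the threshold."""
    if not results:
        return False
    failures = sum(1 for ok, _ in results if not ok)
    return failures / len(results) >= threshold


if __name__ == "__main__":
    # Hypothetical endpoint: replace with a real health route of the app.
    URL = "https://my-app.example/health"
    window = []
    while True:
        window.append(probe(URL))
        window = window[-10:]  # keep only the 10 most recent results
        if should_alert(window):
            print("ALERT: most recent probes are failing:", window[-1][1])
        time.sleep(30)
```

Since this goes through the public URL, it only tells me that some machine is failing, not which one. If there is a supported way to check each machine individually, that is what I would really like to know.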