Hi there,
I have been using a Fly machine that autoscales to zero, because it sits idle most of the time. It was working fine, but last night it went dead and would not wake up.
After restarting it successfully today with “fly machines restart” (which I could not do for a while, because fly CLI commands did not seem to work and even the dashboard was behaving strangely), it works again. However, when the server exits after 5 minutes of inactivity, it is now restarted automatically shortly afterwards. I think something has changed, but I don’t know what. Some logs for context:
2023-03-16T16:09:55.060 proxy[6e82936b096e87] ams [info] Starting machine
2023-03-16T16:09:56.315 app[6e82936b096e87] ams [info] Starting init (commit: 81d5330)...
2023-03-16T16:09:56.340 app[6e82936b096e87] ams [info] Preparing to run: `/bin/sh -c ./boot.sh` as root
2023-03-16T16:09:56.360 app[6e82936b096e87] ams [info] 2023/03/16 16:09:56 listening on [fdaa:0:75a9:a7b:c988:91a9:b2c1:2]:22 (DNS: [fdaa::3]:53)
2023-03-16T16:09:56.509 app[6e82936b096e87] ams [info] * Starting sshd ... [ ok ]
2023-03-16T16:09:56.597 proxy[6e82936b096e87] ams [info] machine started in 1.539286109s
2023-03-16T16:09:56.605 proxy[6e82936b096e87] ams [info] machine became reachable in 7.733439ms
2023-03-16T16:10:25.077 proxy[6e82936b096e87] ams [error] could not proxy TCP data to/from instance: failed to copy (direction=server->client, error=Connection reset by peer (os error 104))
2023-03-16T16:14:57.721 app[6e82936b096e87] ams [info] Starting clean up.
2023-03-16T16:14:58.721 app[6e82936b096e87] ams [info] [ 302.492337] reboot: Restarting system
2023-03-16T16:14:59.775 runner[6e82936b096e87] ams [info] machine exited with exit code 0, not restarting
2023-03-16T16:15:49.234 proxy[6e82936b096e87] ams [info] Starting machine
2023-03-16T16:15:50.351 app[6e82936b096e87] ams [info] Starting init (commit: 81d5330)...
And so on, indefinitely. Notice the “machine exited with exit code 0, not restarting”, followed by a “Starting machine” about 50 seconds later. Before, the machine did not wake up until it received a request, which was amazing and worked fine for a few months, until it stopped working.
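In case it helps anyone compare against their own logs, here is a minimal Python sketch that measures how long the machine stays down between the runner's "machine exited" line and the proxy's next "Starting machine" line. It assumes the log line format shown above and nothing else; the function name `restart_gaps` is my own invention, not part of any Fly tooling.

```python
import re
from datetime import datetime

# Splits a log line (in the format pasted above) into timestamp,
# source (proxy/app/runner) and message. Assumed format, not an official one.
LINE_RE = re.compile(r"^(\S+)\s+(\w+)\[\w+\]\s+\w+\s+\[\w+\]\s+(.*)$")


def restart_gaps(lines):
    """Return the seconds between each runner 'machine exited' line
    and the next proxy 'Starting machine' line."""
    gaps = []
    exited_at = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a log line
        timestamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        source, message = match.group(2), match.group(3)
        if source == "runner" and message.startswith("machine exited"):
            exited_at = timestamp
        elif (source == "proxy" and message == "Starting machine"
              and exited_at is not None):
            gaps.append((timestamp - exited_at).total_seconds())
            exited_at = None
    return gaps
```

Fed the log excerpt above, this reports a single gap of roughly 49.5 seconds, far shorter than the idle periods I was seeing before.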
Two things that puzzle me:
- The fear that the machine will go dead again without warning. I am now monitoring it externally every hour so that I get an alert if something goes wrong again, but then why go serverless if I have to monitor it myself anyway?
- How can I get it to autoscale to zero again? Right now it is wasting resources. My intuition tells me that something has changed in the platform, but I have not found out what yet.
Best,
Kurt.-