Machine not autoscaling to zero anymore after it went dead last night

Hi there,

I was using a Fly machine that autoscaled to zero because it is idle most of the time… It was working fine, but last night it went dead and would not wake up.

After restarting it successfully today with “fly machines restart” (which I could not do for a while because fly CLI commands did not seem to work, and even the dashboard was behaving weirdly), it works again. But now, when the server exits after 5 minutes of inactivity, it is restarted automatically a few seconds later. I think that something has changed, but I don’t know what… Some logs for context:

2023-03-16T16:09:55.060 proxy[6e82936b096e87] ams [info] Starting machine
2023-03-16T16:09:56.315 app[6e82936b096e87] ams [info] Starting init (commit: 81d5330)...
2023-03-16T16:09:56.340 app[6e82936b096e87] ams [info] Preparing to run: `/bin/sh -c ./boot.sh` as root
2023-03-16T16:09:56.360 app[6e82936b096e87] ams [info] 2023/03/16 16:09:56 listening on [fdaa:0:75a9:a7b:c988:91a9:b2c1:2]:22 (DNS: [fdaa::3]:53)
2023-03-16T16:09:56.509 app[6e82936b096e87] ams [info] * Starting sshd ... [ ok ]
2023-03-16T16:09:56.597 proxy[6e82936b096e87] ams [info] machine started in 1.539286109s
2023-03-16T16:09:56.605 proxy[6e82936b096e87] ams [info] machine became reachable in 7.733439ms
2023-03-16T16:10:25.077 proxy[6e82936b096e87] ams [error] could not proxy TCP data to/from instance: failed to copy (direction=server->client, error=Connection reset by peer (os error 104))
2023-03-16T16:14:57.721 app[6e82936b096e87] ams [info] Starting clean up.
2023-03-16T16:14:58.721 app[6e82936b096e87] ams [info] [ 302.492337] reboot: Restarting system
2023-03-16T16:14:59.775 runner[6e82936b096e87] ams [info] machine exited with exit code 0, not restarting
2023-03-16T16:15:49.234 proxy[6e82936b096e87] ams [info] Starting machine
2023-03-16T16:15:50.351 app[6e82936b096e87] ams [info] Starting init (commit: 81d5330)... 

And so on, indefinitely… Notice the “machine exited with exit code 0, not restarting”, followed by a “Starting machine” a few seconds later… Before, it did not wake up until it received a request, which was amazing and worked fine for a few months… until it stopped working.

Two things that puzzle me:

  1. The fear that the machine will go dead again without warning. Now I am monitoring it externally each hour to receive an alert if something goes wrong again… but then why go serverless if I have to monitor it myself anyway?

  2. How can I get it to autoscale to zero again? Now it is wasting resources… my intuition tells me that something is different now in the platform, but I have not found what yet.
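For reference, the hourly external check mentioned in point 1 can be sketched roughly like this. It is only a minimal sketch: the function name `is_up`, the 30-second timeout, and the example URL in the note below are assumptions for illustration, not anything provided by Fly. The timeout is kept generous so that a machine waking up from zero has time to answer.

```python
import http.client
import urllib.error
import urllib.request


def is_up(url, timeout=30.0):
    """Return True if the URL answers with a non-5xx status within the timeout.

    A timeout, refused connection, or 5xx response counts as "down".
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; a 4xx still proves the app woke up.
        return err.code < 500
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers DNS failures, refused connections, resets, and timeouts.
        return False
```

Run from cron once an hour with something like `is_up("https://my-app.example/")` (a hypothetical URL), and send an alert whenever it returns False.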

Best,
Kurt.-

Those logs look like the proxy is starting the machine. Like this:

2023-03-16T16:09:55.060 proxy[6e82936b096e87] ams [info] Starting machine

I think all of the starting messages in there are from the proxy.
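One way to double-check this against a pasted log is to look at which component logs each “Starting machine” and how long after the previous exit it appears. The following is a minimal sketch that assumes the line layout seen in the logs above (timestamp, `source[machine-id]`, region, level, message); the helper names `parse` and `restart_gaps` are made up for illustration.

```python
from datetime import datetime

# Two lines copied from the logs in the original post.
LOG = """\
2023-03-16T16:14:59.775 runner[6e82936b096e87] ams [info] machine exited with exit code 0, not restarting
2023-03-16T16:15:49.234 proxy[6e82936b096e87] ams [info] Starting machine
"""


def parse(line):
    """Split one log line into (timestamp, source, message)."""
    ts, source, _region, _level, message = line.split(" ", 4)
    return datetime.fromisoformat(ts), source.split("[")[0], message


def restart_gaps(text):
    """For each exit followed by a start, return (starting source, seconds in between)."""
    gaps = []
    exited_at = None
    for line in text.splitlines():
        if not line.strip():
            continue
        when, source, message = parse(line)
        if source == "runner" and "machine exited" in message:
            exited_at = when
        elif message.strip() == "Starting machine" and exited_at is not None:
            gaps.append((source, (when - exited_at).total_seconds()))
            exited_at = None
    return gaps


print(restart_gaps(LOG))
```

On the two lines above, the start is attributed to `proxy` and comes roughly 49 seconds after the exit, which is consistent with the proxy starting the machine in response to incoming traffic rather than the runner restarting it.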

What were you seeing when it “went dead”?

Hi Kurt,

When it “went dead”, any HTTP request made to the app would time out instead of waking up the machine. Machine status was “stopped”, but that was normal when it was scaled to zero. Application status was “Suspended”… now it says “Deployed”… I don’t recall exactly what the app status was before, but I think (not 100% sure) that it said “suspended” and it woke up correctly when invoked.

How can I prevent the proxy from starting the machine automatically? If I recall correctly, months ago I had to do a “fly scale count 0” for the machine to autoscale to 0 correctly, but now I get “This command doesn’t support V2 apps yet, use fly machines update and fly machines clone instead”, so I am not sure how it is done now.

The only thing I did to recover the machine was the “fly machines restart”… maybe redeploying the machine would help?

Best.

Please post here if you see that again. The application state is suspended when machines aren’t running, and deployed when they are, so that’s normal. But a stopped machine should start properly when a request comes in.

If you do notice it, it would be useful to see the output of fly logs and fly machine status <id> when it happens.
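Since the problem may show up when nobody is at the keyboard, a small script can grab both outputs in one go. This is only a sketch: the helper names `capture` and `snapshot` and the 20-second cut-off are assumptions, and it expects the `fly` CLI to be installed and run from the app directory (so that `fly logs` knows which app to use). Because `fly logs` may keep streaming, the command is stopped after the timeout and whatever was printed so far is kept.

```python
import subprocess


def capture(cmd, timeout=20.0):
    """Run a command and return what it printed, even if it had to be cut off."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return done.stdout + done.stderr
    except subprocess.TimeoutExpired as exc:
        # Partial output may come back as bytes even when text=True was requested.
        out = exc.stdout or b""
        return out.decode(errors="replace") if isinstance(out, bytes) else out
    except FileNotFoundError:
        return "command not found: " + cmd[0] + "\n"


def snapshot(machine_id, path):
    """Write the output of both diagnostic commands to a single file."""
    commands = [
        ["fly", "logs"],
        ["fly", "machine", "status", machine_id],
    ]
    with open(path, "w") as fh:
        for cmd in commands:
            fh.write("$ " + " ".join(cmd) + "\n")
            fh.write(capture(cmd) + "\n")
```

Calling `snapshot("6e82936b096e87", "fly-diagnostics.txt")` with the machine ID from the logs would produce one file that can be pasted into a reply.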
