Machine not autoscaling to zero anymore after it went dead last night

kurtie · March 16, 2023, 4:32pm

Hi there,

I have been running a Fly machine that autoscales to zero because it sits idle most of the time. It was working fine, but last night it went dead and would not wake up.

After I restarted it successfully today with “fly machines restart” (which I could not do for a while because fly CLI commands did not seem to work, and even the dashboard was behaving strangely), it works again. But now, when the server exits after 5 minutes of inactivity, it is restarted automatically shortly afterwards. I think something has changed, but I don’t know what. Some logs for context:

2023-03-16T16:09:55.060 proxy[6e82936b096e87] ams [info] Starting machine
2023-03-16T16:09:56.315 app[6e82936b096e87] ams [info] Starting init (commit: 81d5330)...
2023-03-16T16:09:56.340 app[6e82936b096e87] ams [info] Preparing to run: `/bin/sh -c ./boot.sh` as root
2023-03-16T16:09:56.360 app[6e82936b096e87] ams [info] 2023/03/16 16:09:56 listening on [fdaa:0:75a9:a7b:c988:91a9:b2c1:2]:22 (DNS: [fdaa::3]:53)
2023-03-16T16:09:56.509 app[6e82936b096e87] ams [info] * Starting sshd ... [ ok ]
2023-03-16T16:09:56.597 proxy[6e82936b096e87] ams [info] machine started in 1.539286109s
2023-03-16T16:09:56.605 proxy[6e82936b096e87] ams [info] machine became reachable in 7.733439ms
2023-03-16T16:10:25.077 proxy[6e82936b096e87] ams [error] could not proxy TCP data to/from instance: failed to copy (direction=server->client, error=Connection reset by peer (os error 104))
2023-03-16T16:14:57.721 app[6e82936b096e87] ams [info] Starting clean up.
2023-03-16T16:14:58.721 app[6e82936b096e87] ams [info] [ 302.492337] reboot: Restarting system
2023-03-16T16:14:59.775 runner[6e82936b096e87] ams [info] machine exited with exit code 0, not restarting
2023-03-16T16:15:49.234 proxy[6e82936b096e87] ams [info] Starting machine
2023-03-16T16:15:50.351 app[6e82936b096e87] ams [info] Starting init (commit: 81d5330)...

And so on, indefinitely. Notice the “machine exited with exit code 0, not restarting”, followed by a “Starting machine” about 50 seconds later. Before, the machine did not wake up until it received a request, which was amazing and worked fine for a few months, until it stopped working.
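The exit-then-start pattern can be measured directly from saved log output. The following is a minimal Python sketch, assuming the log lines keep exactly the shape shown above (timestamp, source[machine id], region, [level], message). The names `parse_line` and `restart_gaps` are hypothetical helpers written for this illustration; they are not part of flyctl.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the excerpt in this thread:
# <timestamp> <source>[<machine id>] <region> [<level>] <message>
LINE_RE = re.compile(r"^(\S+)\s+(\w+)\[(\w+)\]\s+(\w+)\s+\[(\w+)\]\s+(.*)$")


def parse_line(line):
    """Return (timestamp, source, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
    return ts, m.group(2), m.group(6)


def restart_gaps(lines):
    """Yield (exit_time, start_time, seconds) for each clean exit
    that is followed by a proxy-initiated start."""
    last_exit = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, source, msg = parsed
        if source == "runner" and "machine exited with exit code 0" in msg:
            last_exit = ts
        elif source == "proxy" and msg.startswith("Starting machine") and last_exit is not None:
            yield last_exit, ts, (ts - last_exit).total_seconds()
            last_exit = None


# Lines copied from the log excerpt above.
SAMPLE = [
    "2023-03-16T16:14:57.721 app[6e82936b096e87] ams [info] Starting clean up.",
    "2023-03-16T16:14:58.721 app[6e82936b096e87] ams [info] [ 302.492337] reboot: Restarting system",
    "2023-03-16T16:14:59.775 runner[6e82936b096e87] ams [info] machine exited with exit code 0, not restarting",
    "2023-03-16T16:15:49.234 proxy[6e82936b096e87] ams [info] Starting machine",
    "2023-03-16T16:15:50.351 app[6e82936b096e87] ams [info] Starting init (commit: 81d5330)...",
]

for exit_ts, start_ts, gap in restart_gaps(SAMPLE):
    # For this excerpt the gap is 49.459 seconds.
    print(f"exited {exit_ts.time()} -> proxy start {start_ts.time()} ({gap:.3f}s)")
```

Because the start message comes from the proxy rather than the runner, a short gap like this suggests that something reached the proxy soon after the machine exited; the script only measures the gap and cannot say what triggered the start.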

Two things that puzzle me:

1. I fear that the machine will go dead again without warning. I am now monitoring it externally every hour so that I get an alert if something goes wrong again, but then why go serverless if I have to monitor it myself anyway?
2. How can I get it to autoscale to zero again? Right now it is wasting resources. My intuition tells me that something has changed in the platform, but I have not found out what yet.
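An hourly external check of the kind described in the first point could look roughly like the sketch below. It is only an illustration under stated assumptions: `probe` is a hypothetical helper, the URL in the usage comment is a placeholder rather than the real app, and scheduling and alerting are left to whatever runs the script (cron, for example).

```python
import http.client
import urllib.error
import urllib.request


def probe(url, timeout=30.0):
    """Return (ok, detail). ok is True if the server sent any HTTP response
    within the timeout, False on a timeout or connection failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The app answered with an error status, so the machine did wake up.
        return True, f"HTTP {exc.code}"
    except (OSError, http.client.HTTPException) as exc:
        # Covers timeouts, refused connections, DNS failures and broken responses.
        return False, f"{type(exc).__name__}: {exc}"


# Example usage (placeholder URL, not the app from this thread):
# ok, detail = probe("https://example-app.fly.dev/", timeout=30)
# if not ok:
#     print("ALERT: app did not respond:", detail)
```

Note that each probe is itself a request, so it will wake a stopped machine once per check; the timeout should be long enough to cover a cold start.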

Best,
Kurt.-

kurt · March 16, 2023, 4:43pm

Those logs look like the proxy is starting the machine. Like this:

2023-03-16T16:09:55.060 proxy[6e82936b096e87] ams [info] Starting machine

I think all of the starting messages in there are from the proxy.

What were you seeing when it “went dead”?

kurtie · March 16, 2023, 5:15pm

Hi Kurt,

When it “went dead”, any HTTP request made to the app would time out instead of waking up the machine. The machine status was “stopped”, but that was normal when it was scaled to zero. The application status was “Suspended”; now it says “Deployed”. I don’t recall exactly what the app status was before, but I think (not 100% sure) that it said “suspended” and the machine woke up correctly when invoked.

How can I stop the proxy from starting the machine automatically? If I recall correctly, months ago I had to run “fly scale count 0” for the machine to autoscale to 0 correctly, but now I get “This command doesn’t support V2 apps yet, use fly machines update and fly machines clone instead”, so I am not sure how it is done now.

The only thing I did to recover the machine was the “fly machines restart”. Maybe redeploying the machine would help?

Best.

kurt · March 16, 2023, 5:29pm

Please post here if you see that again. The application state is suspended when machines aren’t running, and deployed when they are, so that’s normal. But a stopped machine should start properly when a request comes in.

If you do notice it, it would be useful to see the output of fly logs and fly machine status <id> when it happens.

system · March 23, 2023, 5:30pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
