Automatically starting/stopping Apps v2 instances

Not yet. We’re currently working to expose this for Apps v2.

I already wrote a script doing that, and it works great so far.
I was just asking to find out whether I could get rid of my script.

My users come in batches (students at a school), with a few minutes between each “batch”, and I scale an app to zero when there are no users planned for a long period of time (at night in my region, or sometimes during the day).
I don’t want to stop the machine between those batches during the peak of activity, because the machine would start and stop every 5 minutes and wipe all the local cache.
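For reference, a minimal sketch of what a script like that could look like, assuming it runs periodically from cron and uses the flyctl machine start/stop commands; the app name, machine IDs, and quiet hours below are made up:

    # Hypothetical scale-to-zero helper, run periodically from cron.
    # Assumes flyctl is installed and authenticated; the app name,
    # machine IDs, and quiet hours are placeholders.
    import datetime
    import subprocess

    APP = "my-school-app"                      # placeholder app name
    MACHINE_IDS = ["e2865641f25987"]           # placeholder machine IDs
    QUIET_HOURS = set(range(0, 7)) | {22, 23}  # no users planned overnight

    def set_machines(action: str) -> None:
        """Run `flyctl machine start` or `flyctl machine stop` for every machine."""
        for machine_id in MACHINE_IDS:
            subprocess.run(
                ["flyctl", "machine", action, machine_id, "--app", APP],
                check=True,
            )

    if __name__ == "__main__":
        hour = datetime.datetime.now().hour
        set_machines("stop" if hour in QUIET_HOURS else "start")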

Doesn’t seem to work for me.

My flyctl is up to date. I’ve run fly launch on a Django app, and

    [http_service]
      internal_port = 8000
      force_https = true
      auto_stop_machines = true
      auto_start_machines = true
has been automatically added to fly.toml.

With a single machine it was indeed working, stopping it when no requests were coming in, although I’d say it was way too aggressive: a few seconds without a request and the machine was stopped.

But when I cloned it (in the same region), it didn’t scale down to 1 or to 0; both machines were still in a “started” state after 10+ minutes even though no requests were made.

Edit: I posted this message around 10 AM; it’s now 2 AM. The 2 instances stayed “started” all day and didn’t scale down at all. I destroyed one machine and the one left stopped a little while after, which confirms what I described earlier.

Edit 2: I just cloned twice, for 3 machines total, and two of them stopped. So either it’s a bug when you have exactly 2 machines, or I don’t get the logic.
For 3 instances and 0 machines over the soft limit: excess = 3 − (0 + 1) = 2, so you stop 2; 2 are stopped and 1 running, as expected. But for 2 machines, by the same logic, 1 should be running and 1 stopped; here they both stay in a started state.
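As a tiny worked example of the rule I’m assuming here (just my reading of the observed behaviour, not official documentation; scale-to-zero when there’s no traffic at all seems to be a separate case):

    # Downscaling rule assumed above (unconfirmed): keep one machine
    # running plus one per machine over the soft limit; stop the rest.
    def excess(started: int, over_soft_limit: int) -> int:
        return max(started - (over_soft_limit + 1), 0)

    print(excess(3, 0))  # 2 -> stop 2 of 3, as observed
    print(excess(2, 0))  # 1 -> 1 of 2 should stop, but both stayed started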

On a side note, shouldn’t scaling down to zero (and suffering cold starts) be an opt-in option?

Hi, do service HTTP health checks wake up machines?

We shipped a fix for this last week. Let me know if it’s working for you.

Health checks do not wake up machines.

I just tried to reproduce this issue myself and didn’t come across it. It’s likely that the updated proxy hadn’t been deployed in the region your application was running in at the time of your post. Is this still a problem for you?

It’s a good point. We’re still thinking internally about whether and how we would support it. That may look like another configuration option, or it could be part of a more advanced autoscaling implementation.

I mean, why would it have worked for 1 and 3 machines but not 2, if that were the case?

I tried to reproduce it, and:

With 1 instance, it gets stopped almost immediately. With 2, one stays started and the other one stopped; sending a bunch of requests makes the second one start almost immediately too.

But with 3 instances, without sending any requests, 2 stay started and only one gets stopped. (Edit: I’ve sent a bunch of requests and now 3 out of 3 are started, but they’re staying in a “started” state and it doesn’t seem like they’ll stop.)

Do you mind sharing your application name? It’ll be easier for me to debug if I can view logs and see whether things are working on the hosts your application is on.

PM me if that’s possible here. It’s in cdg, btw. But yeah, things aren’t consistent: the three instances are still in a “started” state without any requests sent to them.

We just shipped a fix that should solve this. Let me know if it’s working for you.

They’re all still in a “started” state. Do I need to update to 0.0.541 and run fly launch again?

Edit: I updated flyctl and nuked everything. The results:

First, I noticed that this time fly launch spawned 2 machines and not 1 (I don’t have different processes and am using overmind).

I then cloned to get 3 instances; the 3 instances then stopped.

I sent a bunch of requests, and the 3 started.

Then the 3 went into a “stopped” state.

I then destroyed a machine and repeated the test with 2: the 2 started when I sent a bunch of requests, and they stopped one by one after that.

So it seems to work now.

Built-in health checks ([[services]]) shouldn’t, but they did, at least for us. And so we had to remove them: github/serverless-dns/pull/148

Custom health checks ([checks]) haven’t been waking up Machines (onboarded onto Apps V2), however.
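For anyone comparing, the two kinds of checks look roughly like this in fly.toml (a sketch; ports, paths, and intervals are placeholders):

    # Built-in service health checks, defined under [[services]]:
    [[services]]
      internal_port = 8080
      protocol = "tcp"

      [[services.http_checks]]
        interval = "15s"
        timeout = "2s"
        method = "get"
        path = "/health"

    # Custom health checks, defined at the top level under [checks]:
    [checks]
      [checks.health]
        type = "http"
        port = 8080
        path = "/health"
        interval = "15s"
        timeout = "2s"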

Those didn’t wake machines up; they prevented the in-VM proxy from shutting down. When our proxy starts/stops things, it isn’t even aware of health checks.

That’s part of our efforts to improve Fly apps’ availability. See Increasing Apps V2 availability.

Awesome, great to hear. There was one other small fix that we deployed to get it working consistently. If you come across any issues again, please let us know!

I’ve migrated an internal low-volume app from v1 to v2, where a single node is enough, but now if the host goes down, my service becomes entirely unavailable. For that reason I wanted to add another “standby” node with auto-start/stop, but it’s an internal service that doesn’t even have a service section (it’s being called from a different app using top1...).

I presume it’s not possible (at least at the moment) to do auto-start/stop with internal network services, right?

Yep, unfortunately this won’t work. You’d have to start/stop the machine manually via the API.
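For anyone needing to do that, a rough sketch against the Machines API (assuming the api.machines.dev endpoint and a FLY_API_TOKEN environment variable; the app name and machine ID are placeholders):

    # Sketch: stop or start a machine via the Machines REST API.
    # App name and machine ID are placeholders; requires FLY_API_TOKEN.
    import os
    import urllib.request

    API = "https://api.machines.dev/v1"
    APP = "my-internal-app"        # placeholder
    MACHINE_ID = "3d8d9014b32985"  # placeholder

    def machine_action(action: str) -> None:
        """POST /v1/apps/{app}/machines/{id}/stop or .../start."""
        req = urllib.request.Request(
            f"{API}/apps/{APP}/machines/{MACHINE_ID}/{action}",
            method="POST",
            headers={"Authorization": f"Bearer {os.environ['FLY_API_TOKEN']}"},
        )
        with urllib.request.urlopen(req) as resp:
            print(resp.status)

    machine_action("stop")  # or machine_action("start")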

That’s unfortunate. Any plans to support this in the future?

What would be the recommendation for handling cases where you have one gateway Fly app, exposed to the internet with auto_{stop,start} enabled, routing requests to backend apps that are internal services?
Perhaps one way to handle this would be some sort of notification mechanism, just like AWS spot interruption notifications, that an app process could check continuously to trigger stopping dependent apps?

Not at the moment. It’s something we’ve thought about before, but it’s quite complex to do and we just haven’t found the time to dedicate to solving it yet.

If you want to take advantage of the autostart/autostop feature directly and you’re fine with defining [[services]] for your internal apps, you could do that and then ensure all the internal services have a Flycast IP and no public IPs. Communicating over Flycast will make this feature available to you.
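A sketch of what that could look like for an internal app (port values are placeholders; the Flycast address would come from fly ips allocate-private, and you’d just never allocate public IPs):

    # fly.toml for the internal app: define a service so the proxy can
    # route (and auto start/stop) it, then give the app only a private
    # Flycast address:  fly ips allocate-private --app my-internal-app
    [[services]]
      internal_port = 8080
      protocol = "tcp"
      auto_stop_machines = true
      auto_start_machines = true

      [[services.ports]]
        port = 80
        handlers = ["http"]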

Alternatively, you’d have to implement the start/stop functionality in your own system. One way to do that is to have your app start the “standby” if it fails to connect to the primary machine. There are likely other topologies that would make sense depending on how your system is put together.
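And a rough sketch of that fallback idea (hostname, app name, machine ID, and timeout are all placeholders; assuming the Machines API endpoint as above):

    # Sketch: if the primary doesn't answer, start the standby machine
    # via the Machines API. Hostname, app, machine ID, and timeout are
    # placeholders; requires FLY_API_TOKEN in the environment.
    import os
    import socket
    import urllib.request

    PRIMARY = ("top1.nearest.of.my-internal-app.internal", 8080)
    STANDBY_URL = (
        "https://api.machines.dev/v1/apps/my-internal-app"
        "/machines/9080524f610e87/start"  # placeholder standby machine ID
    )

    def primary_up(timeout: float = 2.0) -> bool:
        try:
            with socket.create_connection(PRIMARY, timeout=timeout):
                return True
        except OSError:
            return False

    if not primary_up():
        req = urllib.request.Request(
            STANDBY_URL,
            method="POST",
            headers={"Authorization": f"Bearer {os.environ['FLY_API_TOKEN']}"},
        )
        urllib.request.urlopen(req)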

Thanks @senyo. I’m not against using Flycast, as that should bring most of the features to internal apps/services. Are there any downsides to going that route?