Automatically starting/stopping Apps v2 instances

Built-in health checks ([[services]]) shouldn’t wake Machines up, but they did, at least for us. So we had to remove them: github/serverless-dns/pull/148

Custom health checks ([checks]), however, haven’t been waking up Machines (onboarded onto Apps V2).
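
For anyone else hitting this, here’s roughly how the two kinds of checks look in a fly.toml (ports, paths, and intervals are just illustrative):

```toml
# Built-in service checks live under [[services]]; these are the ones
# that were waking our Machines up.
[[services]]
  internal_port = 8080
  protocol = "tcp"

  [[services.tcp_checks]]
    interval = "15s"
    timeout = "2s"
    grace_period = "5s"

# Custom top-level checks live under [checks]; these haven't been
# waking Machines up for us.
[checks]
  [checks.alive]
    type = "http"
    port = 8080
    path = "/health"
    interval = "15s"
    timeout = "2s"
```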

1 Like

Those didn’t wake machines up; they prevented the in-VM proxy from shutting down. When our proxy starts/stops things, it isn’t even aware of health checks.

1 Like

That’s part of our efforts to improve Fly apps’ availability. See Increasing Apps V2 availability.

Awesome, great to hear. There was one other small fix that we deployed to get it working consistently. If you come across any issues again, please let us know!

2 Likes

I’ve migrated an internal low-volume app from v1 to v2 where a single node is enough, but now if that host goes down, my service becomes entirely unavailable. For that reason I wanted to add another “standby” node with auto-start/stop, but it’s an internal service that doesn’t even have a [[services]] section (it’s being called from a different app using top1...)

I presume it’s not possible (at least at the moment) to do auto-start/stop with internal network services, right?

2 Likes

Yep, unfortunately this won’t work. You’d have to start/stop the machine manually via the API.

That’s unfortunate. Any plans to support this in the future?

What would be the recommendation for handling cases where you have one gateway Fly app exposed to the internet, with auto_{stop,start} enabled, routing requests to backend apps that are internal services?
Perhaps one way to handle this would be some sort of notification mechanism, similar to AWS spot interruption notices, that an app process could poll continuously and use to trigger stopping of dependent apps?

1 Like

Not at the moment. It is something we’ve thought about before but it’s quite complex to do and we just haven’t found the time to dedicate to solving it yet.

If you want to take advantage of the autostart/autostop feature directly and you’re fine with defining [[services]] for your internal apps, you could do that and then ensure all the internal services have a Flycast IP and no public IPs. Communicating over Flycast will make this feature available to you.
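
As a rough sketch (ports and values are illustrative, and this assumes you’ve allocated a Flycast address, e.g. with fly ips allocate-v6 --private, and released any public IPs), the internal app’s fly.toml could look something like:

```toml
# fly.toml for an internal app reached only over its Flycast address
[[services]]
  internal_port = 8080
  protocol = "tcp"
  auto_stop_machines = true      # proxy stops idle Machines
  auto_start_machines = true     # proxy starts them on incoming traffic
  min_machines_per_region = 0    # allow scaling all the way to zero

  [[services.ports]]
    port = 80
    handlers = ["http"]
```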

Alternatively, you’d have to implement the start/stop functionality in your own system. One way you could do that is by having your app start the “standby” if it fails to connect to the primary machine. There are likely other topologies that would make sense depending on how your system is put together.

3 Likes

Thanks @senyo. I’m not against using Flycast, as that should bring most of the features to internal apps/services. Are there any downsides to going that route?

If you need control over routing, i.e. exactly which machine a request is sent to, you lose that control using Flycast (unless you use fly-replay). Otherwise, there’s no downside to using Flycast.

1 Like

It works perfectly now, thanks!

1 Like

Is anyone having trouble with auto_stop_machines today in AMS?
Yesterday the proxy started my app on demand, but today it doesn’t work anymore. The machine stays suspended and doesn’t receive any signal to start again.

auto_stop_machines works great and downscales everything.

Got:

Failed to proxy HTTP request (error: no known healthy instances found for route tcp/443. (hint: is your app shutdown? is there an ongoing deployment with a volume or using the 'immediate' strategy? if not, this could be a delayed state issue)). Retrying in 947 ms (attempt 90)

I don’t have any volumes attached to this app, and the proxy has shut down the app. No ongoing deployment.
Yesterday it worked fine.

1 Like

We deployed a change yesterday that caused this regression. We’re reverting it at the moment; it should start working again soon.

1 Like

It worked again less than an hour after your message.

1 Like

Love this feature. I have a perfect use case for it: an instance of an image proxy, which is only needed on demand. The stopped machine seems to get started within 0.1-0.5 seconds, which is fine for me.

6 Likes

Is the kill_timeout setting taken into account now?

Not yet, but thanks for the reminder. I’ll look into it!

1 Like

If the proxy respected kill_timeout and kill_signal, that’d be nice. Any timelines?
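
For context, I mean these top-level fly.toml settings (values illustrative):

```toml
# How a Machine's main process is asked to shut down
kill_signal = "SIGINT"   # default is SIGINT
kill_timeout = 30        # seconds to wait before a hard kill; default is 5
```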

Also:

Does the above condition hold when auto_start_machines and auto_stop_machines are not used? This use case was unsupported before [0]. From my experience, multiple Machines in the same region, once spun up, never went idle: if two Machines in region xyz were spun up, both would be sent incoming connections despite both being well below their soft_limits. Ideally, I’d expect Fly-Proxy to pick one Machine over the other until its soft_limit was breached.
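
(For reference, by soft_limit I mean the per-service concurrency settings, roughly like this; the numbers are illustrative:)

```toml
[[services]]
  internal_port = 8080
  protocol = "tcp"

  [services.concurrency]
    type = "connections"   # or "requests"
    soft_limit = 20        # proxy prefers Machines below this
    hard_limit = 25        # proxy stops sending new work past this
```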

Or should I use the --ha=false flag, as mentioned here? fly migrate-to-v2 - Automatic migration to Apps V2 - #45 by JP_Phillips


[0]

flyd now respects those two configuration options on a machine.

I can’t speak to the load balancing decisions of the proxy with respect to soft_limits.

1 Like

I was excited to use this feature to automatically start and stop new web servers as needed for my small Mastodon instance, since traffic varies a lot.

Strangely, it often downscales a machine and then immediately restarts that machine despite no increase in requests. As a result, I often have two machines running (with one of them frequently restarting) even when the request load is well below the configured soft limit.

Here’s what I see in the logs when this happens:

2023-05-26T22:40:41Z proxy [e2865756b1e486] sea [info]Downscaling app pie-gd-mastodon-v2 in region sea. Automatically stopping machine e2865756b1e486. 2 instances are running, 0 are at soft limit, we only need 1 running
2023-05-26T22:40:41Z app[e2865756b1e486] sea [info]metrics   | Interrupting...
2023-05-26T22:40:41Z app[e2865756b1e486] sea [info]rails     | Interrupting...
2023-05-26T22:40:41Z app[e2865756b1e486] sea [info]streaming | Interrupting...
2023-05-26T22:40:41Z app[e2865756b1e486] sea [info]caddy     | Interrupting...
2023-05-26T22:40:41Z app[e2865756b1e486] sea [info]metrics   | ts=2023-05-26T22:40:41.238Z caller=main.go:542 level=info msg="Received os signal, exiting" signal=interrupt
2023-05-26T22:40:41Z app[e2865756b1e486] sea [info]streaming | WARN Worker 1 exiting
2023-05-26T22:40:41Z app[e2865756b1e486] sea [info]metrics   | signal: interrupt
2023-05-26T22:40:41Z app[e2865756b1e486] sea [info]caddy     | signal: interrupt
2023-05-26T22:40:41Z app[e2865756b1e486] sea [info]streaming | WARN Worker 1 exiting
2023-05-26T22:40:41Z app[e2865756b1e486] sea [info]rails     | - Gracefully stopping, waiting for requests to finish
2023-05-26T22:40:41Z app[e2865756b1e486] sea [info]rails     | === puma shutdown: 2023-05-26 22:40:41 +0000 ===
2023-05-26T22:40:41Z app[e2865756b1e486] sea [info]rails     | - Goodbye!
2023-05-26T22:40:41Z app[e2865756b1e486] sea [info]rails     | Exiting
2023-05-26T22:40:41Z app[e2865756b1e486] sea [info]Sending signal SIGINT to main child process w/ PID 513
2023-05-26T22:40:41Z app[e2865756b1e486] sea [info]streaming | signal: interrupt
2023-05-26T22:40:41Z app[e2865756b1e486] sea [info]streaming |
2023-05-26T22:40:41Z app[e2865756b1e486] sea [info]rails     | signal: interrupt
2023-05-26T22:40:41Z app[e2865756b1e486] sea [info]Starting clean up.
2023-05-26T22:40:41Z app[e2865756b1e486] sea [info]hallpass exited, pid: 514, status: signal: 15
2023-05-26T22:40:41Z app[e2865756b1e486] sea [info]2023/05/26 22:40:41 listening on [fdaa:0:d7b2:a7b:124:eace:e441:2]:22 (DNS: [fdaa::3]:53)
2023-05-26T22:40:42Z app[e2865756b1e486] sea [info][  355.537693] reboot: Restarting system
2023-05-26T22:40:52Z proxy[e2865756b1e486] sea [info]Starting machine
2023-05-26T22:40:53Z app[e2865756b1e486] sea [info]Starting init (commit: 9bb7ee8)...
2023-05-26T22:40:53Z app[e2865756b1e486] sea [info]Preparing to run: `hivemind Procfile.mastodon` as mastodon
2023-05-26T22:40:53Z app[e2865756b1e486] sea [info]2023/05/26 22:40:53 listening on [fdaa:0:d7b2:a7b:124:eace:e441:2]:22 (DNS: [fdaa::3]:53)
2023-05-26T22:40:53Z app[e2865756b1e486] sea [info]rails     | Running...
2023-05-26T22:40:53Z app[e2865756b1e486] sea [info]streaming | Running...
2023-05-26T22:40:53Z app[e2865756b1e486] sea [info]caddy     | Running...
2023-05-26T22:40:53Z app[e2865756b1e486] sea [info]metrics   | Running...
2023-05-26T22:40:53Z proxy[e2865756b1e486] sea [info]machine started in 278.142947ms

This app uses Hivemind to start a few processes (Rails, Node.js, Caddy, and a statsd exporter), but other than that it’s not doing anything special.

All the config files I use for this server are public, and you can see them here. Have I misconfigured something, or is this possibly a bug?

Unfortunately, not at the moment. The proxy team’s capacity is spread thin right now. We will likely have time once a good portion of the Apps v2 migration is completed.

This behaviour is still the same: requests are load-balanced across all your running machines. It’s effectively a round-robin load balancing approach.

1 Like