PM me if it's possible here (it's in CDG, btw). But yeah, things aren't consistent: the three instances are still in a "started" state without any requests having been sent to them.
We just shipped a fix that should solve this. Let me know if it's working for you.
They're still all in a started state. Do I need to update to 0.0.541 and run `fly launch` again?
Edit: I updated flyctl and nuked everything. The results:
First, I noticed that this time `fly launch` spawned 2 machines instead of 1 (I don't have different processes and am using overmind).
I then cloned to get 3 instances; the 3 instances then stopped.
I sent a bunch of requests, and all 3 started.
Then the 3 went into a "stopped" state.
I then destroyed a machine and repeated the test with 2: the 2 started when I sent a bunch of requests, and they stopped one by one after that.
So it seems to work now.
Built-in health checks (`[[services]]` checks) shouldn't wake Machines up, but they did, at least for us. And so we had to remove them: github/serverless-dns/pull/148
Custom health checks (`[checks]`) haven't been waking up Machines (onboarded onto Apps V2), however.
Those didn't wake Machines up; they prevented the in-VM proxy from shutting down. When our proxy starts/stops things, it's not even aware of health checks.
That’s part of our efforts to improve Fly apps’ availability. See Increasing Apps V2 availability.
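To make the distinction above concrete, this is roughly what the two flavors of check look like in fly.toml. It's only a sketch: the /healthz path, port 8080, and the intervals are placeholders, not anything from this thread.

```toml
# Built-in service checks: defined inside [[services]] and tied to that service.
[[services]]
  protocol = "tcp"
  internal_port = 8080

  [[services.http_checks]]
    interval = "15s"
    timeout = "2s"
    grace_period = "5s"
    method = "get"
    path = "/healthz"
    protocol = "http"

# Custom app-level checks: a top-level [checks] table, independent of any service.
[checks]
  [checks.app_healthy]
    type = "http"
    port = 8080
    path = "/healthz"
    method = "GET"
    interval = "30s"
    timeout = "5s"
    grace_period = "10s"
```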
Awesome, great to hear. There was one other small fix that we deployed to get it working consistently. If you come across any issues again, please let us know!
I've migrated an internal, low-volume app from v1 to v2 where a single node is enough, but now if the host goes down, my service becomes entirely unavailable. For that reason I wanted to add another "standby" node with auto-start/stop, but it's an internal service that doesn't even have a services section (it's being called from a different app using top1...).
I presume it’s not possible (at least at the moment) to do auto-start/stop with internal network services, right?
That's unfortunate. Any plans to support this in the future?
What would be the recommended way to handle cases where you have one gateway Fly app exposed to the internet, with auto{stop,start} enabled, routing requests to backend apps that are internal services?
Perhaps one way to handle this would be some sort of notification mechanism, similar to AWS spot interruption notifications, that an app process could poll continuously and use to trigger stopping its dependent apps?
Not at the moment. It is something we’ve thought about before but it’s quite complex to do and we just haven’t found the time to dedicate to solving it yet.
If you want to take advantage of the auto-start/stop feature directly and you're fine with defining `[[services]]` for your internal apps, you could do that and then make sure all the internal services have a Flycast IP and no public IPs. Communicating over Flycast will make this feature available to you.
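A rough sketch of what that could look like, assuming a made-up internal app listening on port 8080 (the app name, region, and ports are placeholders). You'd allocate the Flycast address with `fly ips allocate-v6 --private` and leave public IPs unallocated:

```toml
# fly.toml sketch for an internal-only app reached over Flycast.
app = "my-internal-api"   # placeholder name
primary_region = "cdg"

[[services]]
  protocol = "tcp"
  internal_port = 8080          # whatever your app actually listens on
  auto_stop_machines = true     # let fly-proxy stop idle Machines
  auto_start_machines = true    # ...and start them again on incoming requests
  min_machines_running = 0

  [[services.ports]]
    port = 80
    handlers = ["http"]
```

Other apps in the org then talk to it via its `.flycast` address, so requests still pass through fly-proxy and can wake stopped Machines.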
Alternatively, you'd have to implement the start/stop functionality in your own system. One way you could do that is by having your app start the "standby" if it fails to connect to the primary Machine. There are likely other topologies that would make sense depending on how your system is put together.
Thanks @senyo. I'm not against using Flycast, as that should bring most of the features to internal apps/services. Are there any downsides to going that route?
If you need control over routing, i.e. exactly which Machine a request is sent to, you lose that control using Flycast (unless you use fly-replay). Otherwise, there's no downside to using Flycast.
It works perfectly now, thanks!
Is anyone having trouble with `auto_stop_machines` today in AMS?
Yesterday the proxy started my app on demand, but today it doesn’t work anymore. The machine stays suspended, and doesn’t receive any signal to start again.
`auto_stop_machines` works great and scales everything down.
Got:
Failed to proxy HTTP request (error: no known healthy instances found for route tcp/443. (hint: is your app shutdown? is there an ongoing deployment with a volume or using the 'immediate' strategy? if not, this could be a delayed state issue)). Retrying in 947 ms (attempt 90)
I don't have any volumes attached to this app, and the proxy has shut down the app. No ongoing deployment.
Yesterday, it worked fine.
We deployed a change yesterday that caused this regression. We're reverting it at the moment; it should start working again soon.
It worked again less than an hour after your message.
Love this feature. I have a perfect use case for it: an image-proxy instance that's needed only on demand. The stopped machine seems to get started within 0.1-0.5 seconds, which is fine for me.
Is the `kill_timeout` setting taken into account now?
Not yet, but thanks for the reminder. I’ll look into it!
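For reference, `kill_timeout` is a top-level fly.toml setting (alongside the related `kill_signal`). The values below are only illustrative, not recommendations:

```toml
# Shutdown behavior settings at the top level of fly.toml (example values).
kill_signal  = "SIGINT"   # signal sent to the main process at shutdown
kill_timeout = 30         # seconds to wait after the signal before a hard kill
```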
If the proxy respected `kill_timeout` and `kill_signal`, that'd be nice. Any timelines?
Also:
Does the above condition hold when `auto_start_machines` and `auto_stop_machines` are not used? This use case was unsupported before [0]. From my experience, multiple Machines in the same region, once spun up, never did go idle; that is, if two Machines in a region `xyz` were spun up, then both would get sent incoming connections despite both being well below their `soft_limit`s. Ideally, I'd expect Fly-Proxy to pick one Machine over the other until its `soft_limit` was breached.
Or should I use the `--ha=false` flag, as mentioned here? fly migrate-to-v2 - Automatic migration to Apps V2 - #45 by JP_Phillips
[0]
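For anyone following along, the `soft_limit` and `hard_limit` mentioned above come from the concurrency block of a service in fly.toml. A minimal sketch with arbitrary numbers, not a recommendation:

```toml
# Concurrency limits for a service (example values only).
[[services]]
  protocol = "tcp"
  internal_port = 8080

  [services.concurrency]
    type = "requests"   # or "connections"
    soft_limit = 20     # fly-proxy prefers Machines below this load
    hard_limit = 25     # at this load, no new work is sent to the Machine
```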