Autoscaling causing 502s

Hi there,

I’ve noticed there is a spike in 502 responses whenever there is a fly.io autoscaling action.

Is this expected? Is there anything we can do to minimise 502 responses?

Cheers,
Stefan

As a side note, I’ve noticed that when I run a scale command my site goes down for 10-15 seconds. While that’s reasonable, it would be nice if the old pods could stay up while the new ones are created.

There’s currently lag in replicating state across our fleet, and that lag is causing this downtime. It shouldn’t happen with the “rolling” deployment strategy (the default, unless you have a volume attached to your VM), but it sometimes does when state replication falls behind.

We’re working on this right now as one of our top priorities. Apps appearing down is not acceptable.

I’m expecting we’ll launch improvements this week and keep working on it from there.
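In the meantime, one possible client-side stopgap is to retry requests that come back with a 502. The following is only a minimal sketch, assuming the request is safe to repeat (for example an idempotent GET). The names `fetch_with_retry` and `send` are hypothetical and not part of any fly.io tooling; `send` stands in for whatever function your client already uses to make the request and return a status code and body.

```python
import time
from typing import Callable, Tuple


def fetch_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns a non-502 status or attempts run out.

    `send` is a hypothetical caller-supplied function that performs one
    request and returns (status_code, body). Only use this for requests
    that are safe to repeat.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    for attempt in range(attempts):
        status, body = send()
        if status != 502:
            # Anything other than 502 (success or a different error) is
            # returned to the caller unchanged.
            return status, body
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2x, 4x, ... between tries.
            sleep(base_delay * 2 ** attempt)

    # Every attempt returned 502; hand back the last response.
    return status, body
```

With the defaults this waits 0.5, 1 and 2 seconds between tries, so a request is retried for at most about 3.5 seconds before the last 502 is returned. That is shorter than the 10-15 second outage described above, so `attempts` or `base_delay` would need tuning for that case.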

@charsleysa @nahtnam We’ve launched some improvements that should help a lot with these errors. Let us know how it goes!