Improvements to fly-proxy downscaler

fly-proxy, our HTTP and TCP routing layer that runs on each worker host, is also
responsible for scaling down apps when certain conditions are met. The process is
quite simple: if an app has autostop enabled and the combined load across all
instances in a region is below a calculated threshold, the proxy is allowed
to stop an instance. Once the instance is stopped, the process repeats for
the next one.
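
To make the loop concrete, here is a minimal sketch in Rust. It is not fly-proxy's code: the function name, its parameters, and the `threshold_for` closure are all hypothetical, and the real threshold calculation is not described in this post. The sketch also keeps the combined load fixed, whereas the real process looks at fresh load data on every round.

```rust
/// Counts how many instances a downscaler could stop, one at a time.
///
/// Everything here is illustrative. `threshold_for` is a caller-supplied
/// placeholder for the "calculated threshold" given the number of instances
/// still running; the actual formula is an assumption left to the caller.
pub fn instances_to_stop<F>(
    autostop: bool,
    instances: usize,
    combined_load: u64,
    threshold_for: F,
) -> usize
where
    F: Fn(usize) -> u64,
{
    // Apps without autostop are never scaled down.
    if !autostop {
        return 0;
    }

    let mut running = instances;
    let mut stopped = 0;

    // Stop one instance, then re-check the condition for the next one.
    while running > 0 && combined_load < threshold_for(running) {
        running -= 1;
        stopped += 1;
    }

    stopped
}
```

For example, with an assumed threshold of `(n - 1) * 10` for `n` running instances, four instances carrying a combined load of 15 would be scaled down to two.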

As with everything distributed, this process, simple as it is, has some
complications:

  • the load information that proxies see is never current and is always
    slightly delayed, so the proxy may stop a machine only to have to start it
    back up immediately;

  • other proxies don’t know if a particular machine is about to
    get stopped or suspended and continue sending requests to it;

  • machines with autostop=suspend don’t receive any notifications and may be
    suspended with active connections/requests.

This post is about a change that addresses the first two problems.

While working on better state synchronization between flyd and Corrosion,
we changed how machine cordoning is implemented internally. We used to simply
delete service definitions from Corrosion, which made a machine completely
“invisible” to the proxy. Now we have a dedicated flag, so we can represent
not only “cordoned” and “not-cordoned” states, but states in between
as well. So we’ve added a third state that we call “soft-cordoned”.

The “soft-cordoned” state tells the proxy to avoid a machine as much as
possible during routing, but still allows the proxy to route to it if there are
no better candidates available. The downscaler now soft-cordons a machine for
a short period before actually issuing a stop or suspend command, and aborts
the process if the machine receives any new request or connection while
soft-cordoned. The change should help in situations where the proxy mistakenly
stops a machine because of outdated load information, and it lays some
foundations for future downscaling improvements.
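
As a rough illustration of both halves of this behavior, here is a minimal Rust sketch. None of it is fly-proxy's actual code: the types, field names, the least-loaded tie-break, and the choice to lift the soft cordon on abort are all assumptions made for the example.

```rust
/// The three cordon states described above (type and variant names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Cordon {
    Uncordoned,
    Soft,
    Cordoned,
}

/// Hypothetical view of a machine as seen by a proxy.
#[derive(Debug, Clone)]
pub struct Machine {
    pub id: &'static str,
    pub cordon: Cordon,
    /// Requests and connections accepted so far (assumed counter).
    pub accepted: u64,
    /// Current load, used only to break ties within a tier.
    pub load: u32,
}

/// Picks a routing target: never a cordoned machine, and a soft-cordoned one
/// only when no uncordoned machine is available. Picking the least-loaded
/// machine within a tier is an assumption made for this sketch.
pub fn pick(machines: &[Machine]) -> Option<&Machine> {
    machines
        .iter()
        .filter(|m| m.cordon != Cordon::Cordoned)
        // `false < true`, so uncordoned machines sort before soft-cordoned ones.
        .min_by_key(|m| (m.cordon == Cordon::Soft, m.load))
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Stopped,
    Aborted,
}

/// Soft-cordons a machine, then stops it only if it saw no new traffic.
///
/// `accepted_after_grace` stands in for re-reading the machine's counter once
/// the short soft-cordon period has passed; a real implementation would wait.
pub fn try_stop(machine: &mut Machine, accepted_after_grace: u64) -> Outcome {
    let before = machine.accepted;
    machine.cordon = Cordon::Soft;

    machine.accepted = accepted_after_grace;
    if machine.accepted > before {
        // New requests arrived while soft-cordoned: abort and, as an
        // assumption of this sketch, make the machine fully routable again.
        machine.cordon = Cordon::Uncordoned;
        Outcome::Aborted
    } else {
        // The stop or suspend command would be issued here.
        Outcome::Stopped
    }
}
```

In this sketch, a soft-cordoned machine is chosen only when every other candidate is cordoned, and any traffic it receives during the grace period cancels the stop.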

Stay tuned.
