A while back we posted about new routing capabilities in fly-proxy to bounce traffic through a third region when two regions are experiencing a temporary network blip. That capability was experimental at the time, and only enabled in a few regions such as sin. We are now happy to announce that this new, smarter routing logic (that we call “fallback routing”) is fully rolled out on our “edges” in all regions!
First, a bit of a recap. Requests from your users to our platform enter through a set of servers called “edges”, which used to connect directly to a selected machine hosting your app. That direct path can suffer from network blips between the edge’s region (close to the user) and the region of the machine hosting your app. The idea of “fallbacks” is that, assuming network instabilities are temporary and limited in scope, a third region is very unlikely to see the same instability at the same time. So, rather than waiting the instability out, which may take anywhere from seconds to minutes, an edge can try to connect to your app’s machines through hosts in another region. This greatly reduces the impact of these network blips.
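To make the recap concrete, here is a minimal Python sketch of the idea: try the direct path first, and only on failure retry through hosts in other regions. It is an illustration, not fly-proxy's actual code. The `connect` callable, the `ConnectError` exception, and the function name are all hypothetical stand-ins.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


class ConnectError(Exception):
    """Raised by the hypothetical `connect` callable when a path is unreachable."""


def connect_with_fallback(
    connect: Callable[[str, Optional[str]], T],
    target_region: str,
    fallback_regions: Iterable[str],
) -> T:
    """Try the direct path first, then each fallback region in order.

    `connect(target, via)` is an assumed interface: `via=None` means a direct
    connection, otherwise the connection is relayed through region `via`.
    """
    try:
        return connect(target_region, None)
    except ConnectError:
        pass  # Direct path is having a blip; fall through to the relays.

    for via in fallback_regions:
        try:
            return connect(target_region, via)
        except ConnectError:
            continue  # This relay cannot reach the target either; try the next.

    raise ConnectError(f"no working path to {target_region}")
```

The point of the sketch is the ordering: the direct path stays the default, and a relay region is only involved while that path is failing.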
In the past few weeks, we have gradually enabled this in more and more regions. In the process, we have also been developing better logic to pick fallback regions / nodes based on both internal network monitoring and manually created routing rules. These rules come in the form of “if regions A, B, … cannot connect to regions C, D, …, try connecting through regions E, F, …”, and represent knowledge that cannot be easily extracted from data. For example, we now instruct our regions in Asia to try to connect through Europe when they experience issues reaching US regions, because these two sets of undersea cables (Asia => US vs. Asia => EU and EU => US) are physically distinct and unlikely to be down at the same time. This has proven to work very well in the last few network blips we have observed.
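As a rough illustration of how a rule of the form “if regions A, B, … cannot connect to regions C, D, …, try connecting through regions E, F, …” could be represented and matched, consider the Python sketch below. The `FallbackRule` type, the `fallback_candidates` function, and the specific region codes in the example rule are assumptions made for illustration; they are not the real rule format or the real rule set.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class FallbackRule:
    """Hypothetical rule: if `sources` cannot reach `targets`, try `via`."""
    sources: FrozenSet[str]
    targets: FrozenSet[str]
    via: Tuple[str, ...]  # Ordered by preference.


# Illustrative only: Asia -> US traffic may detour through Europe.
EXAMPLE_RULES = [
    FallbackRule(
        sources=frozenset({"sin", "nrt"}),
        targets=frozenset({"iad", "sjc"}),
        via=("ams", "fra"),
    ),
]


def fallback_candidates(
    source: str, target: str, rules: Sequence[FallbackRule]
) -> List[str]:
    """Return relay regions from every matching rule, in order, deduplicated."""
    candidates: List[str] = []
    for rule in rules:
        if source in rule.sources and target in rule.targets:
            for region in rule.via:
                # Relaying through either endpoint would not avoid the bad path.
                if region not in candidates and region not in (source, target):
                    candidates.append(region)
    return candidates
```

With the example rule, `fallback_candidates("sin", "iad", EXAMPLE_RULES)` returns `["ams", "fra"]`, while a pair no rule covers returns an empty list. In that case a real system would presumably rely on its network monitoring data instead.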
As a result, we have now applied fallback routing on all edges. Some of the limitations described in the original Fresh Produce post still apply: this is still an HTTP(S)-only feature, and it only works for requests that come through our edges (i.e. those originating from a user outside of our platform). Next, we will be working on a similar feature for raw TCP connections, and we will also be looking at enabling this for requests internal to our platform through Flycast. As always, stay tuned!