I’ve been using Fly.io for around a year now and I’m loving it, but one concern keeps making me reconsider whether to use it for future projects: the occasional networking and connectivity issues in various regions.
This has caused me problems in the past: I once had an app with machines running in 3 regions, and one of them had networking issues, which resulted in error responses for users whose requests were routed to that region.
My question is: why doesn’t Fly.io reduce the impact of such an issue by preferring, or redirecting requests to, a machine in a region that has no active issues?
I’m no expert on infrastructure or networking, which is why I’m raising this question. Thank you!
Currently we do fail over to other machines when a host is down. However, you are right that sometimes networking breaks just enough for apps to misbehave, but not enough for traffic to be diverted. Work is under way to hook this up with the host issues system so that failover can also happen for half-broken hosts (when manually triggered).
Thank you for your reply, @PeterCxy. I’m glad you guys are working on it.
Is the “manual trigger” of the failover that you mentioned something we would handle ourselves, or is it internal functionality that you guys are working on to handle those issues?
@oteiza-a Sorry I didn’t make that clear. It is for us to trigger traffic diversion when “host issues” (see link in the previous reply) are in place. This will help in cases where networking is not broken enough to be seen as “down” but still causes problems for apps.
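
For anyone trying to picture the behaviour discussed in this thread, here is a minimal sketch of the idea in Python. It is not Fly.io's proxy code and does not use any Fly.io API: the `Machine` type, the `pick_machine` function, the host names, and the `flagged_hosts` set are all hypothetical, and the region codes are only illustrative. It assumes routing walks regions in order of preference and skips a machine either when its host is detected as down or when its host has been manually flagged as having an issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    # All fields are hypothetical, for illustration only.
    id: str
    region: str
    host: str
    up: bool = True  # outcome of an ordinary "is the host down?" check


def pick_machine(machines, preferred_regions, flagged_hosts):
    """Return the first usable machine, walking regions in preference order.

    A machine is skipped if its host is down, or if its host is in the
    manually maintained set of flagged ("half-broken") hosts.
    Returns None when no machine is usable.
    """
    for region in preferred_regions:
        for machine in machines:
            if (
                machine.region == region
                and machine.up
                and machine.host not in flagged_hosts
            ):
                return machine
    return None


machines = [
    Machine("m1", "ams", "host-a"),
    Machine("m2", "fra", "host-b"),
    Machine("m3", "lhr", "host-c"),
]
order = ["ams", "fra", "lhr"]  # nearest region first for some client

# No issues anywhere: the nearest machine is chosen.
nearest = pick_machine(machines, order, flagged_hosts=set())

# host-a still looks "up" but has been flagged: traffic diverts to fra.
diverted = pick_machine(machines, order, flagged_hosts={"host-a"})
```

The point of the separate `flagged_hosts` set is that a half-broken host still passes the plain up/down check, so diversion only happens once someone flags it.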