A few days ago, deploying to FRA was nearly impossible, or at best very chaotic.
Yesterday, Postgres databases in FRA were down for at least an hour.
Is there something wrong with this region? Should I take the time to migrate to another region, perhaps AMS? These are the questions I asked myself yesterday evening when facing this issue.
My main SaaS product is used by restaurants to distribute their menus digitally. Yesterday, their customers were not able to see the menu… You can see why this is problematic.
Today, I’m trying to figure out what went wrong and how I should handle this in the future. Sure, Fly.io had difficulties. Such is life. Sh*t happens, and I don’t blame Fly.
But clearly, I forgot to have a contingency plan. That’s my fault.
Can someone give me guidance on what I should do? I’m not an infrastructure expert, but I’d really like to level up, and perhaps this thread could help others in the future.
What I have in mind currently is:
– Have a replica of the database in another region? Currently I have a two-instance setup (master & slave), but I believe both instances are in the same region, so it didn’t help when FRA was down.
– Have a backup Node process, in case it’s not the database but the datacenter running the Express (Remix) server that goes down. How am I supposed to do that? Am I supposed to play with some load balancing/nginx to achieve that?
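Both ideas boil down to the same pattern: try the primary region first and fall back to another region when it fails. Here is a minimal sketch of that pattern in TypeScript, assuming you can wrap "connect to the database" or "call the server" in an async callback. The helper name `withFailover`, the `Target` type, the region list, and the URLs are all hypothetical and for illustration only; this is not a Fly.io API.

```typescript
// Hypothetical description of one place we can send traffic to.
type Target = { region: string; url: string };

// Try each target in priority order and return the first successful result.
// `attempt` is whatever operation you want to protect (DB connect, HTTP call...).
async function withFailover<T>(
  targets: Target[],
  attempt: (target: Target) => Promise<T>,
): Promise<{ region: string; result: T }> {
  const errors: string[] = [];
  for (const target of targets) {
    try {
      // Stop at the first target that answers successfully.
      const result = await attempt(target);
      return { region: target.region, result };
    } catch (err) {
      // Record why this target failed, then move on to the next one.
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${target.region}: ${reason}`);
    }
  }
  // Every target failed: surface all the reasons at once.
  throw new Error(`All targets failed -> ${errors.join("; ")}`);
}

// Hypothetical primary in FRA and replica in AMS.
const targets: Target[] = [
  { region: "fra", url: "postgres://fra-db.example:5432/menu" },
  { region: "ams", url: "postgres://ams-db.example:5432/menu" },
];

// Simulate FRA being down: the call falls through to AMS.
withFailover(targets, async (t) => {
  if (t.region === "fra") throw new Error("connection refused");
  return `connected to ${t.url}`;
}).then(({ region, result }) => console.log(region, result));
```

One caveat on the design: a standard Postgres streaming replica is read-only until it is promoted, so falling back to it keeps reads (like showing a menu) working but not writes. For the server side, the same "ordered list of targets plus health checks" idea is roughly what a load balancer does for you.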
These are the two ideas I have, but I’m sure the question is bigger, and the answer more complex than these two bits of food for thought.
I’d love it if someone could help me figure out the best strategy.