App going down for 15 minutes regularly

I am on Fly's free tier. I deployed a Phoenix application with the defaults: one container for the Elixir code and another for the Postgres database. It is a Hobby plan, pay-as-you-go. Usage generated a bill of $8 last month, which I duly paid.
However, I have an uptime monitor enabled for this domain, and it regularly sends me downtime alerts for it. Each outage lasts close to 15 minutes. Further, it happens in the middle of the night IST, so I could not verify whether the site was actually down at the time.
My question is: should we build in redundancy by default even at such low traffic? Is this normal? How can I proceed with debugging this issue?

  1. Redundancy is a good idea (especially if your app is embarrassingly redundant-able, that is, you can simply spin up one more instance of your app without having to change any code or tune any knobs).

  2. It is not normal for apps to go down for 15 minutes with regularity. If the 15-minute outage you saw was a one-off, I can see how that may have happened.

    I've observed that some apps that use volumes (disk) or have a stable public IPv6 address (like Machine apps) do get zombified when the underlying server goes kaput. But that isn't a 15-minute downtime… that's a blackout.

    If you can share the app name, region, VM ID, the uptime monitor you use, and each period the monitor flagged as down, it may help Fly engineers narrow down the root cause. To speed up the process, one might consider the £29/mo plan, which gets them exclusive access to Fly's customer support.

  3. One could set up a pager, I guess, and check whether it's the uptime monitor giving a false positive or whether it's Fly. But that's likely overkill given it is a hobby project.
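A cheaper way to tell a false positive from a real outage (point 3) might be to run a second, independent probe from a machine outside Fly for a few nights and compare its log with the monitor's alerts. Below is a minimal sketch in Python, assuming the app answers a plain HTTP GET on its root URL with a 2xx/3xx status. The names `is_up`, `fmt`, and `monitor` are hypothetical helpers, not part of Fly's tooling. Timestamps are printed in both UTC and IST so they can be lined up with the night-time alerts.

```python
import http.client
import time
import urllib.error
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Optional, Tuple

# India Standard Time is a fixed UTC+05:30 offset (no daylight saving).
IST = timezone(timedelta(hours=5, minutes=30))


def is_up(url: str, timeout: float = 10.0) -> bool:
    """Return True if `url` answers with a 2xx/3xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError/HTTPError (4xx/5xx), refused connections and
        # timeouts; ValueError covers malformed URLs; HTTPException covers
        # protocol-level errors such as a bad status line.
        return False


def fmt(ts: datetime) -> str:
    """Render an aware timestamp in UTC and IST for matching against alerts."""
    utc = ts.astimezone(timezone.utc)
    return f"{utc:%Y-%m-%d %H:%M:%S} UTC ({utc.astimezone(IST):%H:%M} IST)"


# (first failed check, first successful check after it); end is None if
# the site was still down when monitoring stopped.
Window = Tuple[datetime, Optional[datetime]]


def monitor(
    url: str,
    interval: float = 60.0,
    max_checks: Optional[int] = None,
    probe: Callable[[str], bool] = is_up,
    clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> List[Window]:
    """Probe `url` every `interval` seconds, log state changes, return outages.

    Runs forever when `max_checks` is None. `probe` and `clock` are injectable
    so the bookkeeping can be exercised without network access.
    """
    windows: List[Window] = []
    down_since: Optional[datetime] = None
    checks = 0
    while max_checks is None or checks < max_checks:
        now = clock()
        ok = probe(url)
        checks += 1
        if not ok and down_since is None:
            # Transition up -> down: remember when the outage was first seen.
            down_since = now
            print(f"DOWN  {fmt(now)}", flush=True)
        elif ok and down_since is not None:
            # Transition down -> up: close the window and report its length.
            windows.append((down_since, now))
            print(f"UP    {fmt(now)} (outage ~{now - down_since})", flush=True)
            down_since = None
        time.sleep(interval)
    if down_since is not None:
        windows.append((down_since, None))
    return windows
```

Usage would be something like `monitor("https://<app-name>.fly.dev/")`, with the hostname as a placeholder for your own. With a 60-second interval, each window's bounds are only accurate to about a minute. If this probe stays green while the uptime monitor alerts, a false positive on the monitor's side looks likely; if both agree, the logged windows are exactly the "periods of time" worth sharing with Fly along with the app name and region.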