App was down for 20 minutes in the middle of the night and then restarted. How can I investigate?

Hi there! Yesterday (2021-01-05), between 23:13:56 and 23:34:24 (UTC+0), our app (currently running on a single node) did not respond to any uptime pings or other traffic, and then it restarted.

On the dashboard's metrics page, we see a clear dip during that interval, where ‘VM service concurrency’ and ‘data transfer’ both dropped to 0.

What is strange, however, is that this restart does not show up in fly status.

There is also no indication anywhere of why our Elixir app restarted. We would have expected to see some information about memory usage in our in-app logs. (We are not currently running a fly-log-shipper inside our cluster.)

Could you investigate what happened?

Hi,

Where is (or was) your app running?
If it was LHR by any chance, it may have been this.


It is normally supposed to run in ams (the sole datacenter in its main ‘region pool’), but fly status currently lists it as lhr(B) (lhr and fra make up its ‘backup region’ pool).

Thank you for clearing this up; I’ll immediately subscribe to the updates mailer :+1:.