Our app stopped responding to requests from approximately 14:31 to 14:39 UTC today. It runs in two regions, IAD and ORD. Has there been an outage?
In the fly console and fly logs, no logs from the IAD instance are visible. The app has returned to responding to requests, but we still see no request logs from IAD.
(We also see no request logs from ORD currently, but that may be normal: this service only receives requests from a caller that is probably geographically closer to IAD, so its traffic is load-balanced to IAD.)
We had a BGP issue that affected anycast IPs for about five minutes. More details: Fly.io Status - Load balanced IP routing issues
Ah, that must have been it, thanks.
(Odd, I could have sworn I checked the status page a moment before posting here and saw nothing.)
Sorry, quick follow-up: that issue would explain the connectivity drop, but we still don’t see any logs for this application. Is there perhaps a secondary issue causing logs not to be collected?
The app is back in service & metrics confirm this, but the last log entry in fly logs is from
Oh, sorry I didn’t catch that half of the problem.
It’s quite possible the log forwarder for your IAD instance has a problem. We’ve been trying to fix a bug with log collection that affects specific VMs.
If your app can handle a restart, try running fly vm stop <id> on the affected instance and see if the replacement gives you logs.
Not using VMs, but restarting the app seems to have done the trick:
2022-07-28T17:43:14Z runner[554631dc] ord [info]Shutting down virtual machine
2022-07-28T17:43:14Z app[554631dc] ord [info]Sending signal SIGINT to main child process w/ PID 516
2022-07-28T17:43:25Z runner[da4f9d6f] ord [info]Starting instance
2022-07-28T17:43:25Z runner[da4f9d6f] ord [info]Configuring virtual machine
(followed by similar for iad)
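For anyone else trying to tell which region has gone quiet, here is a minimal sketch that reads saved log output and reports the most recent timestamp seen per region. It assumes every line follows the same layout as the excerpt above (timestamp, source[instance], region, [level]message); the function name last_seen_by_region and the regular expression are my own and not part of any Fly.io tooling.

```python
import re
from datetime import datetime, timezone

# Assumed line layout, taken from the excerpt in this thread:
#   2022-07-28T17:43:14Z runner[554631dc] ord [info]Shutting down virtual machine
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\s+"
    r"(?P<source>\w+)\[(?P<instance>[0-9a-f]+)\]\s+"
    r"(?P<region>\w+)\s+"
    r"\[(?P<level>\w+)\](?P<message>.*)$"
)


def last_seen_by_region(lines):
    """Return {region: latest UTC timestamp} for lines matching the assumed layout."""
    latest = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that does not look like a log line
        ts = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%SZ")
        ts = ts.replace(tzinfo=timezone.utc)
        region = match["region"]
        if region not in latest or ts > latest[region]:
            latest[region] = ts
    return latest


if __name__ == "__main__":
    sample = [
        "2022-07-28T17:43:14Z runner[554631dc] ord [info]Shutting down virtual machine",
        "2022-07-28T17:43:14Z app[554631dc] ord [info]Sending signal SIGINT to main child process w/ PID 516",
        "2022-07-28T17:43:25Z runner[da4f9d6f] ord [info]Starting instance",
        "2022-07-28T17:43:25Z runner[da4f9d6f] ord [info]Configuring virtual machine",
    ]
    for region, ts in sorted(last_seen_by_region(sample).items()):
        print(region, ts.isoformat())
```

A region that is missing from the result, or whose latest timestamp is well behind the others while metrics still show traffic, is a hint that log collection rather than the app itself is the problem.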