Logs for my apps seem to have stopped sometime yesterday

Hi,

All my apps run in the sin region. While debugging an issue today, I noticed that all of them are only showing logs from yesterday, mostly up to around 2021-04-19T20:17:34.785Z.

Could it be related to a log-caching app that I wrote and deployed a few days ago? I followed the retry and backoff logic from flyctl, so I had hoped it wouldn’t cause any undue effects on your logging infra. If it did, please let me know and I’m happy to rewrite it.

If it is related to that, I’ve suspended the app in the meantime. Since we can’t see the latest logs, though, it’s affecting our ability to debug, so could I ask that you re-enable realtime logs for now (assuming this is an automated block triggered by polling the logging infra too frequently)?

Otherwise, every other indicator suggests that things are running; we just can’t see the latest logs.

Thanks

Just tried running fly logs, and the logs have now caught up to the present. Strange.

Was this a temporary thing? Or can you confirm whether it was caused by the log-caching app? Do you apply some sort of global rate limit to the https://api.fly.io/api/v1/apps/<app-name>/logs?next_token="" API on a rolling-window basis? If so, could you give me some guidance on roughly what that limit is, so I can make sure I don’t go over it?
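For anyone writing a similar polling client, here is a minimal sketch of capped exponential backoff with jitter. It is not flyctl's actual logic and makes no assumption about the logs API response format: `fetch` is a hypothetical callable you supply that returns a result or raises when a request fails or is throttled, and the base delay, cap, and attempt count are illustrative values, not documented limits.

```python
import random
import time


def backoff_delays(base=1.0, factor=2.0, cap=60.0):
    """Yield an endless sequence of exponentially growing delays (seconds), capped at `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def poll_with_backoff(fetch, max_attempts=5, sleep=time.sleep, base=1.0, cap=60.0):
    """Call `fetch()` until it succeeds, waiting longer after each failure.

    `fetch` is a hypothetical callable: it returns a result on success and
    raises an exception on failure. The last exception is re-raised once
    `max_attempts` calls have all failed.
    """
    delays = backoff_delays(base=base, cap=cap)
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception as exc:
            last_error = exc
            if attempt == max_attempts - 1:
                break  # no point sleeping after the final attempt
            delay = next(delays)
            # Add up to 10% jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise last_error
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without actually waiting.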

Thanks Fly team!

Hi!

I was just about to reply to you. We noticed after your post that our log shipper, Vector, had stopped shipping (but, strangely, was still running, so monitoring hadn’t picked it up). This wouldn’t have been caused by anything you had deployed, so no worries there! In addition, we’re going to try to get some better metrics on our log shipping pipeline so we can detect this failure in the future.
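As a rough illustration of the kind of check that catches a shipper that is running but not shipping, here is a minimal freshness test on the newest log timestamp. It is only a sketch, not how our pipeline is monitored: the function names and the 15-minute threshold are assumptions made up for the example.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(ts):
    """Parse an ISO 8601 timestamp such as 2021-04-19T20:17:34.785Z as UTC."""
    # Older Python versions do not accept a trailing "Z", so spell out the offset.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def is_stalled(last_log_ts, now, max_age=timedelta(minutes=15)):
    """Return True if the newest log line is older than `max_age`.

    `max_age` is an illustrative threshold; a quiet app may need a larger one.
    """
    return now - parse_ts(last_log_ts) > max_age


# Example using the last timestamp reported in this thread.
last_seen = "2021-04-19T20:17:34.785Z"
next_morning = datetime(2021, 4, 20, 8, 0, tzinfo=timezone.utc)
print(is_stalled(last_seen, next_morning))
```

A process-liveness check alone would have passed here, which is why alerting on the age of the data is the more useful signal.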

Thanks for bringing it to our attention!

Thanks for letting me know @steve.

Good luck with the troubleshooting, and yeah, better monitoring sounds like a good solution in any case :+1: