I’ve got an app that connects to an API and registers event handlers that run whenever an event arrives. Because every instance would receive and handle the same events, I cannot run more than one instance without duplicating the work. It’s basically the same setup as the one described in the cron article.
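To make the duplication problem concrete, here is a minimal sketch. It is not my actual code: `EventSource`, `run_instances`, and the event names are all hypothetical, and it assumes the upstream API broadcasts each event to every connected client.

```python
from collections import Counter


class EventSource:
    """Hypothetical stand-in for the upstream API: broadcasts to all subscribers."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def emit(self, event):
        # Every connected instance receives every event.
        for handler in self.subscribers:
            handler(event)


def run_instances(count, events):
    """Simulate `count` app instances; return how many times each event was handled."""
    source = EventSource()
    handled = Counter()

    def handler(event):
        handled[event] += 1  # the side effect that should happen once per event

    for _ in range(count):
        source.subscribe(handler)  # one subscription per running instance
    for event in events:
        source.emit(event)
    return dict(handled)


print(run_instances(1, ["order-1", "order-2"]))  # each event handled once
print(run_instances(2, ["order-1", "order-2"]))  # each event handled twice
```

With one instance each event is processed once; with two, every side effect happens twice, which is why I need exactly one instance running.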
Since the 26th, my app has been restarting continuously, typically with a delay between restarts (region lhr, backup ams, 256 MB shared-CPU app). For instance, the most recent restart left a gap of 20 minutes during which no instance was running. I got no alerts, errors, or warnings; I just happened to notice it.
This is a horrible experience, and I would never have known it was happening without manually monitoring the apps page. Here’s an overview of the last 7 days, showing consistent memory and CPU usage and the vast number of instance changes happening in between. The logs also don’t show any errors apart from a signal to shut down from the runner, which I didn’t trigger:
I was thinking of upgrading to v2, since the article mentions that it helps with reliability. However, the article also pushes for running multiple instances, because the underlying architecture has changed from how scaling works on v1. That seems like it would break my setup and potentially result in more downtime, because Fly will not try to ensure that I always have at least one instance running.
Questions:
- Has anyone experienced the same constant restarting even though memory and CPU limits aren’t being reached?
- Are there any recommendations on how to improve reliability while still meeting the requirement that exactly one instance is running at any point in time?