Docker Container Management and Persistence After Internal Errors

We’ve run into an interesting issue with our Docker-based proxy apps: an internal Fly error causes the container to restart. Everything comes back up without issue, except for the last script run in our Dockerfile. Because we have to create Elastic indices at build time (based on hostname) and wait for the VPN connection to become active, Filebeat is started under our supervisor only after indexing, via a setup script run from the Dockerfile. Unfortunately, Filebeat isn’t starting back up as expected after these restarts, and I’m not entirely sure why yet.

For clarity, this is where the script is run:

# setup filebeat: start supervisord (daemonized), then run the setup script
CMD /usr/bin/supervisord -c /etc/supervisor/supervisord.conf \
    && /root/filebeat_setup.sh

My questions:

  1. Is there any information that would help us fully understand how these restarts differ from a new build? (e.g. it seems the environment is reset to the last build and the hostname persists, but the index/Filebeat script from the Dockerfile isn’t executed)

  2. Is there a way for us to persist the hostname, or to have a container-only hostname (specifically to avoid having to create new indices each time)?

  3. Is there a way to add persistence to the current environment, or avoid the container restarts?

  4. Is there a good way to monitor these restarts, so we can verify our Filebeat connection?

We of course just want to make sure everything persists in any situation, and while we can work around the current issue as-is, it’s becoming a bit too messy for my liking. Thanks in advance.

Hi @dbrown

It looks like you’re using supervisord, which can restart processes when they crash or exit. Have you confirmed that it was the container restarting and not supervisord restarting the process?
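One way to tell the difference: supervisord restarts are governed by each program stanza’s autorestart setting, so a stanza along these lines (the program name and paths here are just an example, not your actual config) would silently bring the process back after a crash:

    ; example stanza: autorestart=true means supervisord itself restarts
    ; the process on exit, which can look a lot like a container restart
    [program:filebeat]
    command=/usr/bin/filebeat -e -c /etc/filebeat/filebeat.yml
    autostart=false
    autorestart=true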

As for persistence, you can use Fly volumes (Volumes · Fly Docs). These are volumes that survive restarts, VM deletion/recreation, etc.
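For example, something roughly like this creates a volume and mounts it into the VM (the volume name, size, and mount path are placeholders, and you may also need a region flag):

    # create a small volume for the app (name/size are examples)
    fly volumes create filebeat_data --size 1

    # fly.toml: mount the volume into the VM
    [mounts]
      source = "filebeat_data"
      destination = "/data"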

If you want to see logs for your app you can use flyctl logs (flyctl logs · Fly Docs), or if you want to see the history of a specific VM you can use flyctl vm status (flyctl vm status · Fly Docs).
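Roughly like this (substitute your own app name and VM ID):

    # tail live logs for the app
    fly logs -a my-proxy-app

    # show status and event history for a specific VM
    fly vm status <vm-id> -a my-proxy-app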

Thanks for the reply @charsleysa

Yes, we currently use supervisord to auto-restart all processes, but the problem is that the Filebeat process never starts in the first place, since it has to be launched from that script. It’s currently difficult to see the logs because we overrun the buffer with nginx logs written to stdout (we can remove those once Filebeat is reliable), but we can definitely see it’s a container restart, since the uptime has reset. Unfortunately, we can’t just have supervisord start Filebeat on startup, because the index script has to run first, and we of course don’t want that script run on every process auto-restart.
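To make the constraint concrete, here’s a minimal sketch of the kind of wrapper we’d need (the marker path and filebeat command are assumptions): supervisord starts the wrapper, the wrapper runs the index script only once per container lifetime and then execs Filebeat, so process auto-restarts don’t re-run the indexing:

    #!/bin/sh
    # hypothetical wrapper, started by supervisord instead of filebeat directly
    MARKER=/var/run/filebeat_setup.done  # assumption: cleared on container restart, kept across process restarts
    if [ ! -f "$MARKER" ]; then
        # one-time index setup; let supervisord retry the wrapper if it fails
        /root/filebeat_setup.sh || exit 1
        touch "$MARKER"
    fi
    # exec so supervisord ends up supervising filebeat itself
    exec /usr/bin/filebeat -e -c /etc/filebeat/filebeat.yml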

Volumes may make things easier, but I’m not entirely sure. :thinking: We don’t really need to persist the data so much as avoid having to recreate the indices. Does the random hostname persist with these volumes?

I do think the biggest win would be removing the need for the initial indexing, but I’m not entirely sure how to accomplish that, since we have to update the index with each hostname change, and re-initializing on every restart will break Filebeat in certain situations. Even so, I’m open to any suggestions.
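One direction we’re considering, assuming a volume mounted at /data as suggested above (the file name and index naming scheme are made up for illustration): persist the chosen index name on the volume, so a restarted container with a new hostname reuses the existing index instead of creating a new one:

    #!/bin/sh
    # hypothetical: reuse a stable index name stored on a persistent volume
    INDEX_FILE=/data/filebeat_index_name
    if [ -f "$INDEX_FILE" ]; then
        # a previous boot already picked an index; reuse it
        INDEX_NAME=$(cat "$INDEX_FILE")
    else
        # first boot: derive a name from the current hostname and remember it
        INDEX_NAME="proxy-$(hostname)"
        echo "$INDEX_NAME" > "$INDEX_FILE"
    fi
    # ...create the Elastic index (if missing) and point Filebeat at $INDEX_NAME...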

I’ve decided to look into ways to consolidate our Filebeat setup altogether, so we can avoid this issue. I’m still not entirely sure why the script is failing on restart, but I’ll add some verbose logging to the script this week so we can at least figure out why it only fails during restarts. Cheers!
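For anyone following along, the plan is roughly a preamble like this at the top of filebeat_setup.sh (the log path is arbitrary):

    #!/bin/sh
    # trace every command and capture all output to a file,
    # so restart failures are visible even when stdout is flooded
    set -eux
    exec >>/var/log/filebeat_setup.log 2>&1
    echo "setup started at $(date)"
    # ...existing setup steps...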