More unhealthy allocations

I just deployed some small changes to my server and it wouldn’t come back up. This is a single-instance app, and it went like this:

  1. I deploy the changes: fly deploy -c fly-production.toml
  2. The deploy fails because of unhealthy allocations
  3. Fly tries to roll back to my prior version; that also fails
  4. I run fly deploy -c fly-production.toml again
  5. The server starts up and seems fine

I don’t have a services section in my toml file because this is an internal service (gunicorn) that my other fly app, which runs an nginx proxy, forwards traffic to.

I don’t have any health checks set up because I don’t have anything exposed. Should I still set these up?
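In case it helps frame the question: even without a services section, fly.toml supports app-level checks in a top-level [checks] table. A minimal sketch of a TCP check on the gunicorn port; the check name, port 8000, and all timing values here are assumptions, not taken from my actual config:

```toml
[checks]
  [checks.gunicorn_alive]
    type = "tcp"          # just verify the port accepts connections
    port = 8000           # assumed gunicorn bind port
    grace_period = "10s"  # give the app time to boot before checking
    interval = "15s"
    timeout = "2s"
```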

I’m using Docker, and it seems the logs aren’t really retrievable when the server fails to start. Should I be able to access the logs from the failed deploys at some point in the steps above?

I also have a staging app on Fly that tries to mirror production. I deployed there first and it seemed to work fine on the first try.

Any idea what could be going wrong? Thanks.

Update: I had a custom entrypoint script, written in Python, that wasn’t performing an exec of the final command. I figure that can’t be good, so I’ve updated it. I saw another thread about processes accessing the volume and breaking a restart; maybe that’s related to the missing exec?
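For anyone hitting the same thing, the fix looks roughly like this. The point is that os.execvp replaces the Python entrypoint process with gunicorn, so gunicorn becomes the container’s main process and receives signals (e.g. SIGTERM during a deploy) directly, instead of being an orphaned child when the wrapper exits. The gunicorn arguments and port are placeholders for whatever your real entrypoint runs:

```python
import os


def start_server() -> None:
    """Entrypoint body: do any pre-start work, then exec the real server."""
    # ... pre-start work (migrations, config templating, etc.) goes here ...

    # exec *replaces* this Python process with gunicorn, rather than
    # spawning it as a child via subprocess and then exiting.
    os.execvp("gunicorn", ["gunicorn", "app:app", "--bind", "0.0.0.0:8000"])
```

In the real entrypoint script you’d call start_server() at the bottom; nothing after the exec line ever runs, because the process image has been replaced.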