I have an Elixir app that I am developing and have a single app instance deployment using the default canary strategy.
I’m observing the following order of operations after running fly deploy:
Health checks pass for new instance
Old instance is shut down
2022-01-21T19:55:06.720 app[5b9740cd] ewr [info]Sending signal SIGTERM to main child process w/ PID 510
2022-01-21T19:55:06.720 app[5b9740cd] ewr [info]19:55:06.720 [notice] SIGTERM received - shutting down
Connections continue routing to the old instance for a time
Yes that’s almost definitely the same issue. We are so close to having this fixed, so the answer might be “just bear with us”. But the workaround answer is that running 3 nodes will likely mask the problem.
Was running the following to check the deployment.
Running my application with a scale count of 3.
Also running my application behind Cloudflare.
while true; do curl -s -o /dev/null -w "%{http_code}\n" https://karambit.ai; sleep 1; done
Previously, I would get tons of 525 errors as Cloudflare would be unable to reach the backend due to the service discovery issue and take some time to get healthy.
This time around, I got consistent 200s as I did my canary deploy as I would expect.
Looks fantastic, thanks for the update @kurt and kudos to the team for the fix
Massive thanks, I’ve been having these deployment downtime issues even with 2 instances and bluegreen strategy to mitigate it a bit. It was a bit of a pain.
Today I didn’t have any downtime while deploying I also reverted our app’s deployment strategy to canary as it works equally as well.