I have a question (or two) about rolling deployments.
Imagine an application with 2 running/healthy instances:
2021-01-20T06:15:22.518Z eac02266 sjc [info] [2021-01-20T06:15:21.461Z] "GET / HTTP/1.1" 200
2021-01-20T06:15:24.127Z 6997dc52 sjc [info] [2021-01-20T06:15:24.067Z] "GET /ready HTTP/1.1"
I manually deploy a new version, and I see a new VM come online.
2021-01-20T06:15:26.229Z a2ab8699 sjc [info] Starting instance
2021-01-20T06:15:26.253Z a2ab8699 sjc [info] Configuring virtual machine
2021-01-20T06:15:26.254Z a2ab8699 sjc [info] Pulling container image
2021-01-20T06:15:28.584Z a2ab8699 sjc [info] Unpacking image ...
2021-01-20T06:15:29.929Z a2ab8699 sjc [info] [runner] starting
and I see 2 health checks pass for this new VM:
2021-01-20T06:15:34.539Z a2ab8699 sjc [info] [2021-01-20T06:15:34.529Z] "GET /ready HTTP/1.1" 200 - 0 2 0 - "126.96.36.199" "Consul Health Check" "cec8c8af-99e9-4da1-8cd2-b2c99c6515e7" "172.19.2.130:9903" "-"
2021-01-20T06:15:44.543Z a2ab8699 sjc [info] [2021-01-20T06:15:44.530Z] "GET /ready HTTP/1.1" 200 - 0 2 0 - "188.8.131.52" "Consul Health Check" "97d0e784-0c1c-4978-9cef-7a46ea1c7812" "172.19.2.130:9903" "-"
(The first timestamp on each line comes from fly logs; the second one, in brackets, is Envoy’s access log timestamp.)
As soon as those 2 health checks pass, and without any "Health check status changed to 'passing'" log message for the new VM, I see my two original VMs get terminated:
2021-01-20T06:15:44.965Z eac02266 sjc [info] Shutting down virtual machine
2021-01-20T06:15:45.089Z 6997dc52 sjc [info] Shutting down virtual machine
I’ve set a kill_timeout of 120 seconds, so the VMs should stick around for a bit, but there don’t seem to be any logs from them after the SIGINT is sent.
A new VM starts (the 2nd one):
2021-01-20T06:15:56.620Z 74d94cfa sjc [info] Starting instance
2021-01-20T06:15:56.640Z 74d94cfa sjc [info] Configuring virtual machine
2021-01-20T06:15:56.641Z 74d94cfa sjc [info] Pulling container
but only one health check is logged for it:
2021-01-20T06:16:08.808Z 74d94cfa sjc [info] [2021-01-20T06:16:08.799Z] "GET /ready HTTP/1.1" 200 - 0 2 0 - "184.108.40.206" "Consul Health Check" "433ba8df-6888-4477-ae34-ed4c0522385c" "172.19.3.74:9903" "-"
2021-01-20T06:16:12.972Z 74d94cfa sjc [info] Health check status changed to 'passing'
- Is it normal that after only one new VM comes online, the previous 2 are both scheduled for termination?
- Is it normal to not have any log messages after SIGINT for the running VMs? I catch SIGINT and then gracefully drain the HTTP listeners, but I’m not seeing any log messages from that.
- Would the behavior be any different if I had 3 running VMs?
PS: @kurt I ended up writing a tiny Go binary that handles the graceful shutdown, because doing it in bash was too painful.