Redeploying an image doesn't have any effect

So I’ve noticed that when I redeploy a prebuilt image, just basically redeploying and nothing changed in the code, the fly app doesn’t show any signals that it restarted (from the log), and usually, I notice some odd problems, for instance, the request took much longer to come and be served by the application. It does create a new release though.

To solve this problem, I have to perform a complete redeployment, basically building a new docker image and deploying that image.

More info: my app is a rails application, bookforward-prod-us if that helps.

Observing a bit more, I think it’s some network/routing issue, similar to the one I reported a while back Network issue from Fly?. The symptom is the request is queued somewhere and couldn’t reach the application.

Hmm, another finding, which contradicts what I said above. I did a Rails.cache.clear and everything is back to normal. Don’t really know the connection between the network issue and cache :man_shrugging:

Nope, clearly Rails.cache.clear is not related

Hi, sorry to hear you’re running into this. There’s a couple things we might want to do off the bat to learn more about what’s going on.

  • some general info about your app-- the output of fly status -a appname will help us narrow things down a bit
  • looking at the debug logs for the strange deployment, eg, LOG_LEVEL=debug fly deploy; this will help us figure out what flyctl is instructing our api to do to your app.
  • running a curl against debug.fly.dev and a curl against your app with the flyio: doit header set. This will allow you to compare and contrast the traffic that’s being sent to your app’s instances vs. the debug endpoint.

I hope this helps! Feel free to reach back out with any questions you might have during your investigations.

These are helpful, will do and post the update back

1 Like

awesome, thank you! I’m glad you replied to confirm because I just realized I got the custom header wrong for curl: it’s actually flyio-debug: doit and you can see an example here:

Some initial results,

fly status

App
  Name     = bookforward-prod-us
  Owner    = personal
  Version  = 95
  Status   = running
  Hostname = bookforward-prod-us.fly.dev

Deployment Status
  ID          = b09947fb-4214-782a-6d6f-63d3ba88569d
  Version     = v95
  Status      = successful
  Description = Deployment completed successfully
  Instances   = 1 desired, 1 placed, 1 healthy, 0 unhealthy

Instances
ID              PROCESS VERSION REGION  DESIRED STATUS  HEALTH CHECKS           RESTARTS        CREATED
ce1dc6ee        app     95      iad     run     running 1 total, 1 passing      0               5m35s ago

Debug log when deployment: Debug fly deployment · GitHub. I don’t really see any problems from the log though…

Yes, I agree-- all of this output appears as if everything is running smoothly. Could you send over the logs covering your most recent redeploy (where you’d normally see a reboot?).
With that app metadata, we’ll be able to investigate more thoroughly on our end.

Any additional data you have on the performance problems you’re seeing (say, specific timeframes for slow quieries, for example) could also come in handy.

2022-07-20T12:11:59Z app[75885fe7] iad [info]12:11:59 web.1      | I, [2022-07-20T12:11:59.766340 #3077]  INFO -- : source=rack-timeout id=0636ded7-e08e-4ed6-8d2d-233bfdad4e3e timeout=15000ms service=154ms state=completed
2022-07-20T12:12:00Z app[75885fe7] iad [info]12:12:00 web.1      | I, [2022-07-20T12:12:00.129876 #3077]  INFO -- : [1f68196c-92d7-4512-b220-65d8d3253257] [ahoy] Visit excluded
2022-07-20T12:12:00Z app[75885fe7] iad [info]12:12:00 web.1      | I, [2022-07-20T12:12:00.351461 #3077]  INFO -- : source=rack-timeout id=1f68196c-92d7-4512-b220-65d8d3253257 timeout=15000ms service=226ms state=completed
2022-07-20T12:12:48Z app[75885fe7] iad [info]12:12:48 web.1      | I, [2022-07-20T12:12:48.178734 #3060]  INFO -- : source=rack-timeout id=10945f27-f3c0-40b9-b73e-edce595183da timeout=15000ms state=ready
2022-07-20T12:12:48Z app[75885fe7] iad [info]12:12:48 web.1      | I, [2022-07-20T12:12:48.185302 #3060]  INFO -- : [10945f27-f3c0-40b9-b73e-edce595183da] [ahoy] Visit excluded
2022-07-20T12:12:48Z app[75885fe7] iad [info]12:12:48 web.1      | I, [2022-07-20T12:12:48.187293 #3060]  INFO -- : [10945f27-f3c0-40b9-b73e-edce595183da] method=GET path=/books format=html controller=BooksController action=index status=302 duration=2.61 view=0.00 db=0.00 location=https://www.thue.do params={"header"=>"Tác+phẩm+trong+nước", "page"=>"2", "q"=>{"categories_id_in"=>"3"}} user_id=
2022-07-20T12:12:48Z app[75885fe7] iad [info]12:12:48 web.1      | I, [2022-07-20T12:12:48.189490 #3060]  INFO -- : source=rack-timeout id=10945f27-f3c0-40b9-b73e-edce595183da timeout=15000ms service=11ms state=completed
2022-07-20T12:18:43Z runner[ce1dc6ee] iad [info]Starting instance
2022-07-20T12:18:46Z runner[ce1dc6ee] iad [info]Configuring virtual machine
2022-07-20T12:18:46Z runner[ce1dc6ee] iad [info]Pulling container image
2022-07-20T12:18:50Z runner[ce1dc6ee] iad [info]Unpacking image
2022-07-20T12:19:07Z runner[ce1dc6ee] iad [info]Configuring firecracker
2022-07-20T12:19:07Z runner[ce1dc6ee] iad [info]Starting virtual machine
2022-07-20T12:19:07Z app[ce1dc6ee] iad [info]Starting init (commit: c86b3dc)...
2022-07-20T12:19:08Z app[ce1dc6ee] iad [info]Preparing to run: `/bin/bash -c ${SERVER_COMMAND}` as root
2022-07-20T12:19:08Z app[ce1dc6ee] iad [info]2022/07/20 12:19:08 listening on [fdaa:0:402d:a7b:9d34:ce1d:c6ee:2]:22 (DNS: [fdaa::3]:53)
2022-07-20T12:19:08Z app[ce1dc6ee] iad [info]12:19:08 web.1      | started with pid 521
2022-07-20T12:19:08Z app[ce1dc6ee] iad [info]12:19:08 exporter.1 | started with pid 522
2022-07-20T12:19:11Z app[ce1dc6ee] iad [info]12:19:11 web.1      | [521] Puma starting in cluster mode...
2022-07-20T12:19:11Z app[ce1dc6ee] iad [info]12:19:11 web.1      | [521] * Puma version: 5.6.4 (ruby 3.1.1-p18) ("Birdie's Version")
2022-07-20T12:19:11Z app[ce1dc6ee] iad [info]12:19:11 web.1      | [521] *  Min threads: 5
2022-07-20T12:19:11Z app[ce1dc6ee] iad [info]12:19:11 web.1      | [521] *  Max threads: 5
2022-07-20T12:19:11Z app[ce1dc6ee] iad [info]12:19:11 web.1      | [521] *  Environment: production
2022-07-20T12:19:11Z app[ce1dc6ee] iad [info]12:19:11 web.1      | [521] *   Master PID: 521
2022-07-20T12:19:11Z app[ce1dc6ee] iad [info]12:19:11 web.1      | [521] *      Workers: 2
2022-07-20T12:19:11Z app[ce1dc6ee] iad [info]12:19:11 web.1      | [521] *     Restarts: (✔) hot (✖) phased
2022-07-20T12:19:11Z app[ce1dc6ee] iad [info]12:19:11 web.1      | [521] * Preloading application
2022-07-20T12:19:15Z app[ce1dc6ee] iad [info]12:19:15 exporter.1 | 2022-07-20 12:19:12 +0000 Starting prometheus exporter on 0.0.0.0:9394
2022-07-20T12:19:18Z app[ce1dc6ee] iad [info]12:19:18 web.1      | You can remove `require 'dalli/cas/client'` as this code has been rolled into the standard 'dalli/client'.
2022-07-20T12:19:18Z app[ce1dc6ee] iad [info]12:19:18 web.1      | I, [2022-07-20T12:19:16.927613 #521]  INFO -- : Dalli/SASL authenticating as memcached-app151343598
2022-07-20T12:19:18Z app[ce1dc6ee] iad [info]12:19:18 web.1      | I, [2022-07-20T12:19:16.929755 #521]  INFO -- : Dalli/SASL: Authenticated
2022-07-20T12:19:18Z app[ce1dc6ee] iad [info]12:19:18 web.1      | [521] * Listening on http://0.0.0.0:8080
2022-07-20T12:19:18Z app[ce1dc6ee] iad [info]12:19:18 web.1      | [521] ! WARNING: Detected 3 Thread(s) started in app boot:
2022-07-20T12:19:18Z app[ce1dc6ee] iad [info]12:19:18 web.1      | [521] ! #<Thread:0x00007f0a83890e10 /app/vendor/bundle/ruby/3.1.0/gems/scout_apm-5.2.0/lib/scout_apm/agent.rb:176 sleep> - /app/vendor/bundle/ruby/3.1.0/gems/scout_apm-5.2.0/lib/scout_apm/background_worker.rb:51:in `sleep'
2022-07-20T12:19:18Z app[ce1dc6ee] iad [info]12:19:18 web.1      | [521] ! #<Thread:0x00007f0a83890a50 /app/vendor/bundle/ruby/3.1.0/gems/scout_apm-5.2.0/lib/scout_apm/agent.rb:210 sleep> - /app/vendor/bundle/ruby/3.1.0/gems/scout_apm-5.2.0/lib/scout_apm/background_worker.rb:51:in `sleep'
2022-07-20T12:19:18Z app[ce1dc6ee] iad [info]12:19:18 web.1      | [521] ! #<Thread:0x00007f0a80beb8b0 /app/vendor/bundle/ruby/3.1.0/gems/timeout-0.3.0/lib/timeout.rb:101 sleep_forever> - /app/vendor/bundle/ruby/3.1.0/gems/timeout-0.3.0/lib/timeout.rb:105:in `pop'
2022-07-20T12:19:18Z app[ce1dc6ee] iad [info]12:19:18 web.1      | [521] Use Ctrl-C to stop
2022-07-20T12:19:18Z app[ce1dc6ee] iad [info]12:19:18 web.1      | I, [2022-07-20T12:19:18.660927 #538]  INFO -- : Installing Puma worker loop.
2022-07-20T12:19:18Z app[ce1dc6ee] iad [info]12:19:18 web.1      | I, [2022-07-20T12:19:18.657912 #537]  INFO -- : Installing Puma worker loop.
2022-07-20T12:19:18Z app[ce1dc6ee] iad [info]12:19:18 web.1      | [521] - Worker 0 (PID: 537) booted in 0.01s, phase: 0
2022-07-20T12:19:18Z app[ce1dc6ee] iad [info]12:19:18 web.1      | [521] - Worker 1 (PID: 538) booted in 0.01s, phase: 0
2022-07-20T12:19:56Z runner[75885fe7] iad [info]Shutting down virtual machine
2022-07-20T12:19:56Z app[75885fe7] iad [info]Sending signal SIGINT to main child process w/ PID 516
2022-07-20T12:19:57Z app[75885fe7] iad [info]12:19:57 system     | sending SIGTERM to all processes
2022-07-20T12:19:57Z app[75885fe7] iad [info]12:19:57 web.1      | [521] === puma shutdown: 2022-07-20 12:19:57 +0000 ===
2022-07-20T12:19:57Z app[75885fe7] iad [info]12:19:57 exporter.1 | terminated by SIGTERM
2022-07-20T12:20:05Z app[ce1dc6ee] iad [info]12:20:05 web.1      | I, [2022-07-20T12:20:05.933133 #537]  INFO -- : source=rack-timeout id=67d12c53-dc2c-4ec2-b67c-510aa68b9673 timeout=15000ms state=ready
2022-07-20T12:20:05Z app[ce1dc6ee] iad [info]12:20:05 web.1      | I, [2022-07-20T12:20:05.933360 #537]  INFO -- : source=rack-timeout id=1f93877a-15f6-40c4-916d-c384c8f46ba6 timeout=15000ms state=ready
2022-07-20T12:20:07Z app[ce1dc6ee] iad [info]12:20:07 web.1      | I, [2022-07-20T12:20:07.557059 #537]  INFO -- : [67d12c53-dc2c-4ec2-b67c-510aa68b9673] [ahoy] Visit excluded
2022-07-20T12:20:07Z app[ce1dc6ee] iad [info]12:20:07 web.1      | I, [2022-07-20T12:20:07.734664 #537]  INFO -- : [67d12c53-dc2c-4ec2-b67c-510aa68b9673] Dalli/SASL authenticating as memcached-app151343598
2022-07-20T12:20:07Z app[ce1dc6ee] iad [info]12:20:07 web.1      | I, [2022-07-20T12:20:07.739498 #537]  INFO -- : [67d12c53-dc2c-4ec2-b67c-510aa68b9673] Dalli/SASL: Authenticated
2022-07-20T12:20:07Z app[ce1dc6ee] iad [info]12:20:07 web.1      | I, [2022-07-20T12:20:07.749976 #537]  INFO -- : [67d12c53-dc2c-4ec2-b67c-510aa68b9673] Fork detected, re-connecting child process...
2022-07-20T12:20:07Z app[ce1dc6ee] iad [info]12:20:07 web.1      | I, [2022-07-20T12:20:07.753334 #537]  INFO -- : [67d12c53-dc2c-4ec2-b67c-510aa68b9673] Dalli/SASL authenticating as memcached-app151343598
2022-07-20T12:20:07Z app[ce1dc6ee] iad [info]12:20:07 web.1      | I, [2022-07-20T12:20:07.761580 #537]  INFO -- : [67d12c53-dc2c-4ec2-b67c-510aa68b9673] Dalli/SASL: Authenticated
2022-07-20T12:20:08Z app[ce1dc6ee] iad [info]12:20:08 web.1      | I, [2022-07-20T12:20:08.585196 #537]  INFO -- : [67d12c53-dc2c-4ec2-b67c-510aa68b9673] method=POST path=/api/v1/graphql format=*/* controller=Api::V1::GraphqlController action=execute status=200 duration=2640.53 view=12.31 db

This is a normal log when the app restarts. And when I redeploy using the same image, these logs wouldn’t come.

Currently I’m not facing the network problem, will debug it when I face it again. Thank you for supporting

Edit: I’m using blue-green deployment, if that helps at all

Thank you for confirming! I think I might understand at least part of the issue.

Taking another look, I think the behavior that you’re describing is expected. If you redeploy with the exact same image and config, nothing should happen. If you’re redeploying with the same image, but making changes to the services, the services should update, but the instance shouldn’t reboot.

I’m likewise glad to hear you’re not currently having the intermittent network issue-- feel free to reach out here if you run into it again!

Isn’t it a typical expectation though? I deploy my app, no matter what version, it should stop the old process and start a new one. If a deployment doesn’t help me with the above goal, I can only restart the app, which will result in downtime, is that right?

1 Like

Oh one more thing, I have another sidekiq app that runs on the same image as the web server app, and it does start a new instance upon deployment, even with the same image, every single time. So the behavior is not consistent. This is the log when redeploying the same docker image:

2022-07-21T00:45:28Z app[2753d8bd] iad [info]2022-07-21T00:45:28.941Z pid=516 tid=3fqc class=ActionMailer::MailDeliveryJob jid=acac331c7b4617d981f12a01 elapsed=0.048 INFO: done
2022-07-21T00:52:06Z runner[17d77156] iad [info]Starting instance
2022-07-21T00:52:10Z runner[17d77156] iad [info]Configuring virtual machine
2022-07-21T00:52:10Z runner[17d77156] iad [info]Pulling container image
2022-07-21T00:52:16Z runner[17d77156] iad [info]Unpacking image
2022-07-21T00:52:34Z runner[17d77156] iad [info]Configuring firecracker
2022-07-21T00:52:34Z runner[17d77156] iad [info]Starting virtual machine
2022-07-21T00:52:34Z app[17d77156] iad [info]Setting up swapspace version 1, size = 512 MiB (536866816 bytes)
2022-07-21T00:52:34Z app[17d77156] iad [info]no label, UUID=7a39c6d0-f3af-4376-9507-d8216b202571
2022-07-21T00:52:34Z app[17d77156] iad [info]Preparing to run: `bundle exec rails db:migrate` as root
2022-07-21T00:52:34Z app[17d77156] iad [info]2022/07/21 00:52:34 listening on [fdaa:0:402d:a7b:9d35:17d7:7156:2]:22 (DNS: [fdaa::3]:53)
2022-07-21T00:52:41Z app[17d77156] iad [info]You can remove `require 'dalli/cas/client'` as this code has been rolled into the standard 'dalli/client'.
2022-07-21T00:52:42Z app[17d77156] iad [info]I, [2022-07-21T00:52:42.470806 #518]  INFO -- : Dalli/SASL authenticating as memcached-app151343598
2022-07-21T00:52:42Z app[17d77156] iad [info]I, [2022-07-21T00:52:42.473493 #518]  INFO -- : Dalli/SASL: Authenticated
2022-07-21T00:52:43Z app[17d77156] iad [info]Main child exited normally with code: 0
2022-07-21T00:52:43Z app[17d77156] iad [info]Starting clean up.

fly status:

➜  bookforward git:(master) fly status -c fly.sidekiq.prod.us.toml
Update available 0.0.338 -> v0.0.356.
Run "fly version update" to upgrade.
App
  Name     = bookforward-sidekiq
  Owner    = personal
  Version  = 80
  Status   = running
  Hostname = bookforward-sidekiq.fly.dev

Deployment Status
  ID          = 12b314ae-46a3-9cdb-ebc7-669d5e070f30
  Version     = v80
  Status      = successful
  Description = Deployment completed successfully
  Instances   = 1 desired, 1 placed, 1 healthy, 0 unhealthy

Instances
ID              PROCESS VERSION REGION  DESIRED STATUS  HEALTH CHECKS   RESTARTS        CREATED
2753d8bd        app     80      iad     run     running                 0               12h37m ago

If a deployment doesn’t help me with the above goal, I can only restart the app, which will result in downtime, is that right?

I think that fly restart might do what you want in that case:

I’m not yet sure what’s different about the deployment behavior with bookforward-sidekiq though