Possible docker image cache bug is preventing deploys

Last night I realized that Fly was reporting my application as deployed, but the new versions of the code never actually ran.

Deploying (and redeploying) the same code as a new app works fine. I am not sure whether the old app is corrupt or wedged, or whether something else is going on.

Factors that could be causing a problem:

  • the app generates a lot of log spew sometimes (and the log output seemed to get out of sync with reality)
  • the app runs for a while in the background after a request

Any ideas about what’s going on? How can I ensure that the version I think is running is actually what’s running across all the different regions?

Actually, I take it back. The new application does not seem to deploy properly either now. Are there any known issues with the docker registry?

Here’s the output I see while deploying (I’ve edited out most of each digest):

Searching for image 'registry.fly.io/sandbox-cld-adamb-batch-16@sha256:2aec[...]' locally...
Searching for image 'registry.fly.io/sandbox-cld-adamb-batch-16@sha256:2aec[...]' remotely...
image found: img_ylj9x4dz9wgpwo1k
Image: registry.fly.io/sandbox-cld-adamb-batch-16@sha256:9249[...]
Image size: 46 MB

Note that the SHA256 digest fly is “searching for” (2aec[...]) is different from the SHA256 digest it “found” (9249[...]).

I don’t see a problem with the registry right now.

A different ID being returned is usually normal. Local IDs are generated from uncompressed image layers, while “distribution IDs” in the registry are generated from compressed layers and the manifest. Here’s a good overview. Our resolver code can figure that out and will return the right distribution ID, which is the one we use to deploy.
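
To make the difference concrete, here is a minimal Python sketch of why hashing the same layer before and after compression gives unrelated digests. It is only an illustration: the layer bytes and the tiny manifest dictionary are made up, and this is not our resolver code or the exact format Docker and the registry use.

```python
import gzip
import hashlib
import json


def sha256_digest(data: bytes) -> str:
    """Return a digest string in the 'sha256:<hex>' form used for images."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Hypothetical layer contents, standing in for an uncompressed layer tarball.
layer = b"example layer contents\n" * 1000

# Digest of the uncompressed bytes (the kind of value a local ID is built from).
uncompressed_digest = sha256_digest(layer)

# Digest of the gzip-compressed bytes (the kind of value a registry stores).
# mtime=0 keeps the compressed output the same from run to run.
compressed = gzip.compress(layer, mtime=0)
compressed_digest = sha256_digest(compressed)

# Simplified stand-in for a manifest: it lists the compressed layer digest,
# and the manifest bytes are hashed again to get the digest you deploy by.
manifest = json.dumps(
    {"schemaVersion": 2, "layers": [{"digest": compressed_digest, "size": len(compressed)}]},
    sort_keys=True,
).encode()
manifest_digest = sha256_digest(manifest)

print("uncompressed layer:", uncompressed_digest)
print("compressed layer:  ", compressed_digest)
print("manifest:          ", manifest_digest)
```

All three values differ even though they describe the same content, so a mismatch between a local ID and a registry digest does not by itself mean the wrong image was picked up.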

That said, I can manually resolve the 2aec sha and get a matching sha back, which is what I’d expect the deployment to do if the image had already been pushed.

How are you building, pushing, and deploying?

That output was from flyctl, though most of the deploys and app creations are done via the Docker registry and the GraphQL API.

Can you post or DM the queries and mutations you’re calling to deploy? Are you deploying images from our registry or from Docker Hub? Is there an example of an app that’s running the wrong version?

Replied with more info via DM. I’m using the Fly image registry and the GraphQL operations launchApp, setSecrets, and deployImage.