Fly deploy fails when pushing local only

@khuezy I was able to take a look at some of the push errors you were seeing, and I think I see what’s happening. It’s separate from the auth error you saw today; as Injoong mentioned, we pushed a fix for that.

It looks like the failures are happening when pushing a very large image layer, which makes the push itself take a long time. I’m seeing some at 8+ minutes. That’s long enough for the registry auth token that flyctl passes to your local docker daemon to expire. Once the token has expired, the registry treats the push as an unauthed attempt and throws the “name unknown” error.
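To get a feel for when a single layer push might outlast the token, here is a rough back-of-the-envelope sketch. It assumes a steady upload rate and ignores compression and retries. The function names are made up for illustration, and the 300-second token lifetime is only a placeholder, not the registry's actual value.

```python
def push_seconds(layer_bytes: int, upload_mbit_per_s: float) -> float:
    """Estimate the upload time of one layer at a steady upload rate."""
    bits = layer_bytes * 8
    return bits / (upload_mbit_per_s * 1_000_000)


def outlives_token(layer_bytes: int, upload_mbit_per_s: float, token_ttl_s: float) -> bool:
    """True if the estimated push would still be running when the token expires."""
    return push_seconds(layer_bytes, upload_mbit_per_s) > token_ttl_s


if __name__ == "__main__":
    ten_gb = 10 * 1000**3          # a 10 GB layer, e.g. one with model weights baked in
    placeholder_ttl_s = 300        # ASSUMPTION: placeholder, not the real token lifetime
    secs = push_seconds(ten_gb, upload_mbit_per_s=100)
    print(f"{secs / 60:.1f} min")  # prints "13.3 min"
    print(outlives_token(ten_gb, 100, placeholder_ttl_s))  # prints "True"
```

On a 100 Mbit/s uplink, a 10 GB layer lands well past the 8+ minute pushes mentioned above, while a 1 GB layer would finish in about 80 seconds.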

The local build might be contributing. I’m not sure whether a remote Depot or Fly-hosted builder would be able to re-up the auth token during a really long layer push. fly deploy --remote-only and fly deploy --remote-only --depot=false, respectively, would be how to test those.

But ultimately, pushing huge images around (and especially huge layers) is going to be brittle. Breaking up your large layers into multiple smaller ones can help. For GPU machines we highly recommend moving any large, data-heavy steps (like model downloads) out of the Dockerfile.

If it’s something that doesn’t change often, downloading it once and storing it on a volume is going to greatly shrink your image size, and speed up your deploys + builds significantly.
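As a minimal sketch of the download-once approach, something like the following could run at app startup. It assumes a volume is mounted at /data and that the model is available from a plain HTTPS URL. The function name, mount path, and URL are all hypothetical, so adjust them to your setup.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def ensure_on_volume(url, dest, download=urllib.request.urlretrieve):
    """Download url to dest only if dest is missing. Returns True if it downloaded."""
    dest = Path(dest)
    if dest.exists():
        return False  # already on the volume from an earlier boot
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temp file on the same volume, then rename it into place,
    # so an interrupted download never leaves a partial file at dest.
    fd, tmp = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    os.close(fd)
    try:
        download(url, tmp)
        os.replace(tmp, dest)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    return True


if __name__ == "__main__":
    # ASSUMPTION: hypothetical URL and mount path, for illustration only.
    fetched = ensure_on_volume(
        "https://example.com/models/model.bin",
        "/data/models/model.bin",
    )
    print("downloaded" if fetched else "already present")
```

The first boot pays for the download, and later deploys reuse the file on the volume, so the image itself never has to carry the model.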
