That’s generally an auth error. [1] [2] Try running:
fly auth logout
fly auth login
fly auth docker
and see if that changes anything.
That’ll re-authenticate your Fly account in the flyctl session, and then re-add your Fly registry auth to your local Docker daemon. fly deploy --local-only uses your local Docker daemon to actually push the image.
Thanks @Sam-Fly, that explains the first error. I’m still a little lost on why it’s stuck re-pushing the image over and over again. It says it pushed 26/26GB but then it restarts.
BTW, I’m still getting the “app repository not found” error after re-authing.
@Sam-Fly There’s something wrong w/ the Fly auth service. When I run fly auth login and click the sign-in button on the website, it throws a 500 error, but the CLI says it logged in successfully.
When I look at my docker/config.json, the auth field is empty.
Thanks, the auth login works now. I’m currently pushing my image to registry.fly.io… I’ll let you know if that works but in the past 2-3 days, none of my large images were able to get pushed.
@injoong My GPU image is failing to push up to the fly registry. I see the layer push up 26GB but then it retries 2x before throwing “app repository not found”
@khuezy I was able to take a look at some of the push errors you were seeing, and I think I see what’s happening. It’s separate from the auth error you saw today; as Injoong mentioned, we pushed a fix for that.
It looks like the failures are happening when pushing a very large image layer, which makes the actual push take a long time. I’m seeing some at 8+ minutes. That’s long enough for the registry auth token that flyctl passes to your local Docker daemon to expire. Once the token has expired, the registry treats the push as an unauthed attempt and throws the “name unknown” error.
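For a rough sense of why a single big layer can outlast a short-lived token, here is a back-of-the-envelope sketch. The function name and the uplink speed are illustrative assumptions, not measurements from this thread or anything flyctl provides; only the 26 GB figure comes from the posts above.

```shell
# push_seconds SIZE_GB UPLINK_MBPS
# Rough upload time in whole seconds, using decimal units (1 GB = 8000 megabits)
# and ignoring compression, protocol overhead and retries.
push_seconds() {
  size_gb="$1"
  uplink_mbps="$2"
  echo $(( size_gb * 8000 / uplink_mbps ))
}

# Example: a 26 GB layer over an assumed 100 Mbit/s uplink.
push_seconds 26 100
```

At that assumed speed the layer would take 2080 seconds, roughly 35 minutes, so anything that shrinks the largest layer shortens the window in which the token has to stay valid.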
It being a local build might be contributing. I’m not sure whether a remote Depot or Fly-hosted builder would be able to re-up the auth token during a really long layer push. fly deploy --remote-only and fly deploy --remote-only --depot=false would be how to test those.
But ultimately, pushing around huge images (and especially huge layers) is going to be brittle. Breaking up your large layers into multiple smaller ones can help. For GPU machines we highly recommend moving any large, data-heavy steps (like model downloads) out of the Dockerfile.
If it’s something that doesn’t change often, downloading it once and storing it on a volume is going to greatly shrink your image size, and speed up your deploys + builds significantly.
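One way to do the "download once, keep it on a volume" approach is a small check in the container's startup script. This is only a sketch under assumptions: fetch_once, FETCH_CMD, MODEL_URL and the /data mount point are hypothetical names, not part of flyctl or of anyone's setup in this thread.

```shell
# fetch_once DEST URL
# Download URL to DEST unless a non-empty DEST already exists.
# FETCH_CMD is a hypothetical override hook for the downloader; it is called
# as: $FETCH_CMD OUTPUT_FILE URL
fetch_once() {
  dest="$1"
  url="$2"
  if [ -s "$dest" ]; then
    echo "cached"
    return 0
  fi
  mkdir -p "$(dirname "$dest")" || return 1
  partial="$dest.partial"
  # Write to a temporary name first so an interrupted download is never
  # mistaken for a complete file on the next boot.
  if ${FETCH_CMD:-curl -fsSL -o} "$partial" "$url"; then
    mv "$partial" "$dest" && echo "downloaded"
  else
    rm -f "$partial"
    return 1
  fi
}

# Example entrypoint usage (hypothetical path and variable):
#   fetch_once /data/models/model.bin "$MODEL_URL" && exec "$@"
```

The first boot pays for the download; later boots find the file on the volume and skip it, so the image itself no longer has to carry the model.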
I was able to consistently push via --local-only in the past. It’s only recently that it’s been failing. I want to bake the models into the image to avoid the volume storage cost.
Just to confirm, is this in your .docker/config.json file? Mine’s empty as well and I can push/pull from the registry, so I believe that’s expected.
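One possible reason an empty auth field can still work: Docker can keep credentials in an external credential store or helper (the credsStore and credHelpers keys in config.json), in which case the file only records an empty entry per registry. Here is a minimal sketch for checking which mode a config file is in. docker_cred_mode is a hypothetical helper name, and the grep patterns are a rough heuristic rather than a real JSON parser.

```shell
# docker_cred_mode [CONFIG_FILE]
# Prints one of: missing | helper | inline | none
#   helper: a credential store/helper is configured, so empty "auth" is normal
#   inline: a non-empty base64 "auth" value is stored in the file itself
docker_cred_mode() {
  config="${1:-$HOME/.docker/config.json}"
  if [ ! -f "$config" ]; then
    echo "missing"
  elif grep -q '"credsStore"' "$config" || grep -q '"credHelpers"' "$config"; then
    echo "helper"
  elif grep -q '"auth"[[:space:]]*:[[:space:]]*"[^"]' "$config"; then
    echo "inline"
  else
    echo "none"
  fi
}

# Example usage:
#   docker_cred_mode              # checks ~/.docker/config.json
#   docker_cred_mode ./config.json
```

If this prints "helper", an empty auth field in the file is not by itself a sign that fly auth docker failed.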
I totally hear you about wanting to save on volume costs, and that it used to work with --local-only. I’ll echo what Sam said earlier - pushing large images in general is going to be brittle, and making them smaller or moving them out of the image (if possible) will make for a more reliable experience.
Have you had any success with building remotely using fly deploy --remote-only?