Fly deploy fails when pushing local only

I just started noticing this but I’m not sure when it was introduced.

I have a GPU machine, and running fly deploy --local-only … gets stuck on the image push.

I’m on the latest flyctl version.

Yesterday, I got this error: error rendering push status stream: unknown: {"errors":[{"code":"NAME_UNKNOWN", ...}]}

Today, it just keeps retrying to push the image (26GB) after what looks like a successful push.

That’s generally an auth error. [1] [2] Try running:

  1. fly auth logout
  2. fly auth login
  3. fly auth docker

to see if you see a change.

That’ll re-auth your Fly account in the flyctl session, and then re-add your Fly registry auth to your local Docker daemon. fly deploy --local-only uses your local Docker to actually push the image.

Thanks @Sam-Fly, that explains the first error. I’m still a little lost on why it’s stuck re-pushing the image over and over again. It says it pushed 26/26GB but then it restarts.

BTW, I’m still getting the “app repository not found” error after re-authing.

I tried pushing the image straight to registry.fly.io with docker push `registry.fly.io/<app>:latest`, but it still errors with “app repository not found”. I had already authed with fly auth login and fly auth docker.

@Sam-Fly There’s something wrong w/ the Fly auth service. When I run fly auth login and click on the sign in button in the website, it throws a 500 error but on the CLI terminal, it says it successfully logged in.

When I look at my docker/config.json, the auth field is empty.

We just deployed a fix for this about an hour ago - are you still hitting issues when running fly auth login?

Thanks, the auth login works now. I’m currently pushing my image to registry.fly.io… I’ll let you know if that works but in the past 2-3 days, none of my large images were able to get pushed.

@injoong My GPU image is failing to push up to the Fly registry. I see the layer push up 26GB, but then it retries 2x before throwing “app repository not found”.

I’m following this just to be sure I’m not doing anything wrong: Managing Docker Images with Fly.io's Private Registry · Fly Docs

Gotcha - can you try removing your .fly/config.yml (backing it up just to be safe) and re-run fly auth login, then try pushing up the image again?

@khuezy I was able to take a look at some of the push errors you were seeing, think I see what’s happening. It’s separate from the auth error you saw today, as Injoong mentioned we pushed a fix for that.

It looks like the failures are happening when pushing a very large image layer, which makes the actual push take a long time. I’m seeing some at 8+ minutes. That’s long enough for the registry auth token that flyctl passes to your local Docker daemon to expire. Once the token has expired, the registry treats the request as an unauthed attempt and throws the “name unknown” error.
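To make the timing argument concrete, here is a rough back-of-the-envelope sketch. The 400 Mbps upload speed and the 300-second token lifetime are made-up numbers for illustration only (the real token lifetime isn't stated in this thread), and the estimate ignores compression and protocol overhead.

```python
def push_seconds(layer_gb, upload_mbps):
    """Rough upload time for one layer: size in bits divided by link speed in bits/s."""
    bits = layer_gb * 8 * 1000**3
    return bits / (upload_mbps * 1000**2)


def outlives_token(layer_gb, upload_mbps, token_lifetime_s):
    """True if a single layer upload would take longer than the token stays valid."""
    return push_seconds(layer_gb, upload_mbps) > token_lifetime_s


# Hypothetical numbers: a 26 GB layer over a 400 Mbps uplink, 300 s token lifetime.
seconds = push_seconds(26, 400)
print(f"{seconds / 60:.1f} minutes, outlives token: {outlives_token(26, 400, 300)}")
```

With those assumed numbers a single 26 GB layer takes a bit under 9 minutes, while a 2 GB layer over the same link takes well under a minute, which is why splitting big layers can help.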

It being a local build might be contributing. I’m not sure whether a remote Depot or Fly-hosted builder would be able to re-up the auth token for a really long layer push. fly deploy --remote-only and fly deploy --remote-only --depot=false would be how to test those.

But ultimately, pushing around huge images (and especially huge layers) is going to be brittle. Breaking up your large layers into multiple smaller ones can help. For GPU machines we highly recommend moving any large, data-heavy steps (like model downloads) out of the Dockerfile.

If it’s something that doesn’t change often, downloading it once and storing it on a volume is going to greatly shrink your image size, and speed up your deploys + builds significantly.
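As a minimal sketch of the "download once, keep it on a volume" idea, something like the following could run at app startup. The URL and the destination path are hypothetical placeholders; use whatever path your volume is actually mounted at and wherever your model really lives.

```python
import os
import urllib.request


def ensure_model(url, dest, fetch=urllib.request.urlretrieve):
    """Download url to dest unless dest already exists.

    Returns True if a download happened, False if the file was already there.
    """
    if os.path.exists(dest):
        return False
    parent = os.path.dirname(dest)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # Download to a temporary name first so an interrupted download
    # is never mistaken for a complete model on the next boot.
    tmp = dest + ".part"
    fetch(url, tmp)
    os.replace(tmp, dest)
    return True


# Hypothetical usage at startup (placeholder URL and mount path):
# ensure_model("https://example.invalid/weights.bin", "/data/models/weights.bin")
```

The first boot pays the download cost; later deploys find the file on the volume and skip it, so the image itself no longer has to carry the model.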

I was able to consistently push via --local-only in the past. It’s just recently that it’s failing. I want to bake in the models on the image to avoid the volume storage cost.

When I log in and auth docker, I see: registry.fly.io: {}
Is that expected? Or is there supposed to be some value in there?

Just to confirm, is this in your .docker/config.json file? Mine’s empty as well and I can push/pull from the registry, so I believe that’s expected.
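If you want to check what that empty entry means on your machine, here is a small sketch that classifies the registry entry in a Docker config file. It assumes the standard `auths`, `credsStore`, and `credHelpers` keys of Docker's config.json; an empty `{}` alongside a credential store usually just means the secret is kept by the helper rather than inline. The function name and return labels are made up for this example.

```python
import json


def describe_registry_auth(config_text, registry="registry.fly.io"):
    """Classify how a Docker config.json appears to hold credentials for a registry."""
    config = json.loads(config_text)
    entry = config.get("auths", {}).get(registry)
    per_registry_helper = config.get("credHelpers", {}).get(registry)
    if per_registry_helper:
        return "helper:" + per_registry_helper
    if entry is None:
        return "missing"  # no login recorded for this registry
    if entry.get("auth"):
        return "inline"  # base64 credentials stored directly in the file
    if config.get("credsStore"):
        return "helper:" + config["credsStore"]
    return "empty"  # entry exists, but no inline auth and no helper configured


sample = '{"auths": {"registry.fly.io": {}}, "credsStore": "desktop"}'
print(describe_registry_auth(sample))
```

Pass it the contents of your own ~/.docker/config.json to see which case applies; the `desktop` store in the sample is only an example value.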

I totally hear you about wanting to save on volume costs, and that it used to work with --local-only. I’ll echo what Sam said earlier - pushing large images in general is going to be brittle, and making them smaller or moving them out of the image (if possible) will make for a more reliable experience.

Have you had any success with building remotely using fly deploy --remote-only?

Yep. I’m not sure about it being brittle. In the past before the recent registry change, I never had a problem after deploying 100+ times.

When using --remote-only, it eventually errors with a deadline-exceeded error… and the log shows use of closed network connection.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.