Random errors when updating a machine

For the past couple of days I’ve been getting random errors when deploying and updating machines.

It seems random because retrying the deploy often works.

This just happened now:

-------
Updating existing machines in '...' with rolling strategy

-------
 ✖ [1/4] Machine 328744d4ae4585 [app] update failed: failed to update VM 328744d4ae4585: request returned non-2xx status, 502
   [2/4] Waiting for job
   [3/4] Waiting for job
   [4/4] Waiting for job
-------
WARN failed to release lease for machine 9e784902f52083: lease not found

WARN failed to release lease: lease not found

Error: failed to update VM 328744d4ae4585: request returned non-2xx status, 502 (Request ID: 01HG19P45EH3NXW9HQ8FV3QM0J-qro)

This is a very small Node app with this docker config:

FROM node:18-alpine3.15

USER root

WORKDIR /usr/src/app

COPY package.json .
COPY package-lock.json .
RUN npm i --production

COPY . .

ENV NODE_ENV production
ENV PORT 8080
ENV HOST 0.0.0.0

CMD ["node", "index.js"]

Is there an ongoing issue with machines that I’m not aware of?

Edit:

For example, I just retried and this time only 2 machines were updated:

-------
Updating existing machines in '...' with rolling strategy

-------
 ✔ [1/4] Machine 328744d4ae4585 [app] update succeeded
 ✔ [2/4] Machine 32874599c32085 [app] update succeeded
 ✖ [3/4] Machine 91857e4c441498 [app] update failed: failed to update VM 91857e4c441498: request returned non-2xx status, 502
 ✖ [4/4] Machine 9e784902f52083 [app] update canceled
-------

Do these seem correlated with slow deploys or evidence of network congestion, by any chance?

(lease not found in the machine-update context was ascribed to timeouts in the past—although that may not be valid anymore.)

Could be. I have no idea.

I hadn’t deployed to Fly in weeks and now it’s happening almost every time I try to deploy.

Thanks for the link to the other thread.

Same here. I think it’s related to this problem in the platform:

Fly.io Status (flyio.net)

Nov 24, 2023

Elevated registry errors

Resolved - This incident has been resolved.
Nov 24, 15:17 CST

Monitoring - We have implemented a fix and are monitoring the results. Image pushes to the primary registry are succeeding again and deploys should work as expected.
Nov 24, 12:42 CST

Update - While we investigate and fix the issue with our primary registry, users can temporarily deploy to our alternate registry using the
FLY_REGISTRY_HOST=registry2.fly.io fly deploy
command.
Nov 24, 12:11 CST

Investigating - We are investigating elevated rates of 500 errors when pushing images to our Docker registry.
Nov 24, 12:05 CST


You might be right.

Just deployed and it all went smoothly as expected.
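
For anyone who hits similar transient 502s, the manual "retry and it often works" approach from the first post can be scripted. This is only a rough sketch, not a fix for the underlying platform issue: the `retry` helper, its argument order, and the attempt and delay values are all made up for illustration, and it assumes `fly` is on your PATH.

```shell
# Hypothetical helper (not part of flyctl): run a command up to
# max_attempts times, sleeping delay_seconds between failed attempts.
retry() {
  max_attempts=$1
  delay_seconds=$2
  shift 2
  attempt=1
  while true; do
    # Run the command exactly as given; stop on the first success.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay_seconds}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay_seconds"
  done
}

# Example usage (assumed values): retry 3 10 fly deploy
```

Each retry reruns the whole deploy, so machines that already updated successfully are processed again. If the errors keep coming back after a few attempts, checking the status page is probably more useful than retrying further.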
