FRA down? Deploy is looping over "Starting instance"

It seems the deployment is slow/not working at the moment.

$ fly deploy
==> Verifying app config
--> Verified app config
==> Building image
Waiting for remote builder fly-builder-misty-dawn-8677...
Waiting for remote builder fly-builder-misty-dawn-8677...
Waiting for remote builder fly-builder-misty-dawn-8677...
Waiting for remote builder fly-builder-misty-dawn-8677...
Waiting for remote builder fly-builder-misty-dawn-8677...
Waiting for remote builder fly-builder-misty-dawn-8677...
Waiting for remote builder fly-builder-misty-dawn-8677...
Waiting for remote builder fly-builder-misty-dawn-8677...
Waiting for remote builder fly-builder-misty-dawn-8677...
Waiting for remote builder fly-builder-misty-dawn-8677...
Waiting for remote builder fly-builder-misty-dawn-8677...
Waiting for remote builder fly-builder-misty-dawn-8677...
Waiting for remote builder fly-builder-misty-dawn-8677...
Waiting for remote builder fly-builder-misty-dawn-8677...
Remote builder fly-builder-misty-dawn-8677 ready
==> Creating build context
--> Creating build context done
==> Building image with Docker
--> docker host: 20.10.12 linux x86_64
[+] Building 0.2s (0/1)                                  
[+] Building 74.0s (15/15) FINISHED                      
 => [internal] load remote build context            0.0s
 => copy /context /                                 0.1s
 => [internal] load metadata for docker.io/library  0.7s
 => [builder 1/8] FROM docker.io/library/debian:bu  1.4s
 => => resolve docker.io/library/debian:bullseye-s  0.0s
 => => sha256:171530d298096f0697da 1.85kB / 1.85kB  0.0s
 => => sha256:05a2b3b06937c1d0b6ab6851 529B / 529B  0.0s
 => => sha256:dd94cb6119372edb0c0b 1.46kB / 1.46kB  0.0s
 => => sha256:3f4ca61aafcd4fc072 31.40MB / 31.40MB  0.3s
 => => extracting sha256:3f4ca61aafcd4fc07267a1050  1.0s
 => [builder 2/8] RUN apt-get update; apt install   8.2s
 => [builder 3/8] RUN curl https://get.volta.sh |   2.7s 
 => [builder 4/8] RUN volta install node@18.12.1    2.3s 
 => [builder 5/8] RUN mkdir /app                    0.3s 
 => [builder 6/8] WORKDIR /app                      0.0s 
 => [builder 7/8] COPY . .                          0.1s 
 => [builder 8/8] RUN npm install --production=fa  46.9s 
 => [stage-1 2/4] COPY --from=builder /root/.volta  0.8s 
 => [stage-1 3/4] COPY --from=builder /app /app     2.8s 
 => [stage-1 4/4] WORKDIR /app                      0.0s 
 => exporting to image                              4.0s 
 => => exporting layers                             4.0s 
 => => writing image sha256:fb603754bdd23a266dde97  0.0s 
 => => naming to registry.fly.io/xxxx  0.0s
--> Building image done
==> Pushing image to fly
The push refers to repository [registry.fly.io/xxxx]
5f70bf18a086: Layer already exists 
09479f3ecdcd: Pushed 
3d8b5410c9c1: Pushed 
8a70d251b653: Pushed 
deployment-01GN7QVWNTTE574226G2QK21Z1: digest: sha256:49b3d8a60cba70eb13496b29390610ae28158cc21d23f43e6ba95751fa1bb5ea size: 1159
--> Pushing image done
image: registry.fly.io/xxxx:deployment-01GN7QVWNTTE574226G2QK21Z1
image size: 604 MB
==> Creating release
--> release v28 created

--> You can detach the terminal anytime without stopping the deployment
==> Release command detected: npx prisma migrate deploy

--> This release will not be available until the release command succeeds.
         Starting instance

This was stuck for a while, so I decided to kill it and try again.

$ fly deploy
==> Verifying app config
--> Verified app config
==> Building image
Waiting for remote builder fly-builder-misty-dawn-8677...
Waiting for remote builder fly-builder-misty-dawn-8677...
Waiting for remote builder fly-builder-misty-dawn-8677...
Waiting for remote builder fly-builder-misty-dawn-8677...
Waiting for remote builder fly-builder-misty-dawn-8677...
Waiting for remote builder fly-builder-misty-dawn-8677...
Waiting for remote builder fly-builder-misty-dawn-8677...
Waiting for remote builder fly-builder-misty-dawn-8677...
Remote builder fly-builder-misty-dawn-8677 ready
==> Creating build context
--> Creating build context done
==> Building image with Docker
--> docker host: 20.10.12 linux x86_64
[+] Building 0.2s (0/1)                                  
[+] Building 0.7s (15/15) FINISHED                       
 => CACHED [internal] load remote build context     0.0s
 => CACHED copy /context /                          0.0s
 => [internal] load metadata for docker.io/library  0.6s
 => [stage-1 1/4] FROM docker.io/library/debian:bu  0.0s
 => CACHED [builder 2/8] RUN apt-get update; apt i  0.0s
 => CACHED [builder 3/8] RUN curl https://get.volt  0.0s
 => CACHED [builder 4/8] RUN volta install node@18  0.0s
 => CACHED [builder 5/8] RUN mkdir /app             0.0s
 => CACHED [builder 6/8] WORKDIR /app               0.0s
 => CACHED [builder 7/8] COPY . .                   0.0s
 => CACHED [builder 8/8] RUN npm install --product  0.0s
 => CACHED [stage-1 2/4] COPY --from=builder /root  0.0s
 => CACHED [stage-1 3/4] COPY --from=builder /app   0.0s
 => CACHED [stage-1 4/4] WORKDIR /app               0.0s
 => exporting to image                              0.0s
 => => exporting layers                             0.0s
 => => writing image sha256:fb603754bdd23a266dde97  0.0s
 => => naming to registry.fly.io/xxxx  0.0s
--> Building image done
==> Pushing image to fly
The push refers to repository [registry.fly.io/xxxx]
5f70bf18a086: Layer already exists 
09479f3ecdcd: Layer already exists 
3d8b5410c9c1: Layer already exists 
8a70d251b653: Layer already exists 
deployment-01GN7R7G38ZNAHGWHVW0SJYW7V: digest: sha256:49b3d8a60cba70eb13496b29390610ae28158cc21d23f43e6ba95751fa1bb5ea size: 1159
--> Pushing image done
image: registry.fly.io/xxxx:deployment-01GN7R7G38ZNAHGWHVW0SJYW7V
image size: 604 MB
==> Creating release
--> release v28 created

--> You can detach the terminal anytime without stopping the deployment
==> Release command detected: npx prisma migrate deploy

--> This release will not be available until the release command succeeds.
         Starting instance
         Starting instance
         Starting instance
         Starting instance
         Starting instance
Running release task (pending)... 🌎

This attempt got a bit further but is still stuck, and a new “Starting instance” line is added periodically.

Here is my Dockerfile

FROM debian:bullseye-slim as builder

ARG NODE_VERSION=18.12.1

RUN apt-get update; apt install -y curl
RUN curl https://get.volta.sh | bash
ENV VOLTA_HOME /root/.volta
ENV PATH /root/.volta/bin:$PATH
RUN volta install node@${NODE_VERSION}

#######################################################################

RUN mkdir /app
WORKDIR /app

# NPM will not install any package listed in "devDependencies" when NODE_ENV is set to "production",
# to install all modules: "npm install --production=false".
# Ref: https://docs.npmjs.com/cli/v9/commands/npm-install#description

ENV NODE_ENV production

COPY . .

RUN npm install --production=false && npm run build

FROM debian:bullseye-slim

LABEL fly_launch_runtime="nodejs"

COPY --from=builder /root/.volta /root/.volta
COPY --from=builder /app /app

WORKDIR /app
ENV NODE_ENV production
ENV PATH /root/.volta/bin:$PATH

CMD [ "npm", "run", "start" ]

In the Dashboard > Monitoring section, I don’t even see an instance being booted (as I usually do).

Now it seems I have a ghost instance trying to run:

2022-12-26T18:02:36.327 app[c7ef3413] fra [info] Starting clean up.
2022-12-26T18:02:56.509 runner[cd794290] fra [info] Starting instance

In Fly Metrics, the instance doesn’t even exist O_o

I’m still experiencing slow deployments. The deploy frequently loops on “Starting instance” and gets stuck there.

After 5 minutes I kill the console and run fly deploy again, and then it suddenly goes further. But I always have to try at least twice. I don’t think this is expected.

I’m experiencing the same for two separate deployments in FRA. Switching region fixes it.

@mkern, by switching do you mean “pick another region”, or switching to another region and then coming back to FRA?

The former, unfortunately.

I’m seeing the same thing in FRA. It gets stuck here:

--> release v15 created

--> You can detach the terminal anytime without stopping the deployment
==> Release command detected: bin/rails fly:release

--> This release will not be available until the release command succeeds.
	 Starting instance
Running release task (pending)... 🌎

Since yesterday, I’ve been completely unable to deploy a multi-process Rails app (Rails + Sidekiq). The Sidekiq process never passes health checks, causing the deploy to revert (after quite a while). If I deploy without the Sidekiq process, I can deploy just the Rails app but it still takes 2-3 attempts.

After 20 minutes, it actually continues. However, the health check for the Sidekiq process never succeeds. It remains in a pending state and the only output is:

2022-12-27T13:16:56.111 runner[54f94d7e] fra [info] Starting instance

I see the same :frowning:

@kurt @jerome could someone have a look at the FRA infrastructure health? It seems I’m not the only one impacted :slight_smile: Sorry for the ping, but you’re the two names from Fly I know :wink:

@binajmen @mathiasn can you share IDs for slow to start / failing instances?

I see things might be a bit slow in FRA, but they appear to work.

I’m looking into it a bit more.

Hi,
Sorry, I’m on my phone at the moment. This is what I could quickly get from GitHub Actions:

v494 is being deployed

6603eafd: fra pending

(4h ago)

Hi folks, our fra region was experiencing high load earlier today, resulting in slower deployments (sometimes 10-15 minutes or longer). The fly deploy command can time out in those cases. When that happens the deployment will still go through, and once it does, fly status will show the instances as running.
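
If the CLI times out, one option is to poll fly status rather than redeploying. The following is only a rough sketch, not an official workflow: it assumes the status text contains the word "pending" while an instance is waiting (as in the "6603eafd: fra pending" line earlier in the thread) and "running" once it is up. The helper names, the timeout values, and the app name "my-app" are hypothetical.

```python
import subprocess
import time
from typing import Callable


def fly_status_output(app: str) -> str:
    """Return whatever `fly status` prints for the given app."""
    result = subprocess.run(
        ["fly", "status", "--app", app],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def wait_until_running(
    get_status: Callable[[], str],
    timeout_s: float = 1800.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_status() until it mentions "running" and no longer "pending".

    The substring match is an assumption about the output format, not a
    documented contract. Returns False if the timeout passes first.
    """
    deadline = clock() + timeout_s
    while True:
        text = get_status().lower()
        if "running" in text and "pending" not in text:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

It could be called as wait_until_running(lambda: fly_status_output("my-app")). The default 30-minute timeout is an arbitrary choice that merely exceeds the 10-15 minute delays mentioned above.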

Starting today the fra region requires a paid plan (Launch, Scale, Enterprise, or a legacy paid plan). Existing apps and allocs running in fra are not affected. The limitation affects new apps, volumes, and changes to scale. We’re working quickly to expand capacity so we can open it back up.
