Chromium takes too long to initialize

Hi, I have a Next.js app that uses Puppeteer. It works fine, but on the first request it takes 25 seconds to initialize Chromium. In development it takes less than 1 second.

I’m using a shared-4x CPU, 4 GB RAM.

I have also tried using performance CPUs, but it still takes 25 seconds. How can I improve the startup time?

Here’s my current Dockerfile:

# syntax = docker/dockerfile:1

# Adjust NODE_VERSION as desired
ARG NODE_VERSION=22.11.0
FROM node:${NODE_VERSION}-slim AS base

LABEL fly_launch_runtime="Next.js"

# Next.js app lives here
WORKDIR /app

# Set production environment
ENV NODE_ENV="production"


# Throw-away build stage to reduce size of final image
FROM base AS build

# Install packages needed to build node modules
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y build-essential node-gyp pkg-config python-is-python3

# Install node modules
COPY package-lock.json package.json ./
RUN npm ci --include=dev

# Copy application code
COPY . .

# Build application
RUN npm run build

# Remove development dependencies
RUN npm prune --omit=dev


# Final stage for app image
FROM base

# Install packages needed for deployment
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y \
    chromium \
    chromium-sandbox \
    # Required libraries for Chromium
    libnss3 \
    libnspr4 \
    libatk1.0-0 \
    libatk-bridge2.0-0 \
    libcups2 \
    libdrm2 \
    libdbus-1-3 \
    libxkbcommon0 \
    libxcomposite1 \
    libxdamage1 \
    libxfixes3 \
    libxrandr2 \
    libgbm1 \
    libpango-1.0-0 \
    libcairo2 \
    libasound2 \
    libatspi2.0-0 \
    # Fonts for better rendering
    fonts-liberation \
    fonts-noto-color-emoji \
    fonts-noto-cjk \
    # Clean up
    && rm -rf /var/lib/apt/lists /var/cache/apt/archives

# Copy built application
COPY --from=build /app /app

# Start the server by default, this can be overwritten at runtime
EXPOSE 3000
ENV PUPPETEER_EXECUTABLE_PATH="/usr/bin/chromium"
CMD [ "npm", "run", "start" ]
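As a side note on the image above: Chromium startup in a container can often be trimmed with a few launch flags. The options object below is an illustrative sketch, not the app's actual configuration:

```javascript
// Illustrative launch options for the image above -- an assumption,
// not taken from the original app's code.
function buildLaunchOptions() {
  return {
    // Matches the ENV set in the Dockerfile's final stage.
    executablePath: process.env.PUPPETEER_EXECUTABLE_PATH || '/usr/bin/chromium',
    headless: true,
    args: [
      '--no-sandbox',               // often needed in containers (may be optional here, since chromium-sandbox is installed)
      '--disable-dev-shm-usage',    // containers usually have a small /dev/shm
      '--disable-gpu',              // no GPU on these VMs
      '--no-first-run',             // skip first-run setup work
      '--no-default-browser-check', // skip the default-browser probe
    ],
  };
}
```

Usage would be something like `const browser = await puppeteer.launch(buildLaunchOptions());`.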

Sorry this isn't a solution, but I can relate: I have a simple Fly app that only scrapes HTML using Puppeteer, and the cold start is 30 seconds (the machine becomes reachable in 1s, though). I mitigated my issue by having this separate app with min_machines_running=1

❯ http post https://MYAPP.fly.dev/scrape url\=http://example.com
HTTP/1.1 200 OK
connection: keep-alive
content-encoding: gzip
content-type: application/json; charset=utf-8
date: Thu, 04 Dec 2025 11:17:14 GMT
etag: W/"7a6-lHgbUub0CoMXMkv1sm+pm279eqk"
fly-request-id: 01KBMH8C0RAY1RERJGZ2153KH1-dfw
keep-alive: timeout=5
server: Fly/340afcba (2025-12-03)
transfer-encoding: chunked
via: 1.1 fly.io, 1.1 fly.io
x-powered-by: Express

{
    ...
    "status": 200
}
~ took 30s

❯ http post https://MYAPP.fly.dev/scrape url\=http://example.com
HTTP/1.1 200 OK
connection: keep-alive
content-encoding: gzip
content-type: application/json; charset=utf-8
date: Thu, 04 Dec 2025 11:18:41 GMT
etag: W/"7a6-lHgbUub0CoMXMkv1sm+pm279eqk"
fly-request-id: 01KBMHBVMA52KJ5Q2HP98A1SYN-iad
keep-alive: timeout=5
server: Fly/340afcba (2025-12-03)
transfer-encoding: chunked
via: 1.1 fly.io
x-powered-by: Express

{
    ...
    "status": 200
}
~ took 2s

For the next step, I’d look at the logs from the VM; I should think it executes the npm entrypoint within 3-4 seconds, and that Puppeteer is what’s slow to start. I’d also suggest looking at the CPU graphs in Grafana; I wonder if the recently-tweaked CPU throttling is kicking in.

There’s also the little-known I/O throttling, which is separate from CPU throttling:

The performance of these ephemeral disks is heavily limited regardless of the Machine type you choose, with a maximum of 2000 IOPS and 8 MiB/s bandwidth.

Even a performance-class Machine is limited to 8 MiB per second from the root partition, so it would take only 160 MiB of reads to cause a 20-second delay. :snowflake:
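That bandwidth cap maps directly onto boot delay. A quick sanity check of the arithmetic (the 160 MB figure is illustrative; Chromium plus its shared libraries is roughly that size on disk):

```javascript
// Rough cold-start cost of reading a large binary through the
// throttled root partition: seconds = size / bandwidth.
function ioDelaySeconds(megabytesRead, bandwidthMBps = 8) {
  return megabytesRead / bandwidthMBps;
}
// 160 MB at 8 MB/s -> 20 seconds, roughly the delay being reported.
```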


Thanks, then I think there’s nothing I can do on my side, except keep at least 1 machine running and accept that scaling up will take 25 seconds.

Plaintext crawl

I’d still be minded to do some timing experiments. I use Crawlee, and it is quicker than this. Just a quick look at my logs:

  • Create request goes in at 23:36:04
  • Full machine boot in 7.511s
  • Fly SSH opened at 23:46:09
  • Machine switched to booted status at 23:46:11
  • Crawlee starts at 23:46:12

This is a plaintext crawler in a shared-cpu-1x@256MB instance, pretty much the smallest and least capable machine available. Now this doesn’t use a browser, and thus the image is 354MB, which is pretty small as these things go. Maybe that contributes to the faster boot time?

Browser

I just repeated the experiment with a browser version; this utilises Playwright and Firefox:

  • Create request goes in at 14:14:30
  • Full machine boot in 15.044s
  • Fly SSH opened at 14:14:45
  • Machine switched to booted status at 14:14:48
  • Crawlee starts at 14:14:48

So, 18 seconds, and this one is on a shared-cpu-4x@2048MB. This is slower than I thought. The image in this case is just a shade over 1.0GB.

In both cases, the machine boot is the problem, and not anything at the application level. If I get my project off the ground, a queue of waiting machines will be my solution, but for now, the wait is trivial.

Thanks, that is quicker than my case, 18 seconds is better than 25 seconds…

I use Puppeteer + Chromium. Each machine has 5 browsers with max 10 pages each.

Another thing to think about is whether you could start a small number of machines and then stop them. I assume starting them will be much quicker than booting them, and if memory serves correctly, one only pays for volume and maybe image storage for stopped machines.

(The only caveat here is that the billing picture for machines is slightly in flux, given that Fly is a relatively new platform; I should think stopped machines could be charged for in the future, since they still consume resources.)
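If you go the stopped-pool route, the Machines REST API can start a stopped Machine on demand. A sketch, assuming Node 18+ for global fetch; the app name, machine ID, and token are placeholders:

```javascript
// Sketch: start a stopped Machine on demand via the Fly Machines
// REST API. App name, machine ID, and token are placeholders.
function machineActionUrl(appName, machineId, action) {
  return `https://api.machines.dev/v1/apps/${appName}/machines/${machineId}/${action}`;
}

async function startMachine(appName, machineId, token) {
  // Global fetch requires Node 18+.
  const res = await fetch(machineActionUrl(appName, machineId, 'start'), {
    method: 'POST',
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`machine start failed: ${res.status}`);
  return res.json();
}
```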

Thanks again,

According to Machine Suspend and Resume · Fly Docs

Suspend is not currently recommended for large machine memory sizes (> 2 GB)

So I can’t use that, as I’m on 4 GB machines, but thanks anyway for the tip - I’ll use suspend in other apps with smaller machines.

True, but you could use stop and start. This will still save the pull time for the Docker image, which I think is where most of that boot delay comes from.

I’m already using stop / start and it still takes 25 seconds to boot up a stopped machine…

Oh right, sorry; I thought these were deployments (my timings above are).

So then, this is very odd; starting a machine should not require an image pull, as I think the machine should be associated with a physical host already. As @lubien says, the machine becomes reachable in a second. If control passes to Puppeteer very soon after, then Puppeteer is taking that time to open a listener. Is it pulling a fresh browser binary every time, or something like that?
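On that last point: if the app launches a fresh browser per request, caching one shared instance would sidestep most of the startup cost. A minimal sketch, where the injected launch callback is a stand-in for puppeteer.launch() (an assumption on my part, not quoted from the app):

```javascript
// Launch Chromium once per machine and share it across requests, so
// the multi-second startup cost is paid at most once. The `launch`
// callback is injected; in the real app it would be something like
// () => puppeteer.launch(buildLaunchOptions()).
function makeBrowserPool(launch) {
  let browserPromise = null;
  return {
    // Every caller gets the same pending/resolved promise, so
    // concurrent requests don't each spawn their own Chromium.
    get() {
      if (!browserPromise) browserPromise = launch();
      return browserPromise;
    },
    // Forget the cached instance (e.g. after a crash) so the next
    // get() launches a fresh browser.
    reset() {
      browserPromise = null;
    },
  };
}
```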

Are there any clues in your existing machine logs? Does Puppeteer have a verbose mode so you can get a better view of what’s happening?
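If memory serves, Puppeteer logs through the `debug` package, so verbose output can be switched on with an environment variable:

```shell
# Prints each Puppeteer launch step and protocol message, which
# should show where the 25 seconds is going.
DEBUG="puppeteer:*" npm run start
```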


This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.