release_command works with a two-stage build, but private networking fails with a single-stage build

I’m deploying a KeystoneJS app. It has deployed successfully before, using a multi-stage build I copied from somewhere; simplified, it looks like this:

FROM node:16-alpine3.14 AS build
WORKDIR /app
COPY . .
RUN npm run build

FROM node:16-alpine3.14
WORKDIR /app
# everything arrives in the runtime stage as a single layer here
COPY --from=build /app /app
EXPOSE 3000
CMD ["npm", "run", "start"]

And this works fine: with release_command = "npx keystone prisma migrate deploy" in fly.toml, it runs migrations as part of the deploy just great.
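For reference, the relevant bit of fly.toml looks like this (a minimal sketch, everything else omitted):

[deploy]
  release_command = "npx keystone prisma migrate deploy"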

However, that two-stage build is wasteful: the final COPY --from=build /app /app flattens everything into a single layer, which prevents layer reuse and forces a 1 GB network transfer on the smallest change.
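What I’m ultimately after is a cache-friendly layout along these lines (a sketch, assuming a standard npm ci setup; not what I actually deployed):

FROM node:16-alpine3.14
WORKDIR /app
# dependency layer: only rebuilt when the package files change
COPY package.json package-lock.json ./
RUN npm ci
# app layer: rebuilt on source changes, while the layers above are reused
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["npm", "run", "start"]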

Switching the container to either

FROM node:16-alpine3.14
WORKDIR /app
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["npm", "run", "start"]

or (trying to stay closer to the original)

FROM node:16-alpine3.14 AS build
WORKDIR /app
COPY . .
RUN npm run build

FROM build
WORKDIR /app
EXPOSE 3000
CMD ["npm", "run", "start"]

both make private networking fail at deploy time:

	 Configuring firecracker
	 Starting virtual machine
	 Starting init (commit: 252b7bd)...
	 Preparing to run: `docker-entrypoint.sh npx keystone prisma migrate deploy` as node
	 2022/04/19 18:49:44 listening on [fdaa:0:57f1:a7b:8aeb:c46d:2b74:2]:22 (DNS: [fdaa::3]:53)
	 Prisma schema loaded from schema.prisma
	 Datasource "postgresql": PostgreSQL database "postgres", schema "cms" at "foo-postgres.internal:5432"
	 Error: P1001: Can't reach database server at `foo-postgres.internal`:`5432`
	 Please make sure your database server is running at `foo-postgres.internal`:`5432`.
	 Main child exited normally with code: 1
	 Starting clean up.
Error release command failed, deployment aborted

I don’t understand how my changes to the container could break private networking like that. Going back to the two-stage build with COPY --from=build /app /app makes the deploy work, without fail so far. What on earth is going on here?

I think something is wrong with the internal DNS.

To test that, I added a ping of the foo-postgres.internal hostname to the release_command.
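The script is just a ping wrapper, along these lines (a minimal sketch):

#!/bin/sh
# /ping.sh: one ICMP ping against the Postgres app's private hostname
ping -c 1 foo-postgres.internal

with fly.toml pointing at it before the migration:

[deploy]
  release_command = "sh -c /ping.sh && npx keystone prisma migrate deploy"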

First run: the container is built as

FROM node:16-alpine3.14 AS build
...
FROM build
...

Result:

	 Preparing to run: `docker-entrypoint.sh sh -c /ping.sh && npx keystone prisma migrate deploy` as node
	 2022/04/19 20:09:37 listening on [fdaa:0:57f1:a7b:8aeb:6f22:7ac4:2]:22 (DNS: [fdaa::3]:53)
	 ping: bad address 'foo-postgres.internal'

Second run:

FROM node:16-alpine3.14 AS build
...
FROM node:16-alpine3.14
COPY --from=build /app /app
...

Result:

	 Preparing to run: `docker-entrypoint.sh sh -c /ping.sh && npx keystone prisma migrate deploy` as node
	 2022/04/19 20:19:49 listening on [fdaa:0:57f1:a7b:8aeb:b14:58d9:2]:22 (DNS: [fdaa::3]:53)
	 PING foo-postgres.internal (fdaa:0:57f1:a7b:21e0:0:bbb4:2): 56 data bytes
	 ping: permission denied (are you root?)
	 Main child exited normally with code: 1
	 Starting clean up.

So yeah, the ping failed because BusyBox’s ping is the old-school raw-socket kind (it predates unprivileged IPPROTO_ICMP sockets, so it needs root), but the DNS lookup worked. And that’s the only change made to the Containerfile between the two runs!
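A root-free variant of the check, for anyone following along: resolving the name is enough, and BusyBox ships an nslookup applet (a sketch):

#!/bin/sh
# root-free alternative to /ping.sh: just resolve the hostname,
# no raw ICMP socket needed
nslookup foo-postgres.internal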

Private networking itself should always be available, so it’s likely something else is going on here, with DNS resolution specifically. Alpine has been known to have problems with DNS queries (its musl resolver behaves differently from glibc’s), so it might be useful to try another distro.
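For example, swapping to a Debian-based Node image while keeping everything else the same (a sketch, untested against your app):

FROM node:16-bullseye-slim
WORKDIR /app
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["npm", "run", "start"]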

That said, to debug this further, you can remove the release command and run fly ssh console to log in to the running VM. There you might want to try a DNS query with dig (after apk add bind-tools), like dig foo-postgres.internal, and see what you get.
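Something along these lines; since .internal names resolve to IPv6 addresses, the AAAA record is the interesting one, and fdaa::3 (from your logs) is the DNS server to query directly:

fly ssh console

# inside the VM:
apk add bind-tools
dig AAAA foo-postgres.internal
dig AAAA foo-postgres.internal @fdaa::3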