Segmentation fault no matter what I have in my Dockerfile

After hours of converting my Dockerfile to other distros and disabling features of my code one by one, I am clueless. Whatever I try, every flyctl deploy --remote-only ends in a segmentation fault.

Even with a Dockerfile as simple as:

FROM archlinux:base-devel
RUN echo "why"

the echo causes a segmentation fault.

My start-up is halted because of this and I have no way of getting my servers back up, because the code that worked in the past no longer works and fly.io didn’t revert to a working image from the past.

What is wrong? I’ve been trying to fix this for 6 hours straight and I’m very frustrated.

Can you try deleting your builder app? That will get a new one started up that might help.

I did, still doesn’t work.

Will you post the full logs when you use --remote-only? This segfault sounds like it might not be related to the Docker build.

@kurt Here you go

Dockerfile

FROM archlinux:base-devel
RUN echo "Segmentation fault 🤡🤡🤡"
CMD [ "pnpm", "start" ]

Console output

flyctl deploy --remote-only --env MY_VARS=$BLABLA 
==> Verifying app config
--> Verified app config
==> Building image
Remote builder fly-builder-bitter-morning-0000 ready
==> Creating build context
--> Creating build context done
==> Building image with Docker
--> docker host: 20.10.12 linux x86_64
[+] Building 0.6s (0/1)
[+] Building 1.3s (5/5) FINISHED
 => [internal] load remote build context  0.0s
 => copy /context /  0.1s
 => [internal] load metadata for docker.io/library/archlinux:base-devel  0.9s
 => CACHED [1/2] FROM docker.io/library/archlinux:base-devel@sha256:ff6e6146181dfeb8cc19d5c70337c416e464a210448feefe32f700591f82d016  0.0s
 => ERROR [2/2] RUN echo "Segmentation fault 🤡🤡🤡"  0.3s
------
 > [2/2] RUN echo "Segmentation fault 🤡🤡🤡":
------
Error failed to fetch an image or build from source: error building: executor failed running [/bin/sh -c echo "Segmentation fault 🤡🤡🤡"]: exit code: 139

Well, this is confusing. I can’t get any remote builder to segfault like that. Do you have other files in your working directory? I don’t see them getting added to the image, so I wouldn’t think they’d matter, but that’s the only thing I can think of.

I did at the time, but I tested again in a new directory with only a Dockerfile and a fly.toml, and it still segfaults.

Then I started trying all archlinux images in chronological order until I found one that builds. The most recent I found is archlinux:base-devel-20220213.0.47747, which passed the build with only a

RUN echo "something"

step, but when I used it in my actual code it segfaulted while installing packages. The same happened with all the older archlinux images, until they got so old that they couldn’t update packages.

So I tried Ubuntu, which builds without problems but segfaults when the container is run, so the deployment fails.

Building and running locally works fine. Deploying a locally built image segfaults at runtime. I’m losing my mind.

OK, that is helpful. I did manage to replicate the segfault; it seems to happen on some of our worker hardware that uses a slightly older kernel. I’m betting your builder and the running container both suffer from the same thing.

Which Ubuntu image did you use? Are you installing some specific package or just using a similar base?

I used ubuntu:latest, but I tried again and now it works fine. Still, I preferred arch because ubuntu packages are old. Any chance I can use it again in the future?

I just hit my head on this while experimenting with ubuntu:21.10 and falling back to ubuntu:latest got it working.

@kurt I have the same issue with hexpm/elixir:1.13.3-erlang-24.3.1-ubuntu-impish-20211102. I can’t seem to build anything based on impish with fly.

My latest theory is that a change in our guest kernels made them incompatible with some of our host kernels. This is a long shot, but I can start testing older kernels.

In the meantime, we will accelerate our plan for rolling out host kernel upgrades.

We are still working on this, and we do not have a quick fix. It’s frustrating and a shit user experience; we’ll get it fixed.

By the way, this seems to affect Ubuntu 21.10 (impish) images, and presumably something recent in archlinux:base-devel. If you switch to Ubuntu 20.04 you might get unstuck.
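A minimal sketch of that workaround, assuming your app does not depend on anything specific to impish. The `ubuntu:20.04` tag is the focal LTS release, and the RUN line is only a placeholder for your own build steps.

```dockerfile
# Base the image on the focal LTS release instead of impish (21.10).
FROM ubuntu:20.04

# Placeholder step (hypothetical); replace with your real build steps.
RUN echo "build steps go here"
```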

@ryansch I just realized you said that but I didn’t recognize impish at the time. 🙂

I also ran into this issue with impish, but seemingly only when also using a volume.

Anything I can do about it? I really can’t work like this

The latest node:alpine docker image doesn’t work. (Works fine locally)

v265 failed - Failed due to unhealthy allocations - not rolling back to stable job version 265 as current job has same specification and deploying as v266

Upgrading the kernel on all our hosts means a downtime period and careful planning 🙂

This is something we’ve been meaning to do, but there’s a lot going on at Fly and we’re not a big team yet.

You should pin your FROM to a more specific tag. You probably don’t need or want to always be on node:alpine; otherwise you might deploy one day and be on a completely different Node.js and/or Alpine version. This could break your app in odd ways.
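As a rough illustration of pinning, the FROM line might look like the sketch below. The tag shown is only an example picked for illustration and is not taken from this thread; check Docker Hub for the Node.js and Alpine versions your app actually needs.

```dockerfile
# Floating tag: may resolve to a different Node.js or Alpine release on each build.
# FROM node:alpine

# Pinned tag (hypothetical example): fixes both the Node.js and the Alpine version.
FROM node:16.14.2-alpine3.15
```

With a pinned tag the base image changes only when you edit the Dockerfile, which makes it easier to tell whether a failed deploy came from your own change or from the platform.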

We do want people to be able to run any image on us, so we’re definitely going to fix it.

Will that hit the blog when you do?

That should hit the changelog when it happens!