After hours of converting my Dockerfile to other distros and disabling features of my code one by one, I'm out of ideas. Whatever I try, every flyctl deploy --remote-only ends up in a segmentation fault.
Even with a Dockerfile as simple as:
RUN echo "why"
the echo causes a segmentation fault.
My startup is at a standstill because of this, and I have no way to get my servers back up: code that worked in the past no longer works, and fly.io didn't revert to a previously working image.
What is wrong? I’ve been trying to fix this for 6 hours straight and I’m very frustrated.
Can you try deleting your builder app? That will start a fresh one, which might help.
I did, still doesn’t work.
Will you post the full logs from a run with --remote-only? This segfault sounds like it might not be related to the Docker build.
@kurt Here you go
RUN echo "Segmentation fault 🤡🤡🤡"
CMD [ "pnpm", "start" ]
flyctl deploy --remote-only --env MY_VARS=$BLABLA
==> Verifying app config
--> Verified app config
==> Building image
Remote builder fly-builder-bitter-morning-0000 ready
==> Creating build context
--> Creating build context done
==> Building image with Docker
--> docker host: 20.10.12 linux x86_64
[+] Building 0.6s (0/1)
[+] Building 1.3s (5/5) FINISHED
=> [internal] load remote build context 0.0s
=> copy /context / 0.1s
=> [internal] load metadata for docker.io/library/archlinux:base-devel 0.9s
=> CACHED [1/2] FROM docker.io/library/archlinux:base-devel@sha256:ff6e6146181dfeb8cc19d5c70337c416e464a210448feefe32f700591f82d016 0.0s
=> ERROR [2/2] RUN echo "Segmentation fault 🤡🤡🤡" 0.3s
> [2/2] RUN echo "Segmentation fault 🤡🤡🤡":
Error failed to fetch an image or build from source: error building: executor failed running [/bin/sh -c echo "Segmentation fault 🤡🤡🤡"]: exit code: 139
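The exit code 139 in the log follows the usual shell convention that a process killed by signal N reports 128 + N; signal 11 is SIGSEGV on Linux, which is why the RUN step shows up as a segmentation fault. A minimal sketch of decoding such a status in POSIX sh, where decode_exit is a hypothetical helper and not part of flyctl or Docker:

```shell
#!/bin/sh
# Hypothetical helper (not part of flyctl or Docker): explain an exit status
# using the shell convention that a process killed by signal N reports 128 + N.
decode_exit() {
  code=$1
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    # kill -l with a number prints the signal name, e.g. 11 -> SEGV on Linux
    echo "exit $code = 128 + $sig ($(kill -l "$sig"))"
  else
    echo "exit $code (no signal)"
  fi
}

decode_exit 139   # the status reported for the failed RUN step
```

Run on its own, the script prints exit 139 = 128 + 11 (SEGV), i.e. the echo process itself was killed by a segfault rather than returning an error.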
Well, this is confusing. I can't get any remote builder to segfault like that. Do you have other files in your working directory? I don't see them getting added to the image, so I wouldn't think they'd matter, but that's the only thing I can think of.
I did at the time, but I tested again in a new directory containing only a Dockerfile and a fly.toml, and it still segfaults.
Then I started trying archlinux images one by one, going back in time, until I found one that builds. The most recent one I found is archlinux:base-devel-20220213.0.47747, which passed the build with only a
RUN echo "something"
step, but when I used it with my actual code it segfaulted while installing packages. The same happened with all the older archlinux images, until they got so old that they couldn't update packages.
So I tried Ubuntu, which builds without problems but segfaults when the container runs, so the deployment fails.
Building and running locally works fine. Deploying the locally built image segfaults at runtime. I'm losing my mind.
OK, that is helpful. I did manage to replicate the segfault; it seems to happen on some of our worker hardware that uses a slightly older kernel. I'm betting your builder and the running container both suffer from the same thing.
Which Ubuntu image did you use? Are you installing some specific package, or just using a similar base?
ubuntu:latest, but I tried again and now it works fine. Still, I prefer Arch because Ubuntu packages are old. Any chance I can use it again in the future?
I just hit my head on this while experimenting with ubuntu:21.10, and falling back to ubuntu:latest got it working.
@kurt I have the same issue with hexpm/elixir:1.13.3-erlang-24.3.1-ubuntu-impish-20211102. I can't seem to build anything based on impish with Fly.
My latest theory is that a change in our guest kernels made them incompatible with some of our host kernels. This is a long shot, but I can start testing older kernels.
In the meantime, we will accelerate our plan for rolling out host kernel upgrades.
We are still working on this; we do not have a quick fix. It's frustrating and a shit user experience, and we'll get it fixed.
By the way, this seems to affect Ubuntu 21.10 (impish) images, and presumably something recent in archlinux:base-devel. If you switch to Ubuntu 20.04 you might get unstuck.
@ryansch I just realized you said that, but I didn't recognize impish at the time.
I also ran into this issue with impish, but seemingly only when also using a volume.
Anything I can do about it? I really can't work like this.
The node:alpine Docker image doesn't work (it works fine locally).
v265 failed - Failed due to unhealthy allocations - not rolling back to stable job version 265 as current job has same specification and deploying as v266
Upgrading the kernel on all our hosts means downtime and careful planning.
This is something we’ve been meaning to do, but there’s a lot going on at Fly and we’re not a big team yet.
You should pin your FROM to a more specific tag. You probably don't want to always be on node:alpine; otherwise you might deploy one day and end up on a completely different Node.js and/or Alpine version. This could break your app in odd ways.
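The pinning advice can be checked mechanically before a deploy. A rough sketch, assuming a POSIX shell and awk; check_from_pins is a hypothetical helper, and its rule for "pinned" (a @sha256: digest, or a tag containing a digit) is an assumed heuristic, not a Docker rule:

```shell
#!/bin/sh
# Hypothetical helper (not part of flyctl or Docker): report FROM lines in a
# Dockerfile that use a floating tag such as node:alpine or ubuntu:latest.
# Assumed heuristic: a base image counts as pinned if it has a @sha256:
# digest, or if its tag (the part after the ":") contains a digit.
check_from_pins() {
  dockerfile=$1
  rc=0
  # Print the image reference of each FROM line, skipping flags like --platform.
  refs=$(awk 'toupper($1) == "FROM" {
      for (i = 2; i <= NF; i++) if ($i !~ /^--/) { print $i; break }
    }' "$dockerfile")
  for ref in $refs; do
    case "$ref" in
      scratch | *@sha256:*) continue ;;   # empty base or immutable digest
    esac
    name=${ref##*/}                       # drop registry/namespace (may contain a port)
    case "$name" in
      *:*[0-9]*) ;;                       # tag with a version number, e.g. ubuntu:20.04
      *) echo "floating base image: $ref"; rc=1 ;;
    esac
  done
  return "$rc"
}
```

Running check_from_pins Dockerfile on a file based on FROM node:alpine prints floating base image: node:alpine and returns a non-zero status, while FROM ubuntu:20.04 passes silently. It also flags references to earlier build stages in multi-stage Dockerfiles, so treat its output as a hint rather than a verdict.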
We do want people to be able to run any image on us, so we’re definitely going to fix it.
Will that hit the blog when you do?
That should hit the changelog when it happens!