After hours of converting my Dockerfile to other distros and disabling features of my code one by one, I am clueless. Whatever I try, every flyctl deploy --remote-only ends in a segmentation fault.
From a Dockerfile as simple as:
FROM archlinux:base-devel
RUN echo "why"
the echo causes a segmentation fault.
My startup is halted because of this, and I have no way of getting my servers back up: the code that worked in the past no longer deploys, and fly.io didn’t revert to a previously working image.
What is wrong? I’ve been trying to fix this for 6 hours straight and I’m very frustrated.
Well, this is confusing. I can’t get any remote builder to segfault like that. Do you have other files in your working directory? I don’t see them getting added to the image, so I wouldn’t think they’d matter, but that’s the only thing I can think of.
I did at the time, but I tested again in a new directory with only a Dockerfile and a fly.toml, and it still segfaults.
Then I started trying all archlinux images in chronological order until I found one that builds. The most recent I found is archlinux:base-devel-20220213.0.47747, which passed the build with only a
RUN echo "something"
step, but when I used it with my actual code it segfaulted while installing packages. This continued to be the case on all the older archlinux images, until they got so old that they couldn’t update packages anymore.
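For reference, this is roughly the minimal Dockerfile that still built on the remote builder (my real Dockerfile adds the package installation on top of it):

# Pin to the dated archlinux tag that still built on the remote builder
FROM archlinux:base-devel-20220213.0.47747
# A trivial step like this built fine; installing packages is what segfaulted
RUN echo "something"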
So I tried ubuntu, which builds with no problem but segfaults when the container is run, and the deployment fails.
Building and running locally works fine. Deploying the locally built image segfaults at runtime. I’m losing my mind.
Ok, that is helpful. I did manage to replicate the segfault; it seems to happen on some of our worker hardware that uses a slightly older kernel. I’m betting your builder and the running container both suffer from the same thing.
Which ubuntu image did you use? Are you installing some specific package or just using a similar base?
I used ubuntu:latest, but I tried again and now it works fine. Still, I preferred arch because ubuntu packages are old. Any chance I can use it again in the future?
My latest theory is that a change in our guest kernels made them incompatible with some of our host kernels. This is a long shot, but I can start testing older kernels.
In the meantime, we will accelerate our plan for rolling out host kernel upgrades.
By the way, this seems to affect Ubuntu 21.10 (impish) images, and presumably something recent in archlinux:base-devel. If you switch to Ubuntu 20.04 you might get unstuck.
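For example, as a minimal sketch mirroring the repro Dockerfile above (swap in whatever your app actually needs):

# Sketch only: use the 20.04 (focal) base instead of ubuntu:latest / 21.10
FROM ubuntu:20.04
# Any trivial step should build on the remote builder with this base
RUN echo "hello from focal"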
@ryansch I just realized you said that but I didn’t recognize impish at the time.
The latest node:alpine Docker image doesn’t work. (It works fine locally.)
v265 failed - Failed due to unhealthy allocations - not rolling back to stable job version 265 as current job has same specification and deploying as v266
Upgrading the kernel on all our hosts means a downtime period and careful planning.
This is something we’ve been meaning to do, but there’s a lot going on at Fly and we’re not a big team yet.
You should pin your FROM to a more specific tag. You probably don’t want to always be on node:alpine; otherwise you might deploy one day and end up on a completely different Node.js and/or Alpine version, which could break your app in odd ways.
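For example (the exact tag below is just an illustration; pin to whichever Node.js/Alpine combination you’ve actually tested):

# Pin both the Node.js major version and the Alpine release instead of floating on node:alpine
FROM node:16-alpine3.15
# ...the rest of your Dockerfile stays the same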
We do want people to be able to run any image on us, so we’re definitely going to fix it.