Kernel panic starting init in a nix-built container with tini as init

I am building a container with Nix to deploy on Fly. I ripped out all its actual contents for a simpler repro that still causes the problem.

You can download the repro container like so: docker pull ghcr.io/jades-projects/actual-server:latest, or build it as below:

Put the code below into repro.nix:

let
  pkgs = import (builtins.fetchTarball "https://github.com/nixos/nixpkgs/archive/d2fc6856824cb87742177eefc8dd534bdb6c3439.tar.gz") {};
  startScript = pkgs.writeShellScript "start.sh" ''
    echo HELLO

    ${pkgs.coreutils}/bin/sleep 1000000
  '';
in
pkgs.dockerTools.buildLayeredImage {
  name = "actual-server";
  tag = "latest";
  config = {
    Entrypoint = [ "${pkgs.tini}/bin/tini" "-g" "--" startScript ];
    ExposedPorts = {
      "5006/tcp" = {};
    };
    Env = [
      "NODE_ENV=production"
    ];
    WorkingDir = "/data";
    Volumes = {
      "/data" = {};
    };
  };
}

Then you can run it locally like so, with the expected output:

co/actual-server - [flake●] » nix-build repro.nix
these 7 derivations will be built:
<SNIP>
Adding manifests...
Done.
/nix/store/fxcd8b1k9m2dhrzdgdqp9l9lwcn12zhy-actual-server.tar.gz
co/actual-server - [flake●] » docker load <result
00d32356092b: Loading layer [==================================================>]  10.24kB/10.24kB
5476a9219431: Loading layer [==================================================>]  40.96kB/40.96kB
f003e077fcff: Loading layer [==================================================>]  10.24kB/10.24kB
The image actual-server:latest already exists, renaming the old one with ID sha256:3b41602531f032899672d2c4a9e6a3b56904371d5a9ffa336c2e989dcb9c3a41 to empty string
Loaded image: actual-server:latest
co/actual-server - [flake●] » docker run actual-server:latest
HELLO
^C

If I try to run it on Fly, however, it will blow up trying to start the container’s init process, somehow:

2022-08-10T03:36:22.638 app[c6485bc3] yyz [info] Starting init (commit: c86b3dc)...
2022-08-10T03:36:22.670 app[c6485bc3] yyz [info] Preparing to run: `/nix/store/cvnnqiawfsgachx243zdiq4vzwxrh4g8-tini-0.19.0/bin/tini -g -- /nix/store/6dgb4pzgml0gzc2ynswjyj0w0cy77fyr-start.sh` as root
2022-08-10T03:36:22.673 app[c6485bc3] yyz [info] Error: UnhandledIoError(Os { code: 2, kind: NotFound, message: "No such file or directory" })
2022-08-10T03:36:22.675 app[c6485bc3] yyz [info] [ 0.154163] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
2022-08-10T03:36:22.676 app[c6485bc3] yyz [info] [ 0.155454] CPU: 0 PID: 1 Comm: init Not tainted 5.12.2 #1
2022-08-10T03:36:22.676 app[c6485bc3] yyz [info] [ 0.156396] Call Trace:
2022-08-10T03:36:22.677 app[c6485bc3] yyz [info] [ 0.156841] show_stack+0x52/0x58
2022-08-10T03:36:22.677 app[c6485bc3] yyz [info] [ 0.157417] dump_stack+0x6b/0x86
2022-08-10T03:36:22.678 app[c6485bc3] yyz [info] [ 0.158031] panic+0xfb/0x2bc
2022-08-10T03:36:22.678 app[c6485bc3] yyz [info] [ 0.158558] do_exit.cold+0x60/0xb0
2022-08-10T03:36:22.679 app[c6485bc3] yyz [info] [ 0.159158] do_group_exit+0x3b/0xb0
2022-08-10T03:36:22.680 app[c6485bc3] yyz [info] [ 0.159773] __x64_sys_exit_group+0x18/0x20
2022-08-10T03:36:22.680 app[c6485bc3] yyz [info] [ 0.160475] do_syscall_64+0x38/0x50
2022-08-10T03:36:22.681 app[c6485bc3] yyz [info] [ 0.161068] entry_SYSCALL_64_after_hwframe+0x44/0xae
2022-08-10T03:36:22.682 app[c6485bc3] yyz [info] [ 0.161888] RIP: 0033:0x6ff9c5
2022-08-10T03:36:22.685 app[c6485bc3] yyz [info] [ 0.162392] Code: eb ef 48 8b 76 28 e9 76 05 00 00 64 48 8b 04 25 00 00 00 00 48 8b b0 b0 00 00 00 e9 af ff ff ff 48 63 ff b8 e7 00 00 00 0f 05 <ba> 3c 00 00 00 48 89 d0 0f 05 eb f9 66 2e 0f 1f 84 00 00 00 00 00
2022-08-10T03:36:22.686 app[c6485bc3] yyz [info] [ 0.165390] RSP: 002b:00007fff7deae748 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
2022-08-10T03:36:22.687 app[c6485bc3] yyz [info] [ 0.166621] RAX: ffffffffffffffda RBX: 00000000004f0ed0 RCX: 00000000006ff9c5
2022-08-10T03:36:22.688 app[c6485bc3] yyz [info] [ 0.167846] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
2022-08-10T03:36:22.689 app[c6485bc3] yyz [info] [ 0.169037] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
2022-08-10T03:36:22.691 app[c6485bc3] yyz [info] [ 0.170215] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff7deae7a8
2022-08-10T03:36:22.692 app[c6485bc3] yyz [info] [ 0.171444] R13: 00007fff7deae7b8 R14: 0000000000000000 R15: 0000000000000000
2022-08-10T03:36:22.693 app[c6485bc3] yyz [info] [ 0.172704] Kernel Offset: disabled
2022-08-10T03:36:22.693 app[c6485bc3] yyz [info] [ 0.173323] Rebooting in 1 seconds..

This smells like a possible Fly bug, but I’m not 100% certain.

Just eliminated a hypothesis: this is not a layering issue, since buildImage, which produces a single layer, has the same behaviour.

I’m not sure whether it’s because you’re running another init as PID 1, or whether there’s an actual problem with how the binary is linked, given the “No such file or directory” message. Or possibly both.

let
  pkgs = import (builtins.fetchTarball "https://github.com/nixos/nixpkgs/archive/d2fc6856824cb87742177eefc8dd534bdb6c3439.tar.gz") {};
in
pkgs.dockerTools.buildImage {
  name = "actual-server";
  tag = "latest";
  config = {
    Entrypoint = [ "${pkgs.coreutils}/bin/sleep" "100000" ];
    ExposedPorts = {
      "5006/tcp" = {};
    };
    Env = [
      "NODE_ENV=production"
    ];
    WorkingDir = "/data";
    Volumes = {
      "/data" = {};
    };
  };
}

This one fails in the same way, and does not try to run multiple processes. sleep should not be terribly bothered about whether it is PID 1 :slight_smile:
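One way to separate those two hypotheses would be an entrypoint that is statically linked, so there is no dynamic loader for the kernel to fail to find. A hedged sketch of that variant (untested; it assumes pkgsStatic.busybox exposes its applets, including sleep, under bin):

let
  pkgs = import (builtins.fetchTarball "https://github.com/nixos/nixpkgs/archive/d2fc6856824cb87742177eefc8dd534bdb6c3439.tar.gz") {};
in
pkgs.dockerTools.buildImage {
  name = "actual-server";
  tag = "latest";
  config = {
    # Statically linked sleep from busybox: if this still fails with
    # "No such file or directory", a missing ELF interpreter is not the cause.
    Entrypoint = [ "${pkgs.pkgsStatic.busybox}/bin/sleep" "100000" ];
    WorkingDir = "/data";
    Volumes = {
      "/data" = {};
    };
  };
}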

A bit of a long shot, but I wonder if the issue could be tini needing to be PID 1? This is just an initial guess based on the error message in your call stack, a cross-reference with our init-snapshot repo, reading about tini behavior on the actual-server project, and on our platform more generally.
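If that is what’s going on, tini does have a subreaper mode meant for running under some other PID 1. A hedged sketch of how the repro’s entrypoint might look with it enabled (I haven’t tested this on our platform):

# Sketch only: same image as the repro above, but with tini asked to
# register itself as a child subreaper (-s), so it can still reap
# orphaned children even when Fly's own init is PID 1.
Entrypoint = [ "${pkgs.tini}/bin/tini" "-s" "-g" "--" startScript ];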

I hope this isn’t a complete misapprehension of what’s going on. Thank you so much for your awesome report :slightly_smiling_face:

Hi! I found the bug!

It’s that the directory named in WorkingDir doesn’t exist in the image (which Fly could possibly report more clearly than a kernel panic). Whoops!
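For anyone landing here later: one way to make the image self-sufficient is to create that directory at build time. A minimal sketch against the repro above, using the extraCommands hook from dockerTools (paths in extraCommands are relative to the image root; pkgs and startScript are the same bindings as in the repro expression):

pkgs.dockerTools.buildLayeredImage {
  name = "actual-server";
  tag = "latest";
  # Create /data inside the image so WorkingDir points at a directory that
  # actually exists, even before any volume is mounted over it.
  extraCommands = ''
    mkdir -p data
  '';
  config = {
    Entrypoint = [ "${pkgs.tini}/bin/tini" "-g" "--" startScript ];
    ExposedPorts = {
      "5006/tcp" = {};
    };
    Env = [
      "NODE_ENV=production"
    ];
    WorkingDir = "/data";
    Volumes = {
      "/data" = {};
    };
  };
}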

Ah, that makes sense :sweat_smile: Thank you so much! I’ll make sure to let everyone know so that we can get this fixed ASAP.

Wow, that is a completely unexpected error from the given stacktrace. Nice find.
