Container (non-root) user can't write to /dev/stdout or /dev/stderr

Hi Jerome. Thanks for the quick reply!

Apache is running as the cnb user. Even taking Apache out of the picture and just running the Procfile per the OP shows that the process runs as cnb and can write to its implicit stdout but not to /dev/stdout:

> whoami
cnb

> whoami >> /dev/stdout
/dev/stdout: Permission denied

It does seem like this would be pretty straightforward for Fly to fix.

Lots of software packages and pre-built Docker images follow best practices of dropping root privileges. There is little end users can do to work around this without clunky hacks.

Running as root is terrible for security (see also: SELinux), so running as non-root is now the best practice even if you don’t rely on weaker sandboxes like containers (Fly doesn’t).

I can reproduce this issue without using the Paketo stack, and instead just having the following Dockerfile:

FROM ubuntu:latest
USER www-data
CMD whoami >> /proc/self/fd/1

This logs:

[info] /bin/sh: 1: cannot create /proc/self/fd/1: Permission denied

Same if I replace /proc/self/fd/1 with /dev/stdout.

However, if I remove the USER www-data line in order to let this run as root, then it works as expected and this gets logged:

[info] root

If I then restore the USER www-data line, but change the CMD line to just CMD whoami, then that also works as expected and this gets logged:

[info] www-data

For my original use case of Apache HTTPD, I can get it to work by changing:

ErrorLog /dev/stderr

to:

ErrorLog "|$/bin/cat >&2"

Which is fine except that it results in two extra processes being spawned: one for “cat” and one for the shell (“|$” tells Apache to open the pipe via a shell, which is needed for the “>&2” to work).

Sorry it’s taking so long, working on many things simultaneously.

I wonder what Docker does here. I doubt it creates a pipe like that.

Does running chmod o+w /dev/stdout /dev/stderr in an entrypoint, as the cnb user, make it work?

Could this be an issue with Fly’s init? To collect and summarise the known information:

  • The code in the public init-snapshot repo clearly tries to set the ownership of the pipes to the app user (credit to @ignoramus for the find).

  • However the pipes are still owned by root (as demonstrated by @Alex21).
    Minimal Dockerfile combining the relevant commands:

    FROM ubuntu:latest
    USER www-data
    CMD whoami; ls -l /dev/stdout /proc/self/fd/1; stat -L /proc/self/fd/1; echo "test" >>/dev/stdout
    

    Logs:

    www-data
    lrwxrwxrwx 1 root     root     15 Jan 27 19:00 /dev/stdout -> /proc/self/fd/1
    l-wx------ 1 www-data www-data 64 Jan 27 19:00 /proc/self/fd/1 -> pipe:[4403]
      File: /proc/self/fd/1
      Size: 0          Blocks: 0          IO Block: 4096   fifo
    Device: ch/12d     Inode: 4403        Links: 1
    Access: (0600/prw-------)  Uid: (    0/    root)   Gid: (    0/    root)
    Access: 2023-01-27 19:00:15.069441815 +0000
    Modify: 2023-01-27 19:00:15.085441814 +0000
    Change: 2023-01-27 19:00:15.085441814 +0000
     Birth: -
    /bin/sh: 1: cannot create /dev/stdout: Permission denied
    

    The problem is with the ownership of pipe:[4403].
    The line “Access: (0600/prw-------) Uid: ( 0/ root) Gid: ( 0/ root)”
    should be “Access: (0600/prw-------) Uid: ( 33/www-data) Gid: ( 33/www-data)”.

    (Aside: stat -L /proc/1/fd/* shows that all the pipes of PID 1 are also owned by root.)


I don’t know why the pipes have the wrong ownership.

However I can explain a small part of the weirdness, specifically why writing to standard out succeeds but redirecting to /dev/stdout fails (as pointed out by @Alex21):

The kernel does not perform permission checks on file descriptors inherited by a process, but it does perform permission checks when a process tries to re-open those file descriptors using a path such as /dev/stdout or /proc/self/fd/1.

Here is a demonstration using regular files instead of pipes. (Edit: explanation below)

# whoami
root
# touch /tmp/out; ls -l /tmp/out
-rw-r--r-- 1 root root 0 Jan 27 19:03 /tmp/out
# cmd='whoami; ls -l /dev/stdout /proc/self/fd/1; echo "test" >>/dev/stdout'
# su -s /bin/sh -c "$cmd" www-data >/tmp/out
sh: 1: cannot create /dev/stdout: Permission denied
# cat /tmp/out
www-data
lrwxrwxrwx 1 root     root     15 Jan 27 19:02 /dev/stdout -> /proc/self/fd/1
l-wx------ 1 www-data www-data 64 Jan 27 19:03 /proc/self/fd/1 -> /tmp/out

Maybe this is why this issue has been so elusive – most apps just log to standard output/error, so they work even though the pipes have the wrong ownership. It’s only when an app tries to use /dev/stdout that the wrong ownership becomes apparent.


Temporary workaround:

Inspired by Alex21’s excellent cat trick, here is a more transparent workaround – set ENTRYPOINT in the Dockerfile (or in fly.toml) as follows:

ENTRYPOINT ["/bin/sh", "-c", "\"$@\" 2>&1 | cat", "/bin/sh"]

And here is a more elaborate version that keeps stdout and stderr separate:

ENTRYPOINT ["/bin/sh", "-c", "mkfifo /tmp/stdout /tmp/stderr; cat /tmp/stdout >&1 & cat /tmp/stderr >&2 & exec \"$@\" >/tmp/stdout 2>/tmp/stderr", "/bin/sh"]

(The named pipes in the second version could be removed/avoided using some file descriptor acrobatics. The second version only spawns two long-lived processes, both cat, which I think is the bare minimum if stdout and stderr are to be separate. The first version only spawns one cat process but its sh process is also long-lived. If the image had an ENTRYPOINT that needs to be preserved, it should be manually inserted before the \"$@\".)
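
For anyone puzzled by the quoting, here is a minimal local sketch of what the first wrapper does. It assumes that Docker appends CMD to an exec-form ENTRYPOINT as extra arguments; run_wrapped is a made-up helper name, not something Fly or Docker provides. With sh -c, the first argument after the script string becomes $0 and the remaining ones become "$@", which is why the trailing "/bin/sh" is there.

```shell
#!/bin/sh
# run_wrapped is a hypothetical stand-in for the ENTRYPOINT wrapper;
# its arguments play the role of the image's CMD.
run_wrapped() {
  # Script string, then $0 ("/bin/sh"), then the command to run as "$@".
  # The command's stdout and stderr are merged and passed through cat,
  # so the command writes to a fresh pipe instead of the original one.
  /bin/sh -c '"$@" 2>&1 | cat' /bin/sh "$@"
}

# Both lines arrive on the wrapper's stdout.
run_wrapped sh -c 'echo to-stdout; echo to-stderr >&2'
```

One side effect worth knowing: the exit status of the pipeline is that of cat, so the wrapped command's own exit code is not propagated.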

Hi @jerome, I’m not the OP but I tried this anyway. It fails with chmod: changing permissions of '/dev/stdout': Operation not permitted (because the pipes are owned by root).

On the other hand, if in the Dockerfile I change USER to root and change CMD so that it runs chmod (or chown) then su’s to the unprivileged user (www-data/cnb), it works – the chmod/chown succeeds and www-data can write to /dev/stdout. (Obviously this is not a viable solution, because USER is not supposed to be root.)

I’d like to add that the old code in init-snapshot (linked in my other comment) looks fine to me, and I hope debugging would be a straightforward matter of tracking down where the pipe ownership gets messed up on the way from init’s fchown to the execution of the Docker entrypoint.

Is there a way on fly.io to set the runAs user other than using the USER command in Dockerfile (as in, the equivalent to docker run --user)? In the case where I’m using the paketo buildpack system, there’s no Dockerfile, so I can’t use the USER instruction there. But if there’s a way to launch an app/machine as the root user regardless of the user instruction that’s in the image, then doing that and then letting the entrypoint chmod the pipes and then su to the cnb user before running the main process seems like a reasonable approach/workaround.

I have no idea about overriding the user*, but can’t you just override ENTRYPOINT (or CMD) in fly.toml? The temporary workaround I mentioned should “just work”, though you’ll probably need to also identify the original value of ENTRYPOINT/CMD and chain it in (you should be able to identify it by pulling the image and inspecting it).

*The Machines API appears to support UserOverride, but I can’t spot any way to set it from fly.toml.

Will you pls explain what these two lines test? I can see that the string test wouldn’t append to /tmp/out (which is a regular file?), but have trouble understanding how /tmp/out became a stdout sink and why /bin/sh -c ... >/tmp/out didn’t (or isn’t expected to) truncate /tmp/out…?


Yes, via the experimental section of an app’s fly.toml (docs, ref), or the processes section (where app is the default name for an entrypoint / main process on Fly):

(unsure if flyctl commands for Machines / Apps v2 support experimental section, but it does support processes ref).

Oh yeah, that wasn’t very self-explanatory. Here is a simplified version:

su -s /bin/sh -c 'echo "test 1"; echo "test 2" >>/dev/stdout' www-data >/tmp/out

Here /tmp/out is a regular file which is only writable by root. The aim is to get the “inner” command (two echo’s) to run as user www-data with fd 1 open to /tmp/out. The outer command achieves this by calling su and redirecting output to /tmp/out.

The first echo succeeds (and the output is saved in /tmp/out) even though www-data doesn’t have write access to /tmp/out. The second echo seems to be equivalent (because /dev/stdout is normally equivalent to fd 1), but it fails because /dev/stdout is a symlink to /proc/self/fd/1 which is a symlink to /tmp/out and opening it triggers a permission check which fails.

In the su invocation, the option -s /bin/sh is required because the default shell of www-data is /usr/sbin/nologin.
Changing >>/dev/stdout to >/dev/stdout doesn’t make a difference.
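
Here is a sketch of the same effect that doesn’t need root or su. It is only an illustration under some assumptions: Linux with /proc mounted, and reopen_demo is a hypothetical name. It opens fd 3 on a temporary file, removes all permissions from the file, and then compares writing through the already-open descriptor with re-opening it via /proc/self/fd/3.

```shell
#!/bin/sh
# reopen_demo is a hypothetical helper: inherited fd vs. re-open by path.
reopen_demo() {
  f=$(mktemp) || return 1
  exec 3>"$f"        # open fd 3 while the file is still writable
  chmod 000 "$f"     # from now on, open() on this file is refused (except for root)

  # Writing through the already-open descriptor involves no new open().
  if echo "first" >&3 2>/dev/null; then
    echo "inherited fd: ok"
  else
    echo "inherited fd: failed"
  fi

  # Re-opening via the /proc path calls open() again, which checks permissions.
  if (echo "second" >>/proc/self/fd/3) 2>/dev/null; then
    echo "re-open by path: ok"
  else
    echo "re-open by path: denied"
  fi

  exec 3>&-          # close fd 3 and clean up
  rm -f "$f"
}

reopen_demo
```

Run as a non-root user, the first write should succeed and the second should be denied. Root bypasses file permission checks, so as root both succeed.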


The experimental section only has cmd and entrypoint (and exec), it doesn’t have user.

Thank you, @tom93, for your example and for explaining the difference in how the kernel processes permissions for inherited file descriptors vs. opening new ones!

For images created with the Paketo PHP buildpack, the default entrypoint is:

procmgr-binary /layers/paketo-buildpacks_php-start/php-start/procs.yml

That can be overridden by a custom Procfile, which I can set to:

web: mkfifo /tmp/stdout /tmp/stderr; cat /tmp/stdout >&1 & cat /tmp/stderr >&2 & procmgr-binary /layers/paketo-buildpacks_php-start/php-start/procs.yml

That then lets my Apache configuration be:

ErrorLog /tmp/stderr
CustomLog /tmp/stdout common

I think that’s now working, but I’m not totally sure yet: I’m getting errors elsewhere that seem unrelated to this (and, I’m pretty sure, unrelated to Fly), which I’ll try to work through next week. Conceptually, at least, the above all makes sense to me.

I’m not clear on what this is for. Why do we need to redirect the original process’s stdout and stderr to /tmp/*? If the original process writes to inherited stdout and stderr, don’t those already go to the correct place? Isn’t the problem only with processes that can only log to a pseudo-file-name, like Apache, for which syntax like ErrorLog &2 doesn’t work?

Oh, wait, I think I understand now: that redirection is so that the original process ends up having permission to /dev/stdout and /dev/stderr, right? In which case, that’s cool! Sorry it took me a while to get it.
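
A local sketch of why that redirection helps, assuming the wrapper runs as the same unprivileged user as the app: mkfifo creates pipes owned by the current user, so when the wrapped command re-opens /dev/stdout the permission check passes. run_with_fifos is a hypothetical helper, and it uses a temporary directory rather than /tmp/stdout and /tmp/stderr to avoid name clashes.

```shell
#!/bin/sh
# run_with_fifos is a hypothetical helper mirroring the mkfifo workaround.
run_with_fifos() {
  dir=$(mktemp -d) || return 1
  mkfifo "$dir/stdout" "$dir/stderr" || return 1

  # Two relays copy from the user-owned fifos to the original stdout/stderr.
  cat "$dir/stdout" >&1 &
  cat "$dir/stderr" >&2 &

  # The wrapped command gets the fifos as fd 1 and fd 2, so /dev/stdout
  # and /dev/stderr now resolve to files the current user may open.
  "$@" >"$dir/stdout" 2>"$dir/stderr"
  status=$?

  wait               # let both relays drain and exit
  rm -rf "$dir"
  return "$status"
}

# Re-opening /dev/stdout by path inside the wrapped command.
run_with_fifos sh -c 'echo "re-opened by path" >>/dev/stdout'
```

Unlike the Procfile line, this sketch waits for the command and cleans up afterwards, which is convenient for local testing but not needed for a long-running server process.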

Actually, those unrelated errors were entirely due to a separate process, PHP-FPM, also writing to /proc/self/fd/2 and being denied permission.

Therefore, @tom93’s original solution worked perfectly.

In other words, the following fly.toml:

[build]
  builder = "paketobuildpacks/builder-jammy-full"

[build.args]
  BP_PHP_SERVER = "httpd"
  BP_PHP_WEB_DIR = "web"
  BPE_DEFAULT_PORT = 8080

Combined with the following Procfile:

web: mkfifo /tmp/stdout /tmp/stderr; cat /tmp/stdout >&1 & cat /tmp/stderr >&2 & procmgr-binary /layers/paketo-buildpacks_php-start/php-start/procs.yml >/tmp/stdout 2>/tmp/stderr

is all that’s needed for a PHP app in the “web” directory to work! No customization of any Apache or PHP-FPM configuration files needed. At least not for things to initially work: not sure if I’ll need to customize any of that config to optimize a real app.

Thanks again, @tom93, for the clever workaround, and @jerome and others for looking into this, and thanks in advance if you end up fixing the original pipe ownership issue upstream in Fly!

Just to clear things up for posterity, there are a couple of distinct “startup” commands and they get executed in the following order (each one executes the next):

  1. Dockerfile ENTRYPOINT (can be overridden using fly.toml’s “experimental.entrypoint”)
  2. Dockerfile CMD (can be overridden using fly.toml’s “experimental.cmd”)
  3. Procfile commands (if the app uses a Procfile)

[Edit: Machines currently ignore fly.toml’s “experimental.entrypoint” and “experimental.cmd”, see below for alternative.]

You can override any one of them (obviously you have to do it carefully so as to preserve the original behaviour). My preference would be to override ENTRYPOINT (either in the Dockerfile or in fly.toml); one advantage is that this will apply to all the process types in the Procfile.

The trickiest part is working out the image’s default ENTRYPOINT in order to chain it in. A simple way is to obtain the image (either build it locally using fly deploy --local-only or pull it from Fly’s registry), then run docker inspect <image> -f '{{json .Config.Entrypoint}}'.

In the case of the Paketo PHP buildpack, ENTRYPOINT is ["/cnb/process/web"], so one would use the following in fly.toml:

[experimental]
  entrypoint = ["/bin/sh", "-c", "/cnb/process/web \"$@\" 2>&1 | cat", "/bin/sh"]

(This is the simplified version that merges stdout and stderr into a single stream; I edited my original post to include it.)


Yes, that’s right :)

If there’s a need to keep stdout and stderr separate, I think this can do it by using bash’s process substitution:

[experimental]
  entrypoint = ["/bin/bash", "-c", "/cnb/process/web \"$@\" > >(cat) 2> >(cat >&2)", "/bin/bash"]

Is there any value in keeping stdout and stderr separate on Fly? I haven’t looked into how Fly manages logs yet.

Note that with Machines, I don’t think setting entrypoint in fly.toml works, but you can set it via:

fly machine update --entrypoint ...

At the moment no (Fly sets the log level of both to “info” so they are indistinguishable). But in principle it’s nice.

Bash’s process substitution does make it much easier, but I didn’t want to rely on bash. Just for fun, here is a sh version using the “file descriptor acrobatics” I hinted at earlier:

  entrypoint = ["/bin/sh", "-c", "{ { original_entrypoint \"$@\" 3>&- | cat >&3; } 2>&1 | cat >&2; } 3>&1", "/bin/sh"]

(The first cat is for stdout, the second is for stderr, and fd 3 is a dup of the original stdout. 3>&- closes fd 3 to avoid leaking it to the original entrypoint.)
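
To check that the two streams really stay separate, the same construct can be wrapped in a function and tried locally. This is only a sketch: run_split is a hypothetical name, and "$@" stands in for original_entrypoint plus its arguments.

```shell
#!/bin/sh
# run_split is a hypothetical helper wrapping the fd-3 construct.
run_split() {
  # fd 3 is a dup of the original stdout. The inner cat relays the
  # command's stdout to fd 3; the outer cat relays its stderr to fd 2.
  # 3>&- keeps fd 3 from leaking into the wrapped command.
  { { "$@" 3>&- | cat >&3; } 2>&1 | cat >&2; } 3>&1
}

# Discard stderr: only the stdout line should remain visible.
run_split sh -c 'echo out; echo err >&2' 2>/dev/null

# Discard stdout: only the stderr line should remain visible.
run_split sh -c 'echo out; echo err >&2' >/dev/null
```

As with the simpler wrapper, the exit status is that of the last cat in the pipeline, not of the wrapped command.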

Indeed, good point. The quoting rules are undocumented; I had to use:

$ fly machine update <machine-id> --entrypoint '/bin/sh -c "original_entrypoint \"$@\" 2>&1 | cat" /bin/sh'

(Replace original_entrypoint with the image’s default ENTRYPOINT, e.g. /cnb/process/web; or delete it if the image doesn’t have a special ENTRYPOINT.)

@jerome (given the discussion above) is it expected that the pipes set up by init should be owned by root? If that’s a mistake, can we expect a change so that they are owned by USER instead?

I’ve been wondering how it’s supposed to behave. We can likely make it work any way we want it to!

How do you think this should work for our users?

I’d like to find a way to get nginx to run as a non-root user and have its logs be visible via fly logs or the monitoring tab on the dashboard. Here’s a Dockerfile that doesn’t quite work today:

# syntax = docker/dockerfile:1

FROM debian:bullseye-slim

# Install nginx
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y nginx && \
    rm -rf /var/lib/apt/lists /var/cache/apt/archives

# configure nginx
RUN chown www-data:www-data /var/lib/nginx && \
    sed -i 's/80/8080/' /etc/nginx/sites-available/default && \
    sed -i 's/^user/#user/' /etc/nginx/nginx.conf && \
    sed -i 's/access_log\s.*;/access_log \/dev\/stdout;/' /etc/nginx/nginx.conf && \
    sed -i 's/error_log\s.*;/error_log \/dev\/stderr info;/' /etc/nginx/nginx.conf

# run without root privs
USER www-data:www-data

# Start nginx
EXPOSE 8080
CMD ["nginx", "-g", "daemon off;"]

The error produced is:

[info]nginx: [emerg] open() "/dev/stdout" failed (13: Permission denied)