docker exec doesn't work for most images within a sprite

Within sprites, running docker exec fails for any container whose PID 1 is non-root. This breaks exec for many production images (the ones I tried included Postgres, Redis, Node, and Elasticsearch), since they drop privileges in their entrypoint.

We’re trying to support a wider variety of development environments, and not being able to spin up some of these containers from an existing docker compose configuration is often inconvenient. The exec failure is the main blocker, and there are some other inconveniences because many of the microVM’s capabilities are disabled (the bounding set is restricted to 16 of 41 Linux capabilities).

Trying to understand what is a fundamental limitation of firecracker / microVMs, vs something that is currently a limitation of the sprites platform but could be configurable in the future, vs what is actually just user error from my side!

The below debugging details were gathered with coding agents, but this text (and the text + questions at the end) were written by me, Cliff, a human!

I’ve already followed guidance from the /.sprite/docs/docker.md file, and from this thread: Sprite too slow for docker?


Environment:

Kernel:      6.12.87-fly
Docker:      29.1.3
runc:        1.3.4
OS:          Ubuntu 25.10 (inside Sprite inner container)
Cgroup:      v2 unified (cgroup_no_v1=all)
daemon.json: {"storage-driver":"overlay2", "default-cgroupns-mode":"host"}

Repro script:

#!/usr/bin/env bash
set -euo pipefail

echo "=== Container PID 1 as root → exec works ==="
sudo docker run -d --name test-root alpine sleep 60
sudo docker exec test-root echo "OK"
sudo docker rm -f test-root

echo ""
echo "=== Redis (drops to UID 999 via gosu) → exec fails ==="
sudo docker run -d --name test-redis redis:7-alpine
sleep 2
sudo docker exec test-redis redis-cli ping || true
sudo docker rm -f test-redis

echo ""
echo "=== Postgres (drops to UID 999) → exec fails ==="
sudo docker run -d --name test-pg -e POSTGRES_PASSWORD=test pgvector/pgvector:pg16
sleep 5
sudo docker exec test-pg psql -U postgres -c 'SELECT 1' || true
sudo docker rm -f test-pg

echo ""
echo "=== Plain Alpine with su to non-root → exec fails ==="
sudo docker run -d --name test-su alpine \
  sh -c 'adduser -D -u 1000 app && exec su -s /bin/sh app -c "sleep 60"'
sleep 2
sudo docker exec test-su echo "ok" || true
sudo docker rm -f test-su

echo ""
echo "=== --privileged does not help ==="
sudo docker run -d --name test-priv --privileged redis:7-alpine
sleep 2
sudo docker exec test-priv redis-cli ping || true
sudo docker rm -f test-priv

echo ""
echo "=== seccomp=unconfined does not help ==="
sudo docker run -d --name test-nosec --security-opt seccomp=unconfined redis:7-alpine
sleep 2
sudo docker exec test-nosec redis-cli ping || true
sudo docker rm -f test-nosec

Output:

=== Container PID 1 as root → exec works ===
OK

=== Redis (drops to UID 999 via gosu) → exec fails ===
OCI runtime exec failed: exec failed: unable to start container process:
error executing setns process: exit status 1

=== Postgres (drops to UID 999) → exec fails ===
OCI runtime exec failed: exec failed: unable to start container process:
error executing setns process: exit status 1

=== Plain Alpine with su to non-root → exec fails ===
OCI runtime exec failed: exec failed: unable to start container process:
error executing setns process: exit status 1

=== --privileged does not help ===
OCI runtime exec failed: exec failed: unable to start container process:
error executing setns process: exit status 1

=== seccomp=unconfined does not help ===
OCI runtime exec failed: exec failed: unable to start container process:
error executing setns process: exit status 1

Root cause — restricted capability bounding set:

$ cat /proc/1/status | grep CapBnd
CapBnd:    00000000a82435fb

$ capsh --decode=00000000a82435fb
0x00000000a82435fb=cap_chown,cap_dac_override,cap_fowner,cap_fsetid,
cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,
cap_net_raw,cap_net_admin,cap_sys_chroot,cap_sys_admin,cap_mknod,
cap_audit_write,cap_setfcap

# 16 of 41 capabilities. Full set: 0x1ffffffffff
# --privileged container gets the same restricted set:
$ sudo docker run --rm --privileged alpine sh -c 'cat /proc/1/status | grep CapBnd'
CapBnd:    00000000a82435fb
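
To double-check the "16 of 41" figure without capsh, the mask can be counted bit by bit. This is only a minimal sketch: count_caps is a helper name I made up for illustration, and it assumes the mask is passed as plain hex without a 0x prefix, exactly as it appears in /proc/<pid>/status.

```shell
#!/usr/bin/env bash
# count_caps is a hypothetical helper, not part of Docker or libcap.
# It counts the bits set in a hex capability mask such as a CapBnd value.
count_caps() {
  local mask=$((0x$1))   # parse the hex string as an integer
  local count=0
  while [ "$mask" -gt 0 ]; do
    count=$(( count + (mask & 1) ))   # add the lowest bit
    mask=$(( mask >> 1 ))             # move on to the next bit
  done
  echo "$count"
}
```

Calling count_caps 00000000a82435fb should agree with the number of names capsh prints for the same mask, and count_caps 1ffffffffff gives the size of the full set.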

Pattern: docker exec works when the container’s PID 1 is root (has effective capabilities). It fails when PID 1 is non-root (effective capabilities are all zero after privilege drop). runc’s nsexec returns exit status 1, which maps to a write_file() failure in nsexec.c — likely a cgroup control file write that fails when the target container’s process has no effective capabilities, and the host can’t compensate because the needed capability isn’t in the bounding set.
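
One way to check which side of this pattern a given container is on is to read the status file of its PID 1. The sketch below is an assumption-heavy illustration: status_field and is_privilege_dropped are hypothetical helper names, and it assumes the container's PID 1 is visible under the sprite's /proc (as it would be with Docker running in the same PID namespace).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading a /proc/<pid>/status-style file.

# Print one column of a field. Usage: status_field KEY FILE [COLUMN]
# COLUMN defaults to 2, the first value after the key.
status_field() {
  awk -v key="$1:" -v col="${3:-2}" '$1 == key { print $col }' "$2"
}

# Succeed when the process has a non-zero effective UID and an all-zero
# effective capability set, i.e. the state described for the failing containers.
is_privilege_dropped() {
  local status_file=$1 euid capeff
  euid=$(status_field Uid "$status_file" 3)   # Uid columns: real, effective, saved, fs
  capeff=$(status_field CapEff "$status_file")
  [ "$euid" -ne 0 ] && [ $((0x$capeff)) -eq 0 ]
}

# Example, assuming the test-redis container from the repro script is running:
#   pid=$(sudo docker inspect -f '{{.State.Pid}}' test-redis)
#   is_privilege_dropped "/proc/$pid/status" && echo "PID 1 has dropped privileges"
```

If the pattern holds, this should succeed for the Redis, Postgres, and su containers in the repro script and fail for the plain alpine sleep container.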

Practical impact: Most production Docker images drop to non-root users in their entrypoints (Postgres, Redis, MySQL, Node, Nginx, Elasticsearch, MongoDB). This means docker exec — the standard way to inspect, debug, and manage running containers — is broken for essentially all real-world Docker workflows on Sprites.

Other effects: Beyond the docker exec failure, the restricted bounding set also blocks --cap-add for capabilities that some development workflows depend on. The most impactful is SYS_PTRACE: running strace inside any container returns ptrace(PTRACE_TRACEME, ...): Operation not permitted, and gdb -p <pid> returns ptrace: Operation not permitted. These are standard debugging tools for diagnosing slow queries, connection hangs, and crash analysis in database containers, and there is no workaround because --cap-add SYS_PTRACE fails against the bounding set. IPC_LOCK matters for Elasticsearch when bootstrap.memory_lock=true is set (recommended for production-like testing): the bootstrap check logs Unable to lock JVM Memory: error=12, reason=Cannot allocate memory and the node refuses to start. Without SYS_RESOURCE, Redis auto-reduces maxclients with the warning Redis can't set maximum open files to 10032 because of OS error: Operation not permitted. This can be worked around with Docker’s --ulimit flag, although --ulimit itself cannot grant capabilities that --cap-add can’t. For Postgres and Redis, the absence of IPC_LOCK and SYS_NICE has no observable effect with default configurations: Postgres silently falls back from huge pages, and neither service calls setpriority().
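
Whether a given --cap-add has any chance of working can be checked against the bounding set up front. The following is a rough sketch, not an official tool: has_cap and missing_caps are hypothetical helper names, the list only covers the four capabilities mentioned above, and the bit numbers are taken from <linux/capability.h>.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; bit numbers are from <linux/capability.h>.

# Succeed when capability bit $2 is set in hex mask $1.
has_cap() {
  local mask=$((0x$1)) bit=$2
  [ $(( (mask >> bit) & 1 )) -eq 1 ]
}

# Print the capabilities discussed above that are absent from a hex mask.
missing_caps() {
  local mask=$1 entry name bit
  for entry in IPC_LOCK:14 SYS_PTRACE:19 SYS_NICE:23 SYS_RESOURCE:24; do
    name=${entry%%:*}   # text before the colon
    bit=${entry##*:}    # number after the colon
    has_cap "$mask" "$bit" || echo "$name"
  done
}

# Example, using the bounding set of the current environment:
#   missing_caps "$(awk '/^CapBnd:/ { print $2 }' /proc/1/status)"
```

For the mask reported above (00000000a82435fb), all four names are printed, which matches the --cap-add failures described here.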


(back to human-authored text!)

Questions:

  1. Is the restricted capability set intentional? And/or required? What of this is inherent to firecracker, vs a hard requirement for sprites, vs a suggestion from sprites that could be configurable in the future?
  2. Which specific capability restriction causes the setns failure? My agent suspects SYS_PTRACE (namespace entry for non-root processes), but can’t confirm without strace.
  3. Can the bounding set be configurable per-sprite or per-org? At minimum, whatever capability unblocks docker exec would be key to make Docker workflows functional.
  4. How are you handling this? Do you have docker workloads, docker compose files, etc, and has your team run into any of these issues before?
