Proposal: Declare Docker image dependencies via the `ruby_slim_docker_packages` key in `gemspec.metadata`

With an official Dockerfile landing in Rails main branch (rails/Dockerfile.tt at main · rubys/rails · GitHub), I’d like to propose a scheme in which Docker apt-get package dependencies are declared per gem in a .gemspec to improve the Docker deployment story now and in the future.

Why

At the time of this writing, packages for Docker are resolved via this code in main (Rails 7.1, which isn’t released yet): rails/app_base.rb at 0e23b0427e86aac5171d02912e3692427c79b800 · rubys/rails · GitHub. This works great for most Rails installations, but when a gem is added that depends on a package that isn’t present in the ruby:#{RUBY_VERSION}-slim image, the package must be added manually. Under those circumstances, gem maintainers usually end up documenting packages in a README, like Nokogiri does at Installing Nokogiri - Nokogiri (note that Nokogiri installs in the default Rails Dockerfile).

Providing a list of apt-get packages in gemspec.metadata = {"ruby_slim_docker_packages" => "foo bar"} could eliminate the need for developers to dig through documentation to find the right packages to deploy their Rails application.

The proposal

Gems could declare their ruby:#{RUBY_VERSION}-slim package dependencies via the ruby_slim_docker_packages key in the gemspec.metadata.

For example, the activestorage gem relies on the libvips package to run in the ruby:3.2.0-slim image. Here’s what the final gemspec would look like (see the added line):

diff --git a/activestorage/activestorage.gemspec b/activestorage/activestorage.gemspec
index 79f6cc50f9..6d8553c750 100644
--- a/activestorage/activestorage.gemspec
+++ b/activestorage/activestorage.gemspec
@@ -27,6 +27,7 @@
     "mailing_list_uri"  => "https://discuss.rubyonrails.org/c/rubyonrails-talk",
     "source_code_uri"   => "https://github.com/rails/rails/tree/v#{version}/activestorage",
     "rubygems_mfa_required" => "true",
+    "ruby_slim_docker_packages" => "libvips"
   }

When a Dockerfile is generated, it could inspect the bundle and extract the packages as follows:

def dockerfile_packages
  Bundler.load.specs
    .map { |gem| gem.metadata["ruby_slim_docker_packages"] }
    .compact                                  # skip gems that declare nothing
    .flat_map { |packages| packages.split(" ") }
    .uniq
end

Additionally, a rails dockerfile:packages command could be implemented that lists the output of the function above.
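As a rough illustration of the extraction above, here is a minimal sketch that runs without a real bundle. FakeSpec is a hypothetical stand-in for the specs that Bundler.load.specs would return, and the apt-get line is only one assumed way a generator might render the result:

```ruby
# FakeSpec is a hypothetical stand-in for the Gem::Specification objects
# that Bundler.load.specs would return in a real application.
FakeSpec = Struct.new(:name, :metadata)

# Collect, de-duplicate, and sort the packages declared across all specs.
def dockerfile_packages(specs)
  specs
    .filter_map { |spec| spec.metadata["ruby_slim_docker_packages"] }
    .flat_map(&:split)
    .uniq
    .sort
end

specs = [
  FakeSpec.new("rails", { "ruby_slim_docker_packages" => "build-essential git" }),
  FakeSpec.new("pg", { "ruby_slim_docker_packages" => "libpq-dev" }),
  FakeSpec.new("mysql2", { "ruby_slim_docker_packages" => "default-libmysqlclient-dev" }),
  FakeSpec.new("rake", {}) # declares no system packages
]

packages = dockerfile_packages(specs)
puts "RUN apt-get update -qq && apt-get install --no-install-recommends -y #{packages.join(' ')}"
```

Sorting the list keeps the generated Dockerfile line stable between runs, which should help with Docker layer caching.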

The following gems would have the following packages declared in the ruby_slim_docker_packages metadata:

  • rails - build-essential git
  • mysql2 - default-libmysqlclient-dev
  • pg - libpq-dev

Arguments against

Package management has a lot of inconsistencies between platforms, versions, and operating systems. The dream would be to support more platforms outside of ruby:slim, like brew packages; however that could get really complicated and require a more sophisticated resolution mechanism than what metadata could provide.

Here’s what the current situation looks like for installing node dependencies in the ruby:#{RUBY_VERSION}-slim image:

def dockerfile_packages
  # start with the essentials
  packages = %w(build-essential git)

  # add databases: sqlite3, postgres, mysql
  packages += %w(pkg-config libpq-dev default-libmysqlclient-dev)

  # add redis in case Action Cable, caching, or sidekiq are added later
  packages << "redis"

  # ActiveStorage preview support
  packages << "libvips" unless skip_active_storage?

  # node support, including support for building native modules
  if using_node?
    packages += %w(curl node-gyp) # pkg-config already listed above

    # module build process depends on Python, and debian changed
    # how python is installed with the bullseye release.  Below
    # is based on debian release included with the Ruby images on
    # Dockerhub.
    case Gem.ruby_version
    when /^2.7/
      bullseye = ruby_version >= "2.7.4"
    when /^3.0/
      bullseye = ruby_version >= "3.0.2"
    else
      bullseye = true
    end

    if bullseye
      packages << "python-is-python3"
    else
      packages << "python"
    end
  end

  packages.sort
end

This logic could live in the jsbundling-rails gemspec (or whichever gem handles JS bundling), but as you can see, this could get complicated as OS maintainers change packages.

Because of this complexity and the unknowns for how useful and widely adopted this scheme could be, I propose keeping the target platform narrow to the ruby:slim image to see if it actually improves the deployment story.

Arguments for

As people add gems to their Rails applications, it would save them from digging through deployment documentation if they could instead run a command that lists the packages they need in the upcoming Rails Dockerfile.

Moving gem package dependencies from Rails into each gem would empower gem maintainers with a way to provide their users with a better Rails deployment experience. We’d hopefully see more PRs opened in the community to help gem maintainers automate and reason through package dependencies.

For more complex deployments, like including Python + Chrome to install a nodejs env in the image via multiple Docker build steps, we might see somebody from the Ruby community with knowledge on maintaining OS packages move this complexity into a package, which could further improve the build & deployment experience.

Discussion points

To be clear, I think the current approach of hardcoding packages into the current Rails Dockerfile generator is necessary because gems don’t yet have the metadata needed to declare package dependencies. An effort by the community would be needed to get this working properly and a lot of gems would have to be updated.

My knowledge of maintaining Linux packages is limited. I’d be curious to hear from people who have to maintain Linux systems what problems they’d anticipate from such a scheme. My hopes are that this proposal helps sysadmins work closer with Ruby developers via collaborations on what should go into the ruby_slim_docker_packages metadata.

The longer-term dream would be getting something like this working for more platforms. For example, a gem could specify Brew dependencies for macOS development environments. This feels like it would be too complicated for a place to start, but you can see how this might improve setting up a development environment on macOS if a list of brew packages can be pulled from Gem manifests.

The biggest question of all: who would find this useful and why?


The overall idea of declaring packages seems reasonable, but a lot of the naming here is so specific it’d cause a massive proliferation of keys if/when others decide this is a good idea and want to add theirs.

The ruby images, including the “-slim” variants but not the “-alpine” ones, are Debian-based, and the “worst case” if their package lists are applied to other Debian-based distros is that the lists might include too much or be incomplete. If you try to install packages that are already on the system, they will of course simply be skipped.

It feels to me like it’d both be cleaner and more likely to get adopted if you proposed that gems list dependencies by distribution rather than by Docker image, as the differences in packaging and package names are by distribution.

Doing so, incidentally, if the metadata keys are cleanly defined (e.g. "packages_[distribution][:optional version spec]"), would improve the situation not just for Docker builds for a single package, but would also allow e.g. rubygems or bundler to spit out a list of required packages or offer to install them. It’d also seem to (at least start to) solve the issue you mentioned of handling Brew and other distros.

Also, I’m curious about the “bullseye” detection - why not just do this, which isn’t specific to the -slim Ruby docker images:
bullseye = File.read("/etc/debian_version").to_i >= 11 rescue false

But with respect to the above, if the suggested scheme were to allow e.g. “packages_debian:>=11” → list for Bullseye and newer, and “packages_debian:<=10” for older versions, you’d also need none of that logic.
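To make the idea of version-constrained keys concrete, here is a minimal sketch of how such keys might be resolved. The key format "packages_<distribution>[:<requirement>]" and the packages_for helper are assumptions for illustration only; the version comparison reuses Gem::Requirement and Gem::Version, which already ship with RubyGems:

```ruby
require "rubygems"

# Hypothetical key format: "packages_<distribution>[:<version requirement>]".
# Returns the packages whose key matches the distribution and version.
def packages_for(metadata, distribution:, version:)
  current = Gem::Version.new(version)

  metadata.flat_map do |key, value|
    match = key.match(/\Apackages_(?<distro>[a-z0-9-]+)(?::(?<req>.+))?\z/)
    next [] unless match && match[:distro] == distribution

    # A key without a requirement applies to every version.
    requirement = Gem::Requirement.new(match[:req] || ">= 0")
    requirement.satisfied_by?(current) ? value.split : []
  end.uniq
end

metadata = {
  "packages_debian"      => "node-gyp",
  "packages_debian:>=11" => "python-is-python3",
  "packages_debian:<=10" => "python",
  "packages_alpine"      => "python3"
}

puts packages_for(metadata, distribution: "debian", version: "11").sort.join(" ")
puts packages_for(metadata, distribution: "debian", version: "10").sort.join(" ")
```

With keys like these, the bullseye detection would move out of the generator and into plain data that each gem maintains.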


That would presume that the Dockerfile generation was done on the deployed machine.

I might have misunderstood - I assumed this function was being run in the Docker container as part of a build step. It looks like it’s being used to generate a Dockerfile without requiring a specific version? If so it’s a horribly brittle way of building images… Another reason why I avoid Rails, I guess.

In any case that’s a digression - using metadata to provide information on needed packages is a great idea, and done right it’d fix that issue as well.

Not yet. This proposal acts more like a version 0.1, which is why I’m very narrowly targeting the official Ruby docker image that Rails is using as a default for now. To do this “the right way” would probably require a new feature for RubyGems that could detect the platform/distribution, then resolve packages from metadata. I don’t think it’s ideal and would love to build something more comprehensive into RubyGems, but my preferred way of solving problems is to start small, validate the problem & respective solutions, scale up, then rinse & repeat.

A Rails developer could easily point at Linux distribution packages and say the same thing, “package names are too brittle”. The problem with that approach is that when the finger pointing is done, the problem remains and is not improved. Ultimately what I think will improve this, even more than any proposed technical solution, is finding/building a small community of system administrators and Ruby developers who share the vision of improving the success rate of RubyGem installations on a broad set of platforms and can work towards a solution in an empathetic and compassionate way.


To do this “the right way” would probably require a new feature for RubyGems that could detect the platform/distribution, then resolve packages from metadata.

cat /etc/os-release will show you metadata available on almost every modern Linux version, including in the ruby containers (both Debian and Alpine). For non-Linux, yes.

Automating it would require changes to Rubygems or Bundler, yes, but for the example you give, at least having a generic set of keys will make getting buy-in far easier.

E.g. I have a few gems, and a few I’ve contributed to. I have no interest in using Rails, and I don’t use the ruby “-slim” Docker images very often, but I do use Ubuntu, Debian, Alpine and would happily add metadata indicating dependencies when I become aware of them to make it easier.

But I’d be far more inclined to do so if there were a sufficiently generic set of keys with some rough consensus.

a Rails developer could easily point at Linux distribution packages and say the same thing, “package names are too brittle”.

Well, as a Linux user for 27 years and a Ruby user for 17, you’re absolutely right: package names (and the proliferation of formats) are too brittle. But note that the minor little issue I had with the proposal is merely with using the very specific key names. More generic key names and the ability to specify gemspec-like version constraints in the keys would both expand utility and counter the argument against that you gave.

The overall idea is great.

I’m not as familiar with Linux as you are, so some of these might be stupid questions and I’m about to learn a lot :smile:.

TIL! Here’s what I get for a distro:

root@63d67fea:~# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Are the ID and VERSION_ID keys formatted and reliable between distros? Those seem like the only keys needed to resolve something like debian-10, debian-11, etc. keys, much in the same way that platform keys for binaries are handled in RubyGems.
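As a sketch of how ID and VERSION_ID might be turned into a key like debian-11, the following parses os-release content passed in as a string. parse_os_release and distribution_key are hypothetical helper names, and the parser is deliberately simplified: it only strips surrounding double quotes and ignores the shell-style escaping that the os-release format also permits:

```ruby
# Parse KEY=value lines from os-release content into a Hash.
# Simplified: handles plain and double-quoted values only.
def parse_os_release(text)
  text.each_line.with_object({}) do |line, fields|
    line = line.strip
    next if line.empty? || line.start_with?("#")

    key, value = line.split("=", 2)
    next if value.nil?

    fields[key] = value.delete_prefix('"').delete_suffix('"')
  end
end

# Build a key such as "debian-11" from ID and the major part of VERSION_ID.
# Falls back to the bare ID when VERSION_ID is absent.
def distribution_key(fields)
  major = fields["VERSION_ID"].to_s.split(".").first
  [fields["ID"], major].compact.join("-")
end

debian = <<~OS_RELEASE
  PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
  VERSION_ID="11"
  ID=debian
OS_RELEASE

puts distribution_key(parse_os_release(debian)) # prints debian-11
```

On a real system the input would come from File.read("/etc/os-release"); taking a string keeps the sketch testable on any platform.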

Assuming it makes sense to resolve keys like debian-10, etc., are there forks of debian-10 distros that would have identical package names that don’t mean the same thing? If so, this could potentially throw a wrench in this idea if that concern is not an edge case.

How many years back does this work for “every modern Linux version”?

Assuming most of the stuff above works reliably, I could see adding support for a command that would work as follows:

$ bundle packages --distribution debian-10
# Lists out all of the packages for debian-10

Some people would want this flag so they could get a list of packages on another OS, say macOS, so they could hardcode it into a Dockerfile.

It makes sense to also be able to run the same command without the --distribution flag and get the packages for that OS if it can detect it.

A good starting place for this could be as a bundler plugin: Bundler: How to write a Bundler plugin

I need to think a bit more about this, but I think something like this could be done in a gemspec file.

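if Gem.const_defined?(:Distribution)
  base_packages = "fizz buzz boo bar"

  # Gem::Distribution doesn't exist yet; this assumes .local returns an
  # object that supports pattern matching on its distribution and version.
  case Gem::Distribution.local
  in distribution: "debian", version: 10
    spec.metadata["package_dependencies"] = base_packages + " python_3_boo"
  in distribution: "debian", version: 11
    spec.metadata["package_dependencies"] = base_packages + " python"
  else
    # no packages declared for other distributions
  end
end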

The most tempting thing to do would be enhance Gem::Platform from:

irb(main)> Gem::Platform.local
=> #<Gem::Platform:0x00007fcfd9903e08 @cpu="x86_64", @os="linux", @version=nil>

to

irb(main)> Gem::Platform.local
=> #<Gem::Platform:0x00007fcfd9903e08 @cpu="x86_64", @os="linux", @version=nil, @distribution="debian", @distribution_version="11">

I’m assuming @version should be the kernel version.

Here’s what I get on an Ubuntu machine:

$ cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Here’s the results for AlmaLinux:

NAME="AlmaLinux"
VERSION="8.7 (Stone Smilodon)"
ID="almalinux"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.7"
PLATFORM_ID="platform:el8"
PRETTY_NAME="AlmaLinux 8.7 (Stone Smilodon)"
ANSI_COLOR="0;34"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:almalinux:almalinux:8::baseos"
HOME_URL="https://almalinux.org/"
DOCUMENTATION_URL="https://wiki.almalinux.org/"
BUG_REPORT_URL="https://bugs.almalinux.org/"

ALMALINUX_MANTISBT_PROJECT="AlmaLinux-8"
ALMALINUX_MANTISBT_PROJECT_VERSION="8.7"
REDHAT_SUPPORT_PRODUCT="AlmaLinux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.7"

Here’s the results for Alpine:

NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.17.1
PRETTY_NAME="Alpine Linux v3.17"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"

Are the ID and VERSION_ID keys formatted and reliable between distros? Those seem like the only keys needed to resolve something like debian-10, debian-11, etc. keys, much in the same way that platform keys for binaries are handled in RubyGems.

They should be. The spec is here:

https://www.freedesktop.org/software/systemd/man/os-release.html

It might be worth considering ID_LIKE as a fallback, with the caveat that it will break (e.g. Ubuntu’s ID_LIKE contains Debian, which will sometimes work, sometimes not), so maybe deal with that later.
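A minimal sketch of what an ID_LIKE fallback could look like, assuming the os-release fields have already been parsed into a Hash and that gems use hypothetical "packages_<distribution>" metadata keys. It tries the exact ID first and only then walks ID_LIKE in order, which is where the "sometimes works, sometimes not" caveat applies:

```ruby
# ID first, then each space-separated entry of ID_LIKE, in order.
def distribution_candidates(fields)
  [fields["ID"], *fields["ID_LIKE"].to_s.split].compact.uniq
end

# Return the package list for the first candidate that has a metadata key.
# "packages_<distribution>" is a hypothetical key format.
def resolve_packages(metadata, fields)
  distribution_candidates(fields).each do |distro|
    packages = metadata["packages_#{distro}"]
    return packages.split if packages
  end
  []
end

ubuntu = { "ID" => "ubuntu", "ID_LIKE" => "debian" }
metadata = { "packages_debian" => "libvips" }

puts resolve_packages(metadata, ubuntu).inspect # falls back to the Debian list
```

A gem that needs a different list on Ubuntu can still add a "packages_ubuntu" key, and the exact match takes priority over the fallback.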

In terms of ID and version, I’ve checked Ubuntu, Fedora, Debian, Alpine, OpenSuSE Leap (need to handle distro names with a “-”, e.g. “opensuse-leap”), Gentoo, Mageia (Mandriva fork), Devuan (this is significant, as it was forked from Debian specifically to yank out systemd, but still adopted os-release independently), Arch (caveat: VERSION_ID=TEMPLATE_VERSION_ID in the current Docker image), and Slackware.

How many years back does this work for “every modern Linux version”?

Systemd removed support for non-os-release about a decade ago, claiming at the time that most distros had already added support for it. I think it’d be fine to just document that it’s (initially, at least) limited to detecting distros with os-release.

Some people would want this flag so they could get a list of packages on another OS, say macOS, so they could hardcode it into a Dockerfile.

Yeah, makes sense.


I hacked some Ruby together at GitHub - bradgessler/bundler-package: Resolve package dependencies for apt, brew, etc. with Bundler. It shows how the distribution and version could be resolved on macOS and Linux. I’m still playing around with naming (it’s not great right now) and then I need to hack a few package resolutions into some gems to see what happens. It’s all pretty straightforward, but it did raise a few questions:

Could a distribution use a package manager other than its default?

I think the answer is “yes”. I know on macOS some people use Homebrew and others MacPorts. That would add complexity to such a scheme.

How would complex dependencies best be handled, if at all?

An example would be adding Node.JS to a project. For the sake of this conversation, I’ll refer to Rails with Node.js · Fly Docs. Let’s also say that the jsbundling-rails gem is trying to declare its package dependencies.

In any of the cases on that page, there’s much more than just returning the package nodejs in a list. None of the scenarios map cleanly 1:1 with a package.

I like to think this would put pressure on creating a package, but I don’t know enough about the Linux package maintenance community to know if that would actually happen.

Here’s a list:

My general impression is that there are several in popular use for Red Hat-like operating systems, but for Debian-like operating systems it is rare to use anything but apt. That being said, some things are only installable by running curl commands, and other things may require adding a package repository and even downloading a gpg key before installing.

Here’s an example: How to use Puppeteer inside a Docker container - DEV Community

While that example is using node, the same instructions can be used for Ruby with puppeteer-ruby.

In theory the story is the same for node packages. They can have operating system dependencies. In practice the javascript used by Rails programs is intended to be run in the browser so it won’t make use of host dependent packages beyond what is needed to build or bundle the software.

Your first question really needs to be split in two. Some distributions have used multiple package managers (e.g. Redhat/Fedora moved from yum to dnf) but with the same underlying package repositories. In that case it doesn’t really matter: if you write a Dockerfile for them, you need to know which package manager the image uses.

It only matters if you have multiple package managers with different sets of packages available. In that case you effectively have to treat them as different “distributions”, and it may well be that people will need to tell you which “distribution” they’re using, and gem writers who want to provide package lists for them will need to figure out a naming scheme.

To the second one, I don’t think there really is an easy solution to this. It’d be possible to somehow add metadata on repositories to add and that’d help, but I’d vote for solving one problem at a time, because solving the simple case will already be a massive step forward and you’re well on the way to solving that.


Something I could do is have a default package manager per distribution, and leave it at that for now. Debian would be apt, macOS would be brew, etc. The gem metadata might need to be more specific about the distribution, version, and package manager. I’ll need to mull a little more on what that might look like on the implementation side.
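A default-per-distribution table could be as small as the sketch below. The table contents, the install_command helper, and the manager_command override are all assumptions for illustration; the override is there so that, say, a MacPorts user isn’t locked into the brew default:

```ruby
# Hypothetical defaults: one install command per distribution.
DEFAULT_INSTALL_COMMANDS = {
  "debian" => "apt-get install -y",
  "ubuntu" => "apt-get install -y",
  "alpine" => "apk add",
  "fedora" => "dnf install -y",
  "macos"  => "brew install"
}.freeze

# Build the install command for a distribution, optionally overriding
# the default package manager command.
def install_command(distribution, packages, manager_command: nil)
  command = manager_command || DEFAULT_INSTALL_COMMANDS[distribution]
  raise ArgumentError, "no default package manager known for #{distribution}" if command.nil?

  "#{command} #{packages.join(' ')}"
end

puts install_command("debian", %w[libvips libpq-dev])
puts install_command("macos", %w[vips], manager_command: "port install")
```

Keeping the override as a plain argument leaves room to make the metadata more specific about the package manager later without changing the default path.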

I agree tackling this now might be an over-optimization, but I do want to at least give it enough thought to the point where I don’t paint myself into a corner when somebody runs into this problem.