As of today, starting a GPU machine is 1.6s faster per attached device.
That means less time waiting for your machine to be ready when scaling up or starting from cold. Shaving seconds off start times not only translates to reduced compute costs for our users, but also unlocks the pattern of autostarting machines based on incoming requests for more applications.
For the nerdy details, read on.
Before, the machine’s init process called nvidia-smi, a tool provided by NVIDIA that, among other things, creates the character devices at /dev/nvidia*.
It is important to pre-create the device files before handing control over to the application’s process, because apps that drop root privileges, those with a USER statement in their Dockerfile, won’t be able to create them.
nvidia-smi served us fine as a first version, but it incurs considerable extra time doing tasks we don’t need at this phase; scanning the bus and creating the devices shouldn’t take that long.
What if we could do better? nvidia-smi is closed source, and reverse engineering it is no pretty job; the answer came from NVIDIA itself and its open-source nvidia-container-toolkit (yay OSS!). Truth be told, the open-source implementation isn’t complete, but a search on NVIDIA’s forums revealed the rest of the details. With all the details in place, we were able to reimplement device creation in Rust, and now it takes less than a hundred milliseconds.
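To give a flavor of what that device creation involves (this is a minimal sketch with hypothetical function names, not our exact code): the kernel publishes each character-device driver’s major number in /proc/devices, and the NVIDIA driver’s convention is nvidiaN at minor N and nvidiactl at minor 255 under the nvidia major, with nvidia-uvm getting its own dynamically allocated major. The init just has to look up the major, combine it with the right minor, and mknod the node:

```rust
/// Find the major number assigned to a character-device driver by
/// scanning the contents of /proc/devices, whose lines look like
/// "195 nvidia". Hypothetical helper for illustration.
fn parse_major(proc_devices: &str, name: &str) -> Option<u32> {
    proc_devices.lines().find_map(|line| {
        let mut parts = line.split_whitespace();
        let major = parts.next()?.parse::<u32>().ok()?;
        (parts.next()? == name).then_some(major)
    })
}

/// Combine a major/minor pair into a dev_t using the same bit
/// layout as glibc's makedev().
fn makedev(major: u32, minor: u32) -> u64 {
    (minor as u64 & 0xff)
        | ((major as u64 & 0xfff) << 8)
        | ((minor as u64 & !0xff) << 12)
        | ((major as u64 & !0xfff) << 32)
}

fn main() {
    // In a real init this would be read from /proc/devices;
    // a sample is hard-coded here so the sketch runs anywhere.
    let proc_devices = "195 nvidia\n508 nvidia-uvm\n";
    let nvidia = parse_major(proc_devices, "nvidia").expect("nvidia driver loaded");
    for (path, minor) in [("/dev/nvidia0", 0u32), ("/dev/nvidiactl", 255)] {
        let dev = makedev(nvidia, minor);
        // Here the init would create the node with
        // mknod(path, S_IFCHR | 0o666, dev), e.g. via the libc crate;
        // that needs root, so this sketch only prints the numbers.
        println!("{path} -> dev_t {dev:#x}");
    }
}
```

No bus scan, no driver queries: a read of one proc file and a handful of mknod calls, which is why it finishes in milliseconds.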
Happy GPU hacking, y’all!