-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(xunix): improve handling of gpu library mounts #129
Conversation
t.Run("EmptyHostUsrLibDir", func(t *testing.T) { | ||
t.Parallel() | ||
ctx, cancel := context.WithCancel(context.Background()) | ||
t.Cleanup(cancel) | ||
emptyUsrLibDir := t.TempDir() | ||
|
||
// Start the envbox container. | ||
ctID := startEnvboxCmd(ctx, t, integrationtest.UbuntuImage, "root", | ||
"-v", emptyUsrLibDir+":/var/coder/usr/lib", | ||
"--env", "CODER_ADD_GPU=true", | ||
"--env", "CODER_USR_LIB_DIR=/var/coder/usr/lib", | ||
"--runtime=nvidia", | ||
"--gpus=all", | ||
) | ||
|
||
ofs := outerFiles(ctx, t, ctID, "/usr/lib/x86_64-linux-gnu/libnv*") | ||
// Assert invariant: the outer container has the files we expect. | ||
require.NotEmpty(t, ofs, "failed to list outer container files") | ||
// Assert that expected files are available in the inner container. | ||
assertInnerFiles(ctx, t, ctID, "/usr/lib/x86_64-linux-gnu/libnv*", ofs...) | ||
assertInnerNvidiaSMI(ctx, t, ctID) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review: this tests that we can get by with no extra files in CODER_USR_LIB_DIR
if !gpuExtraRegex.MatchString(path) { | ||
return nil | ||
} | ||
|
||
if !sharedObjectRegex.MatchString(path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review: this makes the control flow a little easier to read; I accidentally removed this but it still performs an important task
gpuExtraRegex = regexp.MustCompile("(?i)(libgl|nvidia|vulkan|cuda)") | ||
gpuEnvRegex = regexp.MustCompile("(?i)nvidia") | ||
gpuMountRegex = regexp.MustCompile(`(?i)(nvidia|vulkan|cuda)`) | ||
gpuExtraRegex = regexp.MustCompile(`(?i)(libgl(e|sx|\.)|nvidia|vulkan|cuda)`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review: modified this regex to hopefully match the right things and not the wrong things
This PR fixes some issues Bjorn found when testing the changes in #127 as well as some other lingering issues I'd noticed:
We had been mounting in the mounts to e.g.
libnvidia-ml.so.550.90.07
but not the symlinks, as those do not show up as mounts:This modifies the behaviour of
xunix.GPUs
to also return the symlinks to those driver files, so that we also mount them inside the inner container.gpuExtraRegex
will search for files matching the expression(?i)(libgl|nvidia|vulkan|cuda)
. It turns out this will also matchlibglib-X.Y.so
. This is something we want to avoid, as it can causednf
to break when running Redhat-based distros in the inner container. Shout out to Bjorn for spotting this!NOTE: as these integration tests do not run automatically in CI, you will need to run them manually on a physical machine that has the NVidia container runtime installed: