investigate issues with GPUs with redhat UBI #126
The nvidia container runtime will generally try to mount in GPU libraries in the correct location for the container image. For example:
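As a rough illustration (the two candidate paths below are assumptions about typical Debian-style vs RHEL/UBI-style layouts, not output captured from this issue), a quick check run inside a GPU-enabled container shows where the injected driver libraries landed:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	// Candidate locations: Debian/Ubuntu images typically use the multiarch
	// directory, while RHEL/UBI images use /usr/lib64. These paths are
	// assumptions about common layouts, not taken from the runtime's source.
	candidates := []string{
		"/usr/lib/x86_64-linux-gnu",
		"/usr/lib64",
	}
	for _, dir := range candidates {
		matches, _ := filepath.Glob(filepath.Join(dir, "libcuda.so*"))
		for _, m := range matches {
			if info, err := os.Lstat(m); err == nil {
				fmt.Printf("%s (%s)\n", m, info.Mode())
			}
		}
	}
}
```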
The logic for this appears to be split across several internal packages.
These modifications are performed at the container runtime layer -- the Docker daemon has no knowledge of the additional mounts. The actual logic for determining where to place the symlinks is applied in order of precedence:
There's also similar logic in https://github.com/NVIDIA/libnvidia-container/blob/v1.15.0/src/nvc_container.c#L152-L165
I'm happy either way. We could either match the nvidia behavior for determining where to put them, or add an env var. It's not immediately clear from the code you linked how they detect which one to use, but if it's easy to copy the check they're doing then we should do the same thing. If it's difficult then we should just add the env var.

edit: actually the last comment you posted hadn't loaded for me, but that seems very simple to duplicate in our Go code?
Yeah I'm leaning towards the last one as well!
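A minimal sketch of what duplicating that check in Go could look like, assuming the libnvidia-container logic essentially boils down to picking whichever conventional library directory the image actually uses (the function name and the fallback here are made up for illustration, not the project's actual implementation):

```go
package gpu

import (
	"os"
	"path/filepath"
)

// detectLibDir guesses which library directory a container image uses.
// Assumption: if the Debian/Ubuntu multiarch directory exists in the
// image's rootfs, prefer it; otherwise fall back to the RHEL/UBI-style
// /usr/lib64. The real check in nvc_container.c may key off something
// else entirely, so treat this as a starting point only.
func detectLibDir(rootfs string) string {
	candidates := []string{
		"/usr/lib/x86_64-linux-gnu", // Debian/Ubuntu multiarch layout
		"/usr/lib64",                // RHEL/UBI layout
	}
	for _, dir := range candidates {
		if info, err := os.Stat(filepath.Join(rootfs, dir)); err == nil && info.IsDir() {
			return dir
		}
	}
	return candidates[0]
}
```

Checking the multiarch path first matters because /usr/lib64 can also exist on merged-/usr Debian and Ubuntu images (it holds the dynamic loader), so its mere presence isn't a reliable signal on its own.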
One thing I suspect is not very intuitive is determining the correct value to use here.
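Assuming the value in question is the target library directory and that an env var plus auto-detection is the route taken, the lookup itself could be as small as the sketch below; GPU_LIB_DIR is a hypothetical variable name used only for illustration, and detectLibDir refers to the detection sketch above:

```go
package gpu

import "os"

// resolveLibDir prefers an explicit override and otherwise falls back to
// detectLibDir from the sketch above. GPU_LIB_DIR is a hypothetical
// variable name, used only in this sketch.
func resolveLibDir(rootfs string) string {
	if dir := os.Getenv("GPU_LIB_DIR"); dir != "" {
		return dir
	}
	return detectLibDir(rootfs)
}
```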
With
Notes:
Update: looks like this approach can somehow break dnf:
Tested with
Successfully ran the MNIST example from https://github.com/pytorch/examples in the workspace.
Follow-up from #111
There are reports of issues using GPUs with the Red Hat 9 UBI as an internal image.
Use the same setup as before: