copy_if failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered #350

Closed
taochenshh opened this issue May 8, 2021 · 5 comments

@taochenshh

Hi,

I am installing MinkowskiEngine from source (commit ). When I run the following code, I get a CUDA error: RuntimeError: copy_if failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered.

import MinkowskiEngine as ME
import torch
torch.manual_seed(0)
coordinates = torch.rand(8192,3) * 200
bcoords, bfeats = coordinates.cuda(), coordinates.cuda()
print(bcoords, bfeats)  # without print, it works fine... print seems to be triggering something
ME.SparseTensor(bfeats, bcoords)

I noticed this post, but I have tested with both pytorch 1.8.1+cu111 and pytorch 1.7.1+cu110, and both give the same error.

Here are two configurations I tried:

==========System==========
Linux-4.14.224-llgrid-10ms-x86_64-with-debian-buster-sid
DISTRIB_ID=GridOS
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="GridOS 18.04.5"
3.7.10 (default, Feb 26 2021, 18:47:35) 
[GCC 7.3.0]
==========Pytorch==========
1.7.1+cu110
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 450.80.02
CUDA Version 11.0
VBIOS Version 88.00.7E.00.03
Image Version G500.0202.00.02
==========NVCC==========
/usr/local/pkg/cuda/cuda-11.0/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0
==========CC==========
/usr/bin/c++
c++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

==========MinkowskiEngine==========
0.5.3
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 11000
CUDART version MinkowskiEngine is compiled: 11000

==========System==========
Linux-4.15.0-142-generic-x86_64-with-debian-buster-sid
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.5 LTS"
3.7.9 (default, Aug 31 2020, 12:42:55) 
[GCC 7.3.0]
==========Pytorch==========
1.8.1+cu111
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 465.19.01
CUDA Version 11.3
VBIOS Version 90.02.17.00.64
Image Version G001.0000.02.04
==========NVCC==========
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
==========CC==========
/usr/bin/c++
c++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

==========MinkowskiEngine==========
0.5.3
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 11030
CUDART version MinkowskiEngine is compiled: 11030
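
One way to read the two reports above is to compare the CUDA version in the PyTorch wheel suffix (`+cuXYZ`) with the CUDART version MinkowskiEngine was compiled with. The sketch below does that on values copied from the reports. The helper names (`decode_cuda_int`, `torch_cuda_tag`, `versions_match`) are hypothetical and not part of PyTorch or MinkowskiEngine, and the suffix parsing assumes a single-digit minor version, as in `cu110` and `cu111`. A mismatch is only a hint worth checking, not a confirmed cause of the error.

```python
import re


def decode_cuda_int(value):
    """Decode CUDA's integer version encoding (1000 * major + 10 * minor)."""
    return value // 1000, (value % 1000) // 10


def torch_cuda_tag(torch_version):
    """Extract (major, minor) from a wheel suffix such as '1.7.1+cu110'.

    Returns None when there is no '+cuXYZ' suffix (conda builds report
    a plain '1.7.1'). Assumes the last digit is the minor version.
    """
    match = re.search(r"\+cu(\d+)(\d)$", torch_version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def versions_match(torch_version, cudart_int):
    """True or False when both versions are known, None when the tag is missing."""
    tag = torch_cuda_tag(torch_version)
    if tag is None:
        return None
    return tag == decode_cuda_int(cudart_int)


# Values copied from the two reports above.
print(versions_match("1.7.1+cu110", 11000))  # first configuration
print(versions_match("1.8.1+cu111", 11030))  # second configuration
```

With these inputs the first configuration is consistent (11.0 on both sides), while the second pairs a cu111 wheel with an 11.3 toolkit.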
@chrischoy
Contributor

This is exactly the error you get for pytorch 1.8 + cuda 11. Could you please make sure you are using pytorch 1.7 in a new conda environment, then recompile and run the code again? I suspect the environment was misconfigured to use 1.8 + cuda11.

@taochenshh
Author

taochenshh commented May 8, 2021

Thanks for your quick reply. I just created a new conda environment and installed pytorch 1.7.1 with cuda 11.0. I also installed MinkowskiEngine from source with the following commands:

conda create -n py37 python=3.7
conda activate py37
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch

git clone https://github.com/NVIDIA/MinkowskiEngine.git
cd MinkowskiEngine
pip install -U . -v --no-deps --install-option="--blas_include_dirs=${CONDA_PREFIX}/include" --install-option="--blas=openblas"
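
Since the conda `cudatoolkit` package does not ship `nvcc`, a source build like this typically compiles against a locally installed toolkit, so it may help to confirm which `nvcc` is visible before running `pip install`. A minimal sketch, assuming `nvcc --version` prints a `release X.Y` line as in the reports in this thread; `parse_nvcc_release` and `local_nvcc_release` are hypothetical helper names.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def local_nvcc_release():
    """Query the nvcc found on PATH; returns None when nvcc is not installed."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


# Line copied from the NVCC section of the reports in this thread.
sample = "Cuda compilation tools, release 11.0, V11.0.221"
print(parse_nvcc_release(sample))
print(local_nvcc_release())
```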

But I got the same error. Here are the environment configs:

==========System==========
Linux-4.14.224-llgrid-10ms-x86_64-with-debian-buster-sid
DISTRIB_ID=GridOS
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="GridOS 18.04.5"
3.7.10 (default, Feb 26 2021, 18:47:35) 
[GCC 7.3.0]
==========Pytorch==========
1.7.1
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 450.80.02
CUDA Version 11.0
VBIOS Version 88.00.7E.00.03
Image Version G500.0202.00.02
==========NVCC==========
/usr/local/pkg/cuda/cuda-11.0/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0
==========CC==========
/usr/bin/c++
c++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

==========MinkowskiEngine==========
0.5.3
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 11000
CUDART version MinkowskiEngine is compiled: 11000

@taochenshh
Author

I also created a new conda environment and installed pytorch via pip:

pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

And I got the same error as well.

Here is the config:

==========System==========
Linux-4.14.224-llgrid-10ms-x86_64-with-debian-buster-sid
DISTRIB_ID=GridOS
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="GridOS 18.04.5"
3.7.10 (default, Feb 26 2021, 18:47:35) 
[GCC 7.3.0]
==========Pytorch==========
1.7.1+cu110
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 450.80.02
CUDA Version 11.0
VBIOS Version 88.00.7E.00.03
Image Version G500.0202.00.02
==========NVCC==========
/usr/local/pkg/cuda/cuda-11.0/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0
==========CC==========
/usr/bin/c++
c++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

==========MinkowskiEngine==========
0.5.3
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 11000
CUDART version MinkowskiEngine is compiled: 11000

@taochenshh
Author

RuntimeError                              Traceback (most recent call last)
<ipython-input-1-27eb35e5986d> in <module>
      5 bcoords, bfeats = coordinates.cuda(), coordinates.cuda()
      6 print(bcoords, bfeats)  # without print, it works fine... print seems to be triggering something
----> 7 ME.SparseTensor(bfeats, bcoords)

~/MinkowskiEngine/MinkowskiEngine/MinkowskiSparseTensor.py in __init__(self, features, coordinates, tensor_stride, coordinate_map_key, coordinate_manager, quantization_mode, allocator_type, minkowski_algorithm, requires_grad, device)
    270             )
    271             coordinates, features, coordinate_map_key = self.initialize_coordinates(
--> 272                 coordinates, features, coordinate_map_key
    273             )
    274         else:  # coordinate_map_key is not None:

~/MinkowskiEngine/MinkowskiEngine/MinkowskiSparseTensor.py in initialize_coordinates(self, coordinates, features, coordinate_map_key)
    298             coordinate_map_key,
    299             (unique_index, inverse_mapping),
--> 300         ) = self._manager.insert_and_map(coordinates, *coordinate_map_key.get_key())
    301         self.unique_index = unique_index.long()
    302         coordinates = coordinates[self.unique_index]

~/MinkowskiEngine/MinkowskiEngine/MinkowskiCoordinateManager.py in insert_and_map(self, coordinates, tensor_stride, string_id)
    177         """
    178         tensor_stride = convert_to_int_list(tensor_stride, self.D)
--> 179         return self._manager.insert_and_map(coordinates, tensor_stride, string_id)
    180 
    181     def insert_field(

RuntimeError: copy_if failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
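
CUDA kernels launch asynchronously, so an illegal memory access is often reported at a later synchronization point (here `copy_if`) rather than at the call that caused it. That may also be why adding or removing `print` changes the behavior. A minimal sketch for rerunning the reproduction with `CUDA_LAUNCH_BLOCKING=1`, which makes kernel launches synchronous; `repro.py` is a hypothetical file holding the snippet from the first post, and `blocking_env` and `run_blocking` are illustrative helper names.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Copy an environment mapping and enable synchronous CUDA kernel launches."""
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_blocking(script_path):
    """Run a script in a fresh interpreter with CUDA_LAUNCH_BLOCKING=1 set."""
    return subprocess.run(
        [sys.executable, script_path],
        env=blocking_env(),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # repro.py is a hypothetical file containing the reproduction snippet.
    if os.path.exists("repro.py"):
        result = run_blocking("repro.py")
        # Show the tail of stderr, where the Python traceback ends up.
        print(result.stderr[-2000:])
```

The variable is set in a child process because it needs to be in place before CUDA is initialized.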

@taochenshh
Author

After installing and reinstalling the environment and CUDA several times, it seems that torch==1.7.1+cu110 with CUDA 11.0 can give the error, but torch==1.7.1+cu110 with CUDA 11.3 does not.
