Error when running the classification example #312

Closed
djlbet123 opened this issue Feb 5, 2021 · 7 comments

Comments

@djlbet123

Environment :
Driver Version: 460.32.03
CUDA Version: 11.1.105
Pytorch Version: 1.7.1

Installed with: pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps --install-option="--blas_include_dirs=${CONDA_PREFIX}/include" --install-option="--blas=openblas"

Command run: python -m examples.modelnet40 (https://github.com/NVIDIA/MinkowskiEngine/blob/master/examples/modelnet40.py)

Error report:
Warning: This process will cache the entire voxelized ModelNet40 dataset, which will take up ~10G of memory.
INFO - 2021-02-05 08:57:28,495 - modelnet40 - Loading the subset train from ./ModelNet40 with 8871 files
INFO - 2021-02-05 08:57:28,496 - modelnet40 - Loading the subset val from ./ModelNet40 with 966 files
warnings.warn("To get the last learning rate computed by the scheduler, "
INFO - 2021-02-05 08:57:28,529 - modelnet40 - LR: [0.01]
** On entry to cusparseSpMM_bufferSize() parameter number 1 (handle) had an illegal value: bad initialization or already destroyed

RuntimeError: CUSPARSE_STATUS_INVALID_VALUE at /tmp/pip-req-build-4vnh0cz8/src/spmm.cu:249

@chrischoy
Contributor

The code runs perfectly fine on the latest MinkowskiEngine.

@djlbet123
Author

Command list :
conda activate point
pip uninstall MinkowskiEngine (uninstall it)
git clone https://github.com/NVIDIA/MinkowskiEngine.git
cd MinkowskiEngine
python setup.py install --blas_include_dirs=${CONDA_PREFIX}/include --blas=openblas (install the latest one)
rm -r MinkowskiEngine
python -m examples.modelnet40 --batch_size 16

However, it returns the same error:
Warning: This process will cache the entire voxelized ModelNet40 dataset, which will take up ~10G of memory.
INFO - 2021-02-05 10:22:33,448 - modelnet40 - Loading the subset train from ./ModelNet40 with 8871 files
INFO - 2021-02-05 10:22:33,463 - modelnet40 - Loading the subset val from ./ModelNet40 with 966 files
/home/summerriver/anaconda3/envs/point/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:448: UserWarning: To get the last learning rate computed by the scheduler, please use get_last_lr().
warnings.warn("To get the last learning rate computed by the scheduler, "
INFO - 2021-02-05 10:22:33,646 - modelnet40 - LR: [0.01]
** On entry to cusparseSpMM_bufferSize() parameter number 1 (handle) had an illegal value: bad initialization or already destroyed

Traceback (most recent call last):
File "/home/summerriver/anaconda3/envs/point/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/summerriver/anaconda3/envs/point/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/summerriver/下载/code/MinkowskiEngine-master/MinkowskiEngine/examples/modelnet40.py", line 552, in
train(net, device, config)
File "/home/summerriver/下载/code/MinkowskiEngine-master/MinkowskiEngine/examples/modelnet40.py", line 504, in train
sout = net(sin)
File "/home/summerriver/anaconda3/envs/point/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/summerriver/下载/code/MinkowskiEngine-master/MinkowskiEngine/examples/resnet.py", line 209, in forward
otensor = self.field_network(x)
File "/home/summerriver/anaconda3/envs/point/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/summerriver/anaconda3/envs/point/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/home/summerriver/anaconda3/envs/point/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/summerriver/anaconda3/envs/point/lib/python3.8/site-packages/MinkowskiEngine-0.5.0-py3.8-linux-x86_64.egg/MinkowskiEngine/MinkowskiOps.py", line 256, in forward
return input.sparse()
File "/home/summerriver/anaconda3/envs/point/lib/python3.8/site-packages/MinkowskiEngine-0.5.0-py3.8-linux-x86_64.egg/MinkowskiEngine/MinkowskiTensorField.py", line 296, in sparse
features = spmm.apply(inverse_mapping, cols, vals, size, self._F)
File "/home/summerriver/anaconda3/envs/point/lib/python3.8/site-packages/MinkowskiEngine-0.5.0-py3.8-linux-x86_64.egg/MinkowskiEngine/sparse_matrix_functions.py", line 97, in forward
return spmm(
File "/home/summerriver/anaconda3/envs/point/lib/python3.8/site-packages/MinkowskiEngine-0.5.0-py3.8-linux-x86_64.egg/MinkowskiEngine/sparse_matrix_functions.py", line 50, in spmm
result, num_nonzero = MEB.coo_spmm_int32(
RuntimeError: CUSPARSE_STATUS_INVALID_VALUE at /home/summerriver/下载/code/MinkowskiEngine-master/MinkowskiEngine/src/spmm.cu:249
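
The traceback above shows two locations: the example scripts run from the source checkout, while the library itself is loaded from the `MinkowskiEngine-0.5.0` egg under `site-packages`. After an uninstall and reinstall, it can be worth confirming which copy an import would actually resolve to. The following is a minimal sketch using only the Python standard library; the helper name `locate_package` is hypothetical and is not part of MinkowskiEngine.

```python
import importlib.util


def locate_package(name):
    """Return the file path that `import name` would load, or None if not found.

    Hypothetical helper for illustration. For a top-level name, find_spec
    locates the package without importing it.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Prints the resolved path, or None when the package is not installed.
    print(locate_package("MinkowskiEngine"))
```

Running this from a directory that itself contains a `MinkowskiEngine/` folder may resolve to that folder rather than the installed copy, so it is probably best run from an unrelated working directory.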

@chrischoy
Contributor

chrischoy commented Feb 5, 2021

I tried to use the same setup with pytorch 1.7.1 with cuda 11.1, but still, the code runs without a problem.

Can you post the output of

wget -q https://raw.githubusercontent.com/NVIDIA/MinkowskiEngine/master/MinkowskiEngine/diagnostics.py ; python diagnostics.py
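
If downloading the script is not an option, a similar overview can be collected by hand. The sketch below is not the project's `diagnostics.py` and makes no claim to match its output; it only assumes that `nvcc`, `nvidia-smi`, and `c++` may or may not be on `PATH`, and the function names `tool_report` and `collect_environment` are hypothetical.

```python
import platform
import shutil
import subprocess
import sys


def tool_report(name, args=("--version",)):
    """Locate a command-line tool on PATH and capture its output, if any."""
    path = shutil.which(name)
    if path is None:
        return {"path": None, "output": None}
    try:
        proc = subprocess.run(
            [path, *args],
            capture_output=True,
            text=True,
            errors="replace",
            timeout=30,
        )
        output = proc.stdout.strip() or proc.stderr.strip()
    except (OSError, subprocess.SubprocessError):
        output = None
    return {"path": path, "output": output}


def collect_environment():
    """Gather basic platform, Python, and toolchain details into a dict."""
    return {
        "platform": platform.platform(),
        "python": sys.version,
        "nvcc": tool_report("nvcc"),
        "nvidia-smi": tool_report("nvidia-smi", args=()),
        "c++": tool_report("c++"),
    }


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Tools that are missing are reported with a `None` path instead of raising, so the output can be pasted into an issue even from a partially configured machine.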

@djlbet123
Author

==========System==========
Linux-5.8.0-40-generic-x86_64-with-glibc2.10
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.1 LTS"
3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
==========Pytorch==========
1.7.1
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 460.32.03
CUDA Version 11.2
VBIOS Version 90.17.1C.00.E0
Image Version G001.0000.02.04
==========NVCC==========
/usr/local/cuda-11.1/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
==========CC==========
/usr/bin/c++
c++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

==========MinkowskiEngine==========
0.5.0
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 11010
CUDART version MinkowskiEngine is compiled: 11010
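
For reading the output above: the `11010` values use CUDA's integer version encoding (major × 1000 + minor × 10), so they correspond to 11.1 and agree with the `nvcc` release line. The "CUDA Version 11.2" shown by `nvidia-smi` is the highest version the driver supports, not necessarily the installed toolkit. A small sketch of that comparison, with hypothetical helper names:

```python
import re


def decode_cuda_version(encoded):
    """Decode a CUDA-style integer (major * 1000 + minor * 10) into (major, minor)."""
    return encoded // 1000, (encoded % 1000) // 10


def parse_nvcc_release(text):
    """Extract (major, minor) from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # Values copied from the diagnostics output in this thread.
    compiled = decode_cuda_version(11010)
    toolkit = parse_nvcc_release("Cuda compilation tools, release 11.1, V11.1.105")
    print(compiled, toolkit, compiled == toolkit)  # (11, 1) (11, 1) True
```

On these numbers the build-time CUDA version and the toolkit match, which is consistent with the maintainer being unable to reproduce the error from the versions alone.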

@chrischoy
Contributor

I used the same CUDA versions but could not reproduce the error. I also tried installing via pip and from source, but couldn't reproduce the error either.

Are you using Docker or any special setup?

@djlbet123
Author

Thanks. I downloaded the installer from https://developer.download.nvidia.com/compute/cuda/11.2.0/local_installers/cuda_11.2.0_460.27.04_linux.run (similar to CUDA 11.1).
Then I installed it without the driver, since the driver had already been installed.
Although I changed the CUDA version to 11.2, it still returned the same error.

@chrischoy
Contributor

chrischoy commented Feb 9, 2021

Closing the issue. The related issue #308 has been resolved on the latest master. Please feel free to reopen if this issue reappears.
