Error when running the classification example #312

djlbet123 · 2021-02-05T01:10:05Z

Environment:
Driver Version: 460.32.03
CUDA Version: 11.1.105
PyTorch Version: 1.7.1

Installed with: pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps --install-option="--blas_include_dirs=${CONDA_PREFIX}/include" --install-option="--blas=openblas"

Command run: python -m examples.modelnet40 (https://github.com/NVIDIA/MinkowskiEngine/blob/master/examples/modelnet40.py)

Error report:
Warning: This process will cache the entire voxelized ModelNet40 dataset, which will take up ~10G of memory.
INFO - 2021-02-05 08:57:28,495 - modelnet40 - Loading the subset train from ./ModelNet40 with 8871 files
INFO - 2021-02-05 08:57:28,496 - modelnet40 - Loading the subset val from ./ModelNet40 with 966 files
warnings.warn("To get the last learning rate computed by the scheduler, "
INFO - 2021-02-05 08:57:28,529 - modelnet40 - LR: [0.01]
** On entry to cusparseSpMM_bufferSize() parameter number 1 (handle) had an illegal value: bad initialization or already destroyed

RuntimeError: CUSPARSE_STATUS_INVALID_VALUE at /tmp/pip-req-build-4vnh0cz8/src/spmm.cu:249

chrischoy · 2021-02-05T01:49:44Z

The code runs perfectly fine on the latest MinkowskiEngine.

djlbet123 · 2021-02-05T02:24:28Z

Commands I ran:
conda activate point
pip uninstall MinkowskiEngine  # remove the existing install
git clone https://github.com/NVIDIA/MinkowskiEngine.git
cd MinkowskiEngine
python setup.py install --blas_include_dirs=${CONDA_PREFIX}/include --blas=openblas  # install the latest version
rm -r MinkowskiEngine
python -m examples.modelnet40 --batch_size 16

However, it returns the same error:
Warning: This process will cache the entire voxelized ModelNet40 dataset, which will take up ~10G of memory.
INFO - 2021-02-05 10:22:33,448 - modelnet40 - Loading the subset train from ./ModelNet40 with 8871 files
INFO - 2021-02-05 10:22:33,463 - modelnet40 - Loading the subset val from ./ModelNet40 with 966 files
/home/summerriver/anaconda3/envs/point/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:448: UserWarning: To get the last learning rate computed by the scheduler, please use get_last_lr().
warnings.warn("To get the last learning rate computed by the scheduler, "
INFO - 2021-02-05 10:22:33,646 - modelnet40 - LR: [0.01]
** On entry to cusparseSpMM_bufferSize() parameter number 1 (handle) had an illegal value: bad initialization or already destroyed

Traceback (most recent call last):
  File "/home/summerriver/anaconda3/envs/point/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/summerriver/anaconda3/envs/point/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/summerriver/下载/code/MinkowskiEngine-master/MinkowskiEngine/examples/modelnet40.py", line 552, in <module>
    train(net, device, config)
  File "/home/summerriver/下载/code/MinkowskiEngine-master/MinkowskiEngine/examples/modelnet40.py", line 504, in train
    sout = net(sin)
  File "/home/summerriver/anaconda3/envs/point/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/summerriver/下载/code/MinkowskiEngine-master/MinkowskiEngine/examples/resnet.py", line 209, in forward
    otensor = self.field_network(x)
  File "/home/summerriver/anaconda3/envs/point/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/summerriver/anaconda3/envs/point/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/summerriver/anaconda3/envs/point/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/summerriver/anaconda3/envs/point/lib/python3.8/site-packages/MinkowskiEngine-0.5.0-py3.8-linux-x86_64.egg/MinkowskiEngine/MinkowskiOps.py", line 256, in forward
    return input.sparse()
  File "/home/summerriver/anaconda3/envs/point/lib/python3.8/site-packages/MinkowskiEngine-0.5.0-py3.8-linux-x86_64.egg/MinkowskiEngine/MinkowskiTensorField.py", line 296, in sparse
    features = spmm.apply(inverse_mapping, cols, vals, size, self._F)
  File "/home/summerriver/anaconda3/envs/point/lib/python3.8/site-packages/MinkowskiEngine-0.5.0-py3.8-linux-x86_64.egg/MinkowskiEngine/sparse_matrix_functions.py", line 97, in forward
    return spmm(
  File "/home/summerriver/anaconda3/envs/point/lib/python3.8/site-packages/MinkowskiEngine-0.5.0-py3.8-linux-x86_64.egg/MinkowskiEngine/sparse_matrix_functions.py", line 50, in spmm
    result, num_nonzero = MEB.coo_spmm_int32(
RuntimeError: CUSPARSE_STATUS_INVALID_VALUE at /home/summerriver/下载/code/MinkowskiEngine-master/MinkowskiEngine/src/spmm.cu:249
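
Note: the traceback above runs the example from a source checkout but loads MinkowskiEngine from an .egg directory under site-packages, so it may be worth confirming which copy Python actually resolves before and after reinstalling. The following is only a minimal sketch using the standard library; `module_origin` is a hypothetical helper name, not part of MinkowskiEngine or PyTorch.

```python
import importlib.util


def module_origin(name):
    """Return the file path Python would import `name` from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # find_spec can raise for malformed names or modules without a spec.
        return None
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Print where each package would be loaded from in the active environment.
    for name in ("torch", "MinkowskiEngine"):
        print(name, "->", module_origin(name))
```

If the printed MinkowskiEngine path still points at an older install after rebuilding, the new build is probably not the one being imported. This check does not by itself explain the cuSPARSE error.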

chrischoy · 2021-02-05T03:12:18Z

I tried the same setup (PyTorch 1.7.1 with CUDA 11.1), but the code still runs without a problem.

Can you post the output of the following command?

wget -q https://raw.githubusercontent.com/NVIDIA/MinkowskiEngine/master/MinkowskiEngine/diagnostics.py ; python diagnostics.py

djlbet123 · 2021-02-05T03:33:27Z

==========System==========
Linux-5.8.0-40-generic-x86_64-with-glibc2.10
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.1 LTS"
3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
==========Pytorch==========
1.7.1
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 460.32.03
CUDA Version 11.2
VBIOS Version 90.17.1C.00.E0
Image Version G001.0000.02.04
==========NVCC==========
/usr/local/cuda-11.1/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
==========CC==========
/usr/bin/c++
c++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

==========MinkowskiEngine==========
0.5.0
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 11010
CUDART version MinkowskiEngine is compiled: 11010
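
Note: the diagnostics above report several CUDA numbers that are easy to conflate. The "CUDA Version 11.2" shown by nvidia-smi is the highest CUDA version the driver supports, whereas nvcc reports the installed toolkit (11.1) and MinkowskiEngine reports the integer 11010. Assuming the usual CUDA encoding of 1000 × major + 10 × minor, a small sketch for comparing the last two might look like this; both helper names are hypothetical.

```python
import re


def decode_cuda_version(encoded):
    """Decode CUDA's integer version (1000 * major + 10 * minor) into (major, minor)."""
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return major, minor


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' field of `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Values copied from the diagnostics output in this thread.
nvcc_line = "Cuda compilation tools, release 11.1, V11.1.105"
compiled_with = decode_cuda_version(11010)
toolkit = parse_nvcc_release(nvcc_line)
print(compiled_with, toolkit, compiled_with == toolkit)  # (11, 1) (11, 1) True
```

For this report both values resolve to 11.1, so the toolkit on the PATH matches the one MinkowskiEngine was compiled with. That only rules out a build-time toolkit mismatch; it says nothing about which CUDA libraries are loaded at runtime.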

chrischoy · 2021-02-05T09:43:51Z

I used the same CUDA versions but could not reproduce the error. I also tried installing from both pip and source, but couldn't reproduce it either.

Are you using Docker or any special setup?

djlbet123 · 2021-02-06T00:32:05Z

Thanks. I downloaded the installer from https://developer.download.nvidia.com/compute/cuda/11.2.0/local_installers/cuda_11.2.0_460.27.04_linux.run (similar to CUDA 11.1).
Then I installed it without the driver, since the driver was already installed.
Although I changed the CUDA version to 11.2, it still returned the same error.

chrischoy · 2021-02-09T10:40:04Z

Closing the issue. The related issue #308 has been resolved on the latest master. Please feel free to reopen if this issue reappears.

chrischoy mentioned this issue Feb 5, 2021

CUSPARSE_STATUS_INVALID_VALUE when using features_at_coordinates method #308

Closed

chrischoy closed this as completed Feb 9, 2021
