InstanceNorm v0.5: cudaErrorMisalignedAddress #246

zgojcic · 2020-10-19T14:57:54Z

Hi Chris,

I have updated ME to 0.5 and I am running into problems when using the InstanceNorm layer (see the error below). If I replace InstanceNorm with BatchNorm, it works without a problem. The same model with InstanceNorm works normally with ME v0.4.3.

The error only occurs during training. If the model is put in eval mode with torch.no_grad(), it works without a problem and returns the same results as v0.4.3. Using v0.5 makes inference 2 times faster 👍

RuntimeError:  misaligned address at /home/zgojcic/Documents/holistic_scene_flow/class_net_me_05/MinkowskiEngine/src/pooling_avg_kernel.cu:299
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  CUDA free failed: cudaErrorMisalignedAddress: misaligned address

Thanks for your help.

Best
Zan


zgojcic · 2020-10-19T19:36:50Z

To provide a bit more information:

Python: 3.7.9
Cuda: 10.1.243
torch: 1.6.0

The normalization unit test passes (with different batch sizes and feature dimensions, and even when adding a conv layer before normalization).

It is actually not putting the model in eval mode that matters but the batch size: with batch size 1 it runs OK, with more than 1 it crashes. I have reduced the number of points to a very small number and it still fails. On CPU it works OK with all batch sizes.
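To make the batch-size dependence concrete: instance normalization on a batched sparse tensor computes a separate mean and variance for each sample, grouped by batch index, so with batch size 1 there is only a single group. The sketch below is a plain-Python reference for that math only, assuming each point carries a batch index and a feature row. `instance_norm` is a hypothetical helper, not MinkowskiEngine's API or its CUDA path, and it omits the learnable weight and bias.

```python
from collections import defaultdict
from math import sqrt


def instance_norm(batch_idx, feats, eps=1e-5):
    """Normalize features per batch instance (hypothetical reference helper).

    batch_idx: batch index of each point.
    feats: list of equal-length feature rows, one per point.
    Returns new rows where each channel has zero mean and roughly unit
    variance within each batch instance. No learnable affine parameters.
    """
    n_ch = len(feats[0])
    groups = defaultdict(list)  # batch index -> row positions of its points
    for row, b in enumerate(batch_idx):
        groups[b].append(row)

    out = [None] * len(feats)
    for rows in groups.values():
        n = len(rows)
        # Per-instance, per-channel mean (a global average over the sample).
        mean = [sum(feats[r][c] for r in rows) / n for c in range(n_ch)]
        # Per-instance, per-channel biased variance (divides by n).
        var = [sum((feats[r][c] - mean[c]) ** 2 for r in rows) / n
               for c in range(n_ch)]
        for r in rows:
            out[r] = [(feats[r][c] - mean[c]) / sqrt(var[c] + eps)
                      for c in range(n_ch)]
    return out
```

With `batch_idx = [0, 0, 1, 1]` and features `[[1.0], [3.0], [10.0], [30.0]]`, both samples normalize to roughly `-1, +1` despite their different scales; with all indices set to 0 the same data is normalized as one group instead.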

With different datasets, it sometimes crashes at the first IN layer and sometimes at the second (the architecture is similar to the FCGF encoder). Changing the kernel size of the conv layer before normalization to 1 does not help.

I have tried both ME.utils.batched_coordinates() and ME.utils.sparse_collate() to generate the batch coordinates and features.
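Both helpers produce a single coordinate matrix in which every row is prefixed with the index of the sample it came from (in ME 0.5 the batch index is the first column). A minimal stdlib sketch of that layout, assuming integer coordinates given as one list of rows per sample; `batch_coordinates` is a hypothetical name, not ME's implementation, and the matching features would be concatenated in the same order:

```python
def batch_coordinates(coords_per_sample):
    """Concatenate per-sample coordinates, prepending the batch index.

    coords_per_sample: list of samples, each a list of integer coordinate rows.
    Returns one list of rows of the form [batch_index, x, y, z, ...].
    """
    batched = []
    for b, coords in enumerate(coords_per_sample):
        for row in coords:
            batched.append([b, *row])
    return batched


# Two samples: two points in the first, one point in the second.
coords = batch_coordinates([[[0, 0, 0], [1, 0, 0]], [[5, 5, 5]]])
# [[0, 0, 0, 0], [0, 1, 0, 0], [1, 5, 5, 5]]
```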

All features are float32 data type.

Thanks again for your help

Best
Zan

chrischoy · 2020-11-20T05:41:04Z

Hmm, I'm having difficulty replicating this error on CUDA 10.1.243, 10.2, and 11.0. Could you post a short script that replicates it?

chrischoy · 2020-11-20T05:52:22Z

Temporarily disabled the global pooling kernel in 010a39c. To use the native CUDA calls, use MinkowskiGlobalAvgPooling(mode=ME.PoolingMode.GLOBAL_AVG_POOLING_KERNEL).
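For reference, global average pooling reduces a batched sparse tensor to one feature row per sample by averaging all points that share a batch index. The sketch below does this with an index-based scatter (accumulate sums and counts per batch index, then divide). It assumes an index-based fallback works along these lines, which is an assumption rather than a description of ME's code; `global_avg_pool` is a hypothetical helper and does not model the `PoolingMode` options.

```python
def global_avg_pool(batch_idx, feats, batch_size):
    """Average feature rows per batch index (hypothetical reference helper).

    batch_idx: batch index of each row, each in range(batch_size).
    feats: list of equal-length feature rows.
    Returns batch_size rows; a sample with no points gets all zeros.
    """
    n_ch = len(feats[0])
    sums = [[0.0] * n_ch for _ in range(batch_size)]
    counts = [0] * batch_size
    # Scatter-add: accumulate each row into the slot of its batch index.
    for b, row in zip(batch_idx, feats):
        counts[b] += 1
        for c in range(n_ch):
            sums[b][c] += row[c]
    # Divide by the number of points per sample, guarding empty samples.
    return [[s / max(n, 1) for s in srow] for srow, n in zip(sums, counts)]
```

Instance normalization needs exactly this kind of per-sample mean, which is consistent with a failure in the average-pooling kernel (`pooling_avg_kernel.cu` in the traceback) surfacing in InstanceNorm but not in BatchNorm.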

lshiwjx · 2020-12-06T06:57:22Z

Thanks.

However, after using the native CUDA call, it throws a new error.

RuntimeError:  misaligned address at /tmp/pip-req-build-b47llihh/src/pooling_avg_kernel.cu:299
terminate called after throwing an instance of 'thrust::system::system_error'
 what():  CUDA free failed: cudaErrorMisalignedAddress: misaligned address

chrischoy · 2020-12-15T16:11:55Z

This is the same error. Could you post minimal code that reproduces it?

Andy97 · 2021-01-26T06:14:28Z

Replace all ME.MinkowskiBatchNorm with ME.MinkowskiInstanceNorm in examples/resnet.py,
and modify the function at line 86 from

    def weight_initialization(self):
        for m in self.modules():
            if isinstance(m, ME.MinkowskiConvolution):
                ME.utils.kaiming_normal_(m.kernel, mode="fan_out", nonlinearity="relu")

            if isinstance(m, ME.MinkowskiBatchNorm):
                nn.init.constant_(m.bn.weight, 1)
                nn.init.constant_(m.bn.bias, 0)

to

    def weight_initialization(self):
        for m in self.modules():
            if isinstance(m, ME.MinkowskiConvolution):
                ME.utils.kaiming_normal_(m.kernel, mode="fan_out", nonlinearity="relu")

            if isinstance(m, ME.MinkowskiInstanceNorm):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

This reproduces the error.

chrischoy · 2021-01-31T09:52:54Z

No, it doesn't reproduce the error on my setup:

==========System==========
Linux-5.4.0-65-generic-x86_64-with-glibc2.10
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.1 LTS"
3.8.5 (default, Sep  4 2020, 07:30:14) 
[GCC 7.3.0]
==========Pytorch==========
1.7.0
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 450.102.04
CUDA Version 11.0
VBIOS Version 90.02.2E.00.0C
Image Version G001.0000.02.04
==========NVCC==========
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
==========CC==========
/usr/bin/c++
c++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

==========MinkowskiEngine==========
0.5.0
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 11010
CUDART version MinkowskiEngine is compiled: 11010

@Andy97 could you post the output of

    import MinkowskiEngine as ME
    ME.print_diagnostics()

chrockey · 2021-02-01T06:08:48Z

I have the same error in the following environments, both in a Docker container:

==========System==========
Linux-5.4.0-53-generic-x86_64-with-debian-buster-sid
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.5 LTS"
3.7.9 (default, Aug 31 2020, 12:42:55) 
[GCC 7.3.0]
==========Pytorch==========
1.7.1
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 450.102.04
CUDA Version 11.0
VBIOS Version 90.02.42.00.0F
Image Version G001.0000.02.04
==========NVCC==========
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
==========CC==========
/usr/bin/c++
c++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

==========MinkowskiEngine==========
0.5.0
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 10020
CUDART version MinkowskiEngine is compiled: 10020

and

==========System==========
Linux-5.4.0-53-generic-x86_64-with-glibc2.10
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.1 LTS"
3.8.5 (default, Sep  4 2020, 07:30:14)
[GCC 7.3.0]
==========Pytorch==========
1.7.1
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 450.102.04
CUDA Version 11.0
VBIOS Version 90.02.42.00.0F
Image Version G001.0000.02.04
==========NVCC==========
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
==========CC==========
/usr/bin/c++
c++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

==========MinkowskiEngine==========
0.5.0
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 11010
CUDART version MinkowskiEngine is compiled: 11010

zgojcic · 2021-02-01T12:49:05Z

Hi Chris, sorry, I had somewhat forgotten about this issue. Below is the info from one computer; I observe the same error on at least one more PC (if you need, I can also get the diagnostics from that one).

==========System==========
Linux-4.15.0-132-generic-x86_64-with-debian-stretch-sid
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"
3.6.12 |Anaconda, Inc.| (default, Sep 8 2020, 23:10:56)
[GCC 7.3.0]
==========Pytorch==========
1.7.1
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 418.87.00
CUDA Version 10.1
VBIOS Version 86.04.17.00.01
Image Version G001.0000.01.03
==========NVCC==========
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
==========CC==========
CC=g++-7
/usr/bin/g++-7
g++-7 (Ubuntu 7.5.0-3ubuntu1~16.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

==========MinkowskiEngine==========
0.5.0
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 10010
CUDART version MinkowskiEngine is compiled: 10010

chrischoy added bug Something isn't working v0.5 labels Oct 30, 2020

chrischoy mentioned this issue Nov 20, 2020

0.5 Global average pooling problem #263

Closed

chrischoy added a commit that referenced this issue Feb 3, 2021

use gpool pytorch index for instance norm (#246)

ae13226

chrischoy closed this as completed in 7d02dba Feb 4, 2021

chrischoy added a commit that referenced this issue Feb 4, 2021

avg pooling for CUDA 10 (fix #246)

7ccf01b

Tanazzah pushed a commit to Tanazzah/MinkowskiEngine that referenced this issue Feb 9, 2024

use gpool pytorch index for instance norm (NVIDIA#246)

669e356

Tanazzah pushed a commit to Tanazzah/MinkowskiEngine that referenced this issue Feb 9, 2024

avg pooling for CUDA 10 (fix NVIDIA#246)

02e5cb3
