InstanceNorm v0.5: cudaErrorMisalignedAddress #246
To provide a bit more information: Python 3.7.9. The unit test of normalization runs normally (with different batch sizes and feature dimensions, and even when adding a conv layer before normalization). It is actually not putting the model in eval mode that matters but the batch size: with batch size 1 it runs fine, with more than 1 it crashes. I have reduced the number of points to a very small number and it still fails. On CPU it works fine with all batch sizes. With different datasets it sometimes crashes at the first IN layer and sometimes at the second (the architecture is similar to the FCGF encoder). Changing the kernel size of the conv layer before normalization to 1 does not help. I have tried both ME.utils.batched_coordinates() and ME.utils.sparse_collate() to generate the batch coordinates and features. All features are of the float32 data type.
Thanks again for your help. Best
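Because the crash only appears with more than one instance per batch, a CPU reference of what instance normalization should compute can help check the GPU output once it runs. The sketch below is a pure-Python illustration, not MinkowskiEngine code: `instance_norm_reference` and `batch_norm_reference` are hypothetical helpers, and it assumes (as with batched coordinates in ME v0.5) that the batch index is stored in the first coordinate column. It computes per-instance, per-channel mean and biased variance and omits the affine parameters.

```python
import math
from collections import defaultdict


def instance_norm_reference(coords, feats, eps=1e-5):
    """Normalize each channel separately within each batch instance.

    coords: list of integer tuples whose first entry is the batch index,
            e.g. (b, x, y, z). (Assumed layout, see lead-in.)
    feats:  list of feature rows (lists of floats), one row per coordinate.
    Returns new normalized feature rows (no affine scale/shift).
    """
    num_channels = len(feats[0])

    # Group row indices by batch index (column 0 of the coordinates).
    rows_per_batch = defaultdict(list)
    for i, c in enumerate(coords):
        rows_per_batch[c[0]].append(i)

    out = [[0.0] * num_channels for _ in feats]
    for rows in rows_per_batch.values():
        n = len(rows)
        for ch in range(num_channels):
            values = [feats[i][ch] for i in rows]
            mean = sum(values) / n
            # Biased variance (divide by n), as in normalization layers.
            var = sum((v - mean) ** 2 for v in values) / n
            inv_std = 1.0 / math.sqrt(var + eps)
            for i in rows:
                out[i][ch] = (feats[i][ch] - mean) * inv_std
    return out


def batch_norm_reference(feats, eps=1e-5):
    """Training-mode batch statistics: all rows form a single group."""
    coords = [(0,) for _ in feats]
    return instance_norm_reference(coords, feats, eps)


# Two instances (batch indices 0 and 1), one feature channel.
coords = [(0, 0, 0, 0), (0, 1, 0, 0), (1, 0, 0, 0), (1, 0, 1, 0)]
feats = [[1.0], [3.0], [10.0], [20.0]]
print(instance_norm_reference(coords, feats))  # each instance maps to about [-1, 1]
print(batch_norm_reference(feats))             # statistics shared across instances
```

With a single instance, the per-instance statistics are identical to whole-batch statistics, so batch size 1 only ever reduces over one group; the multi-instance case is the one that exercises a per-instance reduction.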
Hmm, I'm having difficulty replicating this error on CUDA 10.1.243, 10.2, and 11.0. Could you post a short script that replicates this error?
Temporarily disabled the global pooling kernel in 010a39c. To use the native CUDA calls, use
Thanks. However, after using the native cuda call, it throws a new error.
This is the same error. Could you post a minimal reproducible example?
Replace all ME.MinkowskiBatchNorm with ME.MinkowskiInstanceNorm in examples\resnet.py
to
Reproduce the error.
No, it doesn't reproduce the error on
@Andy97, could you post the output of
I have the same error on the following environments in a Docker container.
and
Hi Chris, sorry, I had slightly forgotten about this issue. Below is the info from one computer; I observe the same on at least one more PC (if you need it, I can also get the diagnostics from that one).
==========System==========
==========MinkowskiEngine==========
Hi Chris,
I have tried updating ME to 0.5 and I have some problems when using the InstanceNorm layer (see the error below). If I replace InstanceNorm with BatchNorm, it works without a problem. The same model with InstanceNorm works normally with ME v0.4.3.
The error only occurs during training; if the model is put in eval mode with torch.no_grad(), it works without a problem and returns the same results as v0.4.3. Using v0.5 makes inference 2 times faster 👍
Thanks for your help.
Best
Zan