parallel-processing-teaching-toolkit/04-GPU-accelerators/01-cuda/09-cooperative-groups at master · javierip/parallel-processing-teaching-toolkit

README.md

About this example

This example shows how to use cooperative groups in CUDA. It is based on the NVIDIA developer blog post "CUDA Pro Tip: Optimized Filtering with Warp-Aggregated Atomics".
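As a rough illustration of the technique named in that post, here is a minimal sketch of a warp-aggregated atomic increment built on cooperative groups. It is not taken from this repository's main.cu. `atomicAggInc` follows the naming used in the blog post, while `filterPositive`, `filterOnGpu`, and the "keep positive values" predicate are hypothetical choices made only for this example. CUDA error checking is omitted for brevity.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

namespace cg = cooperative_groups;

// Warp-aggregated increment: the currently active threads of a warp form a
// coalesced group, the rank-0 thread does one atomicAdd for the whole group,
// and each thread derives its own output slot from the broadcast result.
__device__ int atomicAggInc(int *counter) {
    cg::coalesced_group active = cg::coalesced_threads();
    int base = 0;
    if (active.thread_rank() == 0) {
        base = atomicAdd(counter, static_cast<int>(active.size()));
    }
    // Broadcast the leader's base offset, then add this thread's rank.
    return active.shfl(base, 0) + static_cast<int>(active.thread_rank());
}

// Hypothetical filter kernel: copies strictly positive values from src to dst.
// The order of the output elements is not guaranteed.
__global__ void filterPositive(int *dst, int *count, const int *src, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && src[i] > 0) {
        dst[atomicAggInc(count)] = src[i];
    }
}

// Hypothetical host helper: runs the filter and returns the kept values.
std::vector<int> filterOnGpu(const std::vector<int> &input) {
    int n = static_cast<int>(input.size());
    if (n == 0) return {};

    int *dSrc = nullptr, *dDst = nullptr, *dCount = nullptr;
    cudaMalloc(&dSrc, n * sizeof(int));
    cudaMalloc(&dDst, n * sizeof(int));
    cudaMalloc(&dCount, sizeof(int));
    cudaMemcpy(dSrc, input.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dCount, 0, sizeof(int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    filterPositive<<<blocks, threads>>>(dDst, dCount, dSrc, n);

    // This blocking copy also waits for the kernel to finish.
    int count = 0;
    cudaMemcpy(&count, dCount, sizeof(int), cudaMemcpyDeviceToHost);
    std::vector<int> out(count);
    if (count > 0) {
        cudaMemcpy(out.data(), dDst, count * sizeof(int), cudaMemcpyDeviceToHost);
    }

    cudaFree(dSrc);
    cudaFree(dDst);
    cudaFree(dCount);
    return out;
}

int main() {
    // Alternate positive and negative values so that roughly half are kept.
    std::vector<int> input(1000);
    for (int i = 0; i < 1000; ++i) {
        input[i] = (i % 2 == 0) ? (i + 1) : -i;
    }
    std::vector<int> kept = filterOnGpu(input);
    std::printf("kept %zu of %zu values\n", kept.size(), input.size());
    return 0;
}
```

The point of the aggregation is that a warp with many passing threads issues a single atomic on the shared counter rather than one per thread, which reduces contention. A file like this would typically be compiled with `nvcc`.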

Requirements

CUDA Toolkit 9.0 or later (the release that introduced cooperative groups) and a compatible NVIDIA driver.

Run

Open a terminal and type:

sh run.sh

Output

Typical output looks like this:

[100%] Linking CXX executable application-CUDA
[100%] Built target application-CUDA
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "GeForce GTX 1060 3GB" with compute capability 6.1

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 448.65 GFlop/s, Time= 0.292 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performancemeasurements. Results may vary when GPU Boost is enabled.

Extra Resources

NVIDIA CUDA Toolkit documentation.
