Brick Library 0.1
Performance-portable stencil datalayout & codegen
Distributed Performance-portable Stencil Computation
Bricks is a data layout and code generation framework that enables performance-portable stencil computations across a multitude of architectures.
Math kernel code and data manipulation code are shared across these platforms while achieving best-in-class performance on all of them. The brick layout is especially well suited to higher-order (larger, wider) stencil computations.
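To make the idea of a brick layout concrete, the following is a minimal one-dimensional sketch, not the library's API. The domain is split into small fixed-size bricks that are each contiguous in memory, and an adjacency list records each brick's logical neighbors, so a stencil can cross brick boundaries regardless of the order in which bricks are stored. All names (BrickedLine, load, stencil3, make_reversed_line) and the brick length of 4 are assumptions chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical names throughout; this is not the Brick Library API.
constexpr int kBrickLen = 4;  // elements per brick (assumed size)

struct BrickedLine {
  // Each brick holds kBrickLen contiguous elements.
  std::vector<std::array<double, kBrickLen>> data;
  // adj[b] = {left neighbor, right neighbor}, as indices into `data`.
  std::vector<std::array<int, 2>> adj;
};

// Read element (i + offset) relative to brick `brick`, following the
// adjacency list when the access leaves the brick.
double load(const BrickedLine& g, int brick, int i, int offset) {
  assert(offset >= -kBrickLen && offset <= kBrickLen);
  int j = i + offset;
  if (j < 0) {
    brick = g.adj[brick][0];
    j += kBrickLen;
  } else if (j >= kBrickLen) {
    brick = g.adj[brick][1];
    j -= kBrickLen;
  }
  return g.data[brick][j];
}

// 3-point sum for every element of one brick: in[i-1] + in[i] + in[i+1].
std::array<double, kBrickLen> stencil3(const BrickedLine& g, int brick) {
  std::array<double, kBrickLen> out{};
  for (int i = 0; i < kBrickLen; ++i)
    out[i] = load(g, brick, i, -1) + load(g, brick, i, 0) + load(g, brick, i, 1);
  return out;
}

// Build a periodic line of n bricks whose physical storage order is the
// reverse of their logical order; each element stores its logical index.
BrickedLine make_reversed_line(int n) {
  BrickedLine g;
  g.data.resize(n);
  g.adj.resize(n);
  for (int logical = 0; logical < n; ++logical) {
    int phys = n - 1 - logical;
    for (int i = 0; i < kBrickLen; ++i)
      g.data[phys][i] = logical * kBrickLen + i;
    int left = (logical + n - 1) % n;
    int right = (logical + 1) % n;
    g.adj[phys] = {n - 1 - left, n - 1 - right};
  }
  return g;
}
```

For a line of three bricks stored in reverse order, stencil3(make_reversed_line(3), 1) yields {12, 15, 18, 21}, the same result a plain contiguous array would give for logical elements 4 through 7. The stencil code never depends on where a brick sits in memory, only on the adjacency list.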
Bricks achieves a consistent 1.9x-4.9x speedup across different architectures, including Intel Skylake, Intel Knights Landing, and NVIDIA P100 GPUs.
The brick layout supports flexible domain shapes and enables fast "ghost cell" data communication with MPI.
Its ghost-zone exchange is up to 10x faster than the state-of-the-art YASK framework and up to 600x faster than MPI_Types in cray-mpich/7.7.10.
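One way to see why a blocked layout can help ghost-zone exchange is to count the contiguous memory runs a boundary strip occupies. The sketch below compares a row-major n x n array with a bricked one. It assumes, purely for illustration, that the bricks of the left boundary strip are stored first; the names Run, strip_runs_row_major, and strip_runs_bricked are hypothetical and not part of the library.

```cpp
#include <cstddef>
#include <vector>

// One contiguous piece of memory, in elements. Hypothetical helper type.
struct Run {
  std::size_t offset;
  std::size_t length;
};

// Row-major n x n array: a column strip of width w starting at x = 0
// touches one short run per row, so it is usually packed into a buffer
// (or described with a derived datatype) before it can be sent.
std::vector<Run> strip_runs_row_major(std::size_t n, std::size_t w) {
  std::vector<Run> runs;
  for (std::size_t y = 0; y < n; ++y) runs.push_back({y * n, w});
  return runs;
}

// Bricked n x n array with b x b bricks (n must be a multiple of b).
// Assumption: the n / b bricks of the left strip are stored first, so the
// strip of width b is a single contiguous run.
std::vector<Run> strip_runs_bricked(std::size_t n, std::size_t b) {
  std::size_t bricks_in_strip = n / b;
  return {Run{0, bricks_in_strip * b * b}};
}
```

With n = 8 and a strip width of 4, the row-major array needs 8 runs of 4 elements, while the bricked array needs 1 run of 32 elements. A single run can be handed to one MPI send without a separate packing step.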
Get the code from GitHub and start exploring the code documentation.
[1] Tuowen Zhao, Samuel Williams, Mary Hall, and Hans Johansen. 2018. Delivering Performance-Portable Stencil Computations on CPUs and GPUs Using Bricks. In 2018 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC). 59–70. DOI:https://doi.org/10.1109/P3HPC.2018.00009
[2] Tuowen Zhao, Protonu Basu, Samuel Williams, Mary Hall, and Hans Johansen. 2019. Exploiting reuse and vectorization in blocked stencil computations on CPUs and GPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '19). Association for Computing Machinery, New York, NY, USA, Article 52, 1–44. DOI:https://doi.org/10.1145/3295500.3356210
[3] Tuowen Zhao, Mary Hall, Hans Johansen, and Samuel Williams. 2021. Improving communication by optimizing on-node data movement with data layout. In Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '21). Association for Computing Machinery, New York, NY, USA, 304–317. DOI:https://doi.org/10.1145/3437801.3441598