Alternative quadrant representations with Morton index and AVX2 vectorization for AMR algorithms within the p4est software library
Mikhail Kirilin, Carsten Burstedde (INS, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany)
arXiv - CS - Mathematical Software, 2023-08-24
DOI: arxiv-2308.13615 (https://doi.org/arxiv-2308.13615)
Citations: 0
Abstract
We present a technical enhancement within the p4est software for parallel
adaptive mesh refinement. In p4est, mesh primitives are stored as octants in
three dimensions and quadrants in two. Classically, each is encoded
explicitly by its spatial coordinates and refinement level, but any
mathematically equivalent encoding may be used instead.
Recognizing this, we add two alternative representations to the classical,
explicit version, based on a long monotonic index and 128-bit AVX quad
integers, respectively. The first requires changes in the logic of low-level
quadrant-manipulating algorithms, while the second exploits data-level
parallelism and requires the algorithms to be adapted to SIMD instructions. The
resulting algorithms and data structures lead to higher performance and lower
memory usage than the standard baseline.
We benchmark selected algorithms on a cluster with two Intel(R) Xeon(R) Gold
6130 Skylake family CPUs per node (which support the AVX2 extensions),
192 GB RAM per node, and up to 512 computational cores in total.
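To make the idea of a single monotonic index concrete, the following is a minimal sketch in C of how a two-dimensional quadrant given by coordinates and level could be packed into one 64-bit integer by bit interleaving (a Morton index). It is an illustration only, not the p4est implementation: the type `quad_explicit_t`, the functions `quad_pack` and `quad_unpack`, the 6-bit level field, and the restriction of coordinates to 29 bits are all assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical explicit representation: coordinates plus refinement level.
 * Assumption for this sketch: x and y are below 2^29 and level is below 64,
 * so that 58 interleaved bits plus 6 level bits fit into 64 bits. */
typedef struct {
    uint32_t x, y;
    uint8_t level;
} quad_explicit_t;

#define LEVEL_BITS 6
#define LEVEL_MASK ((1u << LEVEL_BITS) - 1u)

/* Spread the bits of v so that bit i moves to bit 2*i. */
static uint64_t spread_bits(uint32_t v)
{
    uint64_t r = v;
    r = (r | (r << 16)) & 0x0000FFFF0000FFFFULL;
    r = (r | (r << 8))  & 0x00FF00FF00FF00FFULL;
    r = (r | (r << 4))  & 0x0F0F0F0F0F0F0F0FULL;
    r = (r | (r << 2))  & 0x3333333333333333ULL;
    r = (r | (r << 1))  & 0x5555555555555555ULL;
    return r;
}

/* Inverse of spread_bits: gather every second bit back together. */
static uint32_t compact_bits(uint64_t r)
{
    r &= 0x5555555555555555ULL;
    r = (r | (r >> 1))  & 0x3333333333333333ULL;
    r = (r | (r >> 2))  & 0x0F0F0F0F0F0F0F0FULL;
    r = (r | (r >> 4))  & 0x00FF00FF00FF00FFULL;
    r = (r | (r >> 8))  & 0x0000FFFF0000FFFFULL;
    r = (r | (r >> 16)) & 0x00000000FFFFFFFFULL;
    return (uint32_t) r;
}

/* Pack: interleaved coordinates in the high bits, level in the low bits. */
uint64_t quad_pack(quad_explicit_t q)
{
    uint64_t morton = spread_bits(q.x) | (spread_bits(q.y) << 1);
    return (morton << LEVEL_BITS) | (uint64_t) (q.level & LEVEL_MASK);
}

/* Unpack: recover coordinates and level from the packed index. */
quad_explicit_t quad_unpack(uint64_t packed)
{
    quad_explicit_t q;
    uint64_t morton = packed >> LEVEL_BITS;
    q.level = (uint8_t) (packed & LEVEL_MASK);
    q.x = compact_bits(morton);
    q.y = compact_bits(morton >> 1);
    return q;
}
```

With this layout, comparing two packed values as plain unsigned integers orders quadrants first by interleaved coordinates and then by level, so a single integer comparison replaces a multi-field comparison. Whether the paper uses this exact bit layout is not stated in the abstract.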