An MPI interface for application and hardware aware cartesian topology optimization
Christoph Niethammer, R. Rabenseifner
Proceedings of the 26th European MPI Users' Group Meeting, 2019-09-11
DOI: 10.1145/3343211.3343217
Citations: 4
Abstract
Many scientific applications perform computations on a Cartesian grid. The common approach to parallelizing these applications with MPI is domain decomposition. To help developers map MPI processes to subdomains, the MPI standard provides the concept of process topologies. However, the current interface causes problems and requires too much care in its usage: MPI_Dims_create does not take the application topology into account, and most implementations of MPI_Cart_create do not consider the underlying network topology and node architecture. To overcome these shortcomings, we defined a new interface that includes application-aware weights to address the communication needs of grid-based applications. The new interface provides a hardware-aware factorization of the processes together with an optimized mapping of processes onto the underlying hardware resources. The paper describes the underlying implementation, which uses a new multi-level factorization and decomposition approach that minimizes slow inter-node communication. Benchmark results show significant performance gains on multi-node NUMA systems.
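To illustrate the shortcoming the abstract points out, the following is a minimal Python sketch of the kind of balanced factorization MPI_Dims_create performs: it splits a process count into dimensions that are as close to each other as possible, but knows nothing about the application's communication weights or the hardware topology. The function name and the simple max-minus-min balance metric are illustrative assumptions, not the actual algorithm mandated by the MPI standard.

```python
def dims_create(nnodes, ndims):
    """Factor nnodes into ndims dimensions, as balanced as possible,
    sorted in decreasing order -- mimicking the goal of MPI_Dims_create.
    Note: purely arithmetic; ignores application and hardware topology,
    which is exactly the limitation the paper addresses."""
    best = None

    def search(remaining, dims_left, current):
        nonlocal best
        if dims_left == 1:
            cand = sorted(current + [remaining], reverse=True)
            # Keep the candidate whose dimensions are most balanced.
            if best is None or max(cand) - min(cand) < max(best) - min(best):
                best = cand
            return
        for d in range(1, remaining + 1):
            if remaining % d == 0:
                search(remaining // d, dims_left - 1, current + [d])

    search(nnodes, ndims, [])
    return best

print(dims_create(12, 2))  # -> [4, 3]
print(dims_create(12, 3))  # -> [3, 2, 2]
```

A weight-aware variant, as proposed in the paper, would instead pick the factorization that minimizes the (weighted) communication crossing slow inter-node links, rather than merely balancing the dimension sizes.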