自适应网格细化应用的机器和应用感知分区

Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing Pub Date : 2017-06-26 DOI:10.1145/3078597.3078610

Milinda Fernando, Dmitry Duplyakin, H. Sundar

{"title":"自适应网格细化应用的机器和应用感知分区","authors":"Milinda Fernando, Dmitry Duplyakin, H. Sundar","doi":"10.1145/3078597.3078610","DOIUrl":null,"url":null,"abstract":"Load balancing and partitioning are critical when it comes to parallel computations. Popular partitioning strategies based on space filling curves focus on equally dividing work. The partitions produced are independent of the architecture or the application. Given the ever-increasing relative cost of data movement and increasing heterogeneity of our architectures, it is no longer sufficient to only consider an equal partitioning of work. Minimizing communication costs are equally if not more important. Our hypothesis is that an unequal partitioning that minimizes communication costs significantly can scale and perform better than conventional equal-work partitioning schemes. This tradeoff is dependent on the architecture as well as the application. We validate our hypothesis in the context of a finite-element computation utilizing adaptive mesh-refinement. Our central contribution is a new partitioning scheme that minimizes the overall runtime of subsequent computations by performing architecture and application-aware non-uniform work assignment in order to decrease time to solution, primarily by minimizing data-movement. We evaluate our algorithm by comparing it against standard space-filling curve based partitioning algorithms and observing time-to-solution as well as energy-to-solution for solving Finite Element computations on adaptively refined meshes. We demonstrate excellent scalability of our new partition algorithm up to $262,144$ cores on ORNL's Titan and demonstrate that the proposed partitioning scheme reduces overall energy as well as time-to-solution for application codes by up to 22.0%","PeriodicalId":436194,"journal":{"name":"Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":"{\"title\":\"Machine and Application Aware Partitioning for Adaptive Mesh Refinement Applications\",\"authors\":\"Milinda Fernando, Dmitry Duplyakin, H. Sundar\",\"doi\":\"10.1145/3078597.3078610\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Load balancing and partitioning are critical when it comes to parallel computations. Popular partitioning strategies based on space filling curves focus on equally dividing work. The partitions produced are independent of the architecture or the application. Given the ever-increasing relative cost of data movement and increasing heterogeneity of our architectures, it is no longer sufficient to only consider an equal partitioning of work. Minimizing communication costs are equally if not more important. Our hypothesis is that an unequal partitioning that minimizes communication costs significantly can scale and perform better than conventional equal-work partitioning schemes. This tradeoff is dependent on the architecture as well as the application. We validate our hypothesis in the context of a finite-element computation utilizing adaptive mesh-refinement. Our central contribution is a new partitioning scheme that minimizes the overall runtime of subsequent computations by performing architecture and application-aware non-uniform work assignment in order to decrease time to solution, primarily by minimizing data-movement. We evaluate our algorithm by comparing it against standard space-filling curve based partitioning algorithms and observing time-to-solution as well as energy-to-solution for solving Finite Element computations on adaptively refined meshes. We demonstrate excellent scalability of our new partition algorithm up to $262,144$ cores on ORNL's Titan and demonstrate that the proposed partitioning scheme reduces overall energy as well as time-to-solution for application codes by up to 22.0%\",\"PeriodicalId\":436194,\"journal\":{\"name\":\"Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-06-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"19\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3078597.3078610\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3078597.3078610","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 19

摘要

当涉及到并行计算时，负载平衡和分区至关重要。流行的基于空间填充曲线的划分策略侧重于平均划分工作。生成的分区独立于体系结构或应用程序。考虑到数据移动的相对成本不断增加，以及我们的体系结构的异构性不断增加，仅仅考虑相等的工作划分已经不够了。最小化沟通成本即使不是更重要，也是同等重要的。我们的假设是，显著最小化通信成本的不相等分区可以扩展，并且比传统的等功分区方案性能更好。这种权衡取决于体系结构和应用程序。我们在利用自适应网格细化的有限元计算的背景下验证了我们的假设。我们的主要贡献是一个新的分区方案，它通过执行体系结构和应用程序感知的非统一工作分配来最小化后续计算的总体运行时间，从而减少解决方案的时间，主要是通过最小化数据移动。我们通过将其与标准的基于空间填充曲线的划分算法进行比较来评估我们的算法，并观察在自适应精细网格上求解有限元计算的时间到解以及能量到解。我们在ORNL的Titan上展示了我们的新分区算法的出色可扩展性，最高可达262,144美元核，并证明了所提出的分区方案将应用程序代码的总能量和解决时间降低了22.0%

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Machine and Application Aware Partitioning for Adaptive Mesh Refinement Applications

Load balancing and partitioning are critical when it comes to parallel computations. Popular partitioning strategies based on space filling curves focus on equally dividing work. The partitions produced are independent of the architecture or the application. Given the ever-increasing relative cost of data movement and increasing heterogeneity of our architectures, it is no longer sufficient to only consider an equal partitioning of work. Minimizing communication costs are equally if not more important. Our hypothesis is that an unequal partitioning that minimizes communication costs significantly can scale and perform better than conventional equal-work partitioning schemes. This tradeoff is dependent on the architecture as well as the application. We validate our hypothesis in the context of a finite-element computation utilizing adaptive mesh-refinement. Our central contribution is a new partitioning scheme that minimizes the overall runtime of subsequent computations by performing architecture and application-aware non-uniform work assignment in order to decrease time to solution, primarily by minimizing data-movement. We evaluate our algorithm by comparing it against standard space-filling curve based partitioning algorithms and observing time-to-solution as well as energy-to-solution for solving Finite Element computations on adaptively refined meshes. We demonstrate excellent scalability of our new partition algorithm up to $262,144$ cores on ORNL's Titan and demonstrate that the proposed partitioning scheme reduces overall energy as well as time-to-solution for application codes by up to 22.0%

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing

自引率

0.00%

发文量