Roofline Scaling Trajectories: A Method for Parallel Application and Architectural Performance Analysis

2018 International Conference on High Performance Computing & Simulation (HPCS). Pub Date: 2018-07-01. DOI: 10.1109/HPCS.2018.00065

K. Ibrahim, Samuel Williams, L. Oliker


Citations: 8

Abstract




The end of Dennard scaling signaled a shift in HPC supercomputer architectures from systems built from single-core processor architectures to systems built from multicore and eventually manycore architectures. This transition substantially complicated performance optimization and analysis as new programming models were created, new scaling methodologies deployed, and on-chip contention became a bottleneck to performance. Existing distributed memory performance models like logP and logGP were unable to capture this contention. The Roofline model was created to address this contention and its interplay with locality. However, to date, the Roofline model has focused on full-node concurrency. In this paper, we extend the Roofline model to capture the effects of concurrency on data locality and on-chip contention. We demonstrate the value of this new technique by evaluating the NAS parallel benchmarks on both multicore and manycore architectures under both strong- and weak-scaling regimes. In order to quantify the interplay between programming model and locality, we evaluate scaling under both the OpenMP and flat MPI programming models.
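The abstract relies on the Roofline model without stating it. The following is a minimal sketch, assuming the classic Roofline bound (attainable GFLOP/s = min(peak compute, arithmetic intensity × DRAM bandwidth)). It also assumes that a "scaling trajectory" can be read as the sequence of (arithmetic intensity, achieved performance) points measured at increasing concurrency. The names `Machine`, `Sample`, `roofline_bound`, and `trajectory`, as well as every number, are hypothetical illustrations. They are not the authors' tooling, method, or measurements, and a single full-node ceiling is assumed for simplicity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Machine:
    """Hypothetical node ceilings (illustrative values, not measurements)."""
    peak_gflops: float  # compute ceiling, GFLOP/s
    peak_bw_gbs: float  # DRAM bandwidth ceiling, GB/s


@dataclass(frozen=True)
class Sample:
    """One hypothetical run of a kernel at a given concurrency level."""
    threads: int    # concurrency (e.g. OpenMP threads or MPI ranks)
    gflop: float    # total floating-point work, GFLOP
    dram_gb: float  # total DRAM traffic, GB
    seconds: float  # wall-clock time, s


def roofline_bound(m: Machine, ai: float) -> float:
    """Classic Roofline: attainable GFLOP/s = min(peak compute, AI * bandwidth)."""
    return min(m.peak_gflops, ai * m.peak_bw_gbs)


def trajectory(m: Machine, samples: List[Sample]) -> List[Tuple[int, float, float, float]]:
    """Return (threads, AI in FLOP/byte, achieved GFLOP/s, fraction of bound),
    ordered by increasing concurrency."""
    points = []
    for s in sorted(samples, key=lambda x: x.threads):
        ai = s.gflop / s.dram_gb    # arithmetic intensity: GFLOP per GB = FLOP/byte
        perf = s.gflop / s.seconds  # achieved GFLOP/s
        points.append((s.threads, ai, perf, perf / roofline_bound(m, ai)))
    return points


if __name__ == "__main__":
    # Hypothetical node and runs, chosen only to illustrate the calculation.
    node = Machine(peak_gflops=1000.0, peak_bw_gbs=100.0)
    runs = [Sample(8, 10.0, 4.0, 0.5), Sample(1, 10.0, 2.0, 1.0)]
    for threads, ai, perf, frac in trajectory(node, runs):
        print(f"{threads:>3} threads: AI={ai:.2f} FLOP/B, {perf:.1f} GFLOP/s, {frac:.0%} of bound")
```

In this made-up example, DRAM traffic grows from 2 GB to 4 GB as concurrency rises from 1 to 8 threads, so arithmetic intensity falls from 5.0 to 2.5 FLOP/byte. That is the kind of concurrency-induced loss of locality the abstract says the extended model is meant to expose.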

