Application performance characterization and analysis on Blue Gene/Q

B. Walkup
{"title":"Application performance characterization and analysis on Blue Gene/Q","authors":"B. Walkup","doi":"10.1109/SC.Companion.2012.358","DOIUrl":null,"url":null,"abstract":"This article consists of a collection of slides from the author's conference presentation. The author concludes that The Blue Gene/Q design, low-power simple cores, four hardware threads per core, resu lts in high instruction throughput, and thus exceptional power efficiency for applications. Can effectively fill in pipeline stalls and hide latencies in the memory subsystem. The consequence is low performance per thread, so a high degree of parallelization is required for high application performance. Traditional programming methods (MPI, OpenMP, Pthreads) hold up at very large scales. Memory costs can limit scaling when there are data-structures with size linear in the number of processes, threading helps by keeping the number of processes manageable. Detailed performance analysis is viable at > 10^6 processes but requires care. On-the-fly performance data reduction has merits.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"77 1","pages":"2247-2280"},"PeriodicalIF":0.0000,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SC.Companion.2012.358","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

This article consists of a collection of slides from the author's conference presentation. The author concludes that The Blue Gene/Q design, low-power simple cores, four hardware threads per core, resu lts in high instruction throughput, and thus exceptional power efficiency for applications. Can effectively fill in pipeline stalls and hide latencies in the memory subsystem. The consequence is low performance per thread, so a high degree of parallelization is required for high application performance. Traditional programming methods (MPI, OpenMP, Pthreads) hold up at very large scales. Memory costs can limit scaling when there are data-structures with size linear in the number of processes, threading helps by keeping the number of processes manageable. Detailed performance analysis is viable at > 10^6 processes but requires care. On-the-fly performance data reduction has merits.
Blue Gene/Q应用性能表征与分析
本文由作者在会议上的演讲幻灯片组成。作者得出结论:Blue Gene/Q设计,低功耗的简单内核,每核四个硬件线程,导致高指令吞吐量,从而为应用程序提供卓越的功耗效率。可以有效地填补管道的停顿和隐藏内存子系统的延迟。其结果是每个线程的性能较低,因此需要高度的并行化来获得较高的应用程序性能。传统的编程方法(MPI、OpenMP、Pthreads)适用于非常大的规模。当数据结构的大小与进程数量呈线性关系时,内存成本可能会限制扩展,线程可以帮助保持进程数量的可管理性。详细的性能分析在bbb10 ^6进程中是可行的,但需要注意。动态性能数据缩减有其优点。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信