Sustained systems performance monitoring at the U.S. Department of Defense High Performance Computing Modernization Program

P. Bennett

Published in: 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
Publication date: 2011-11-12
DOI: 10.1145/2063348.2063352 (https://doi.org/10.1145/2063348.2063352)
Citations: 8

Abstract

The U.S. Department of Defense High Performance Computing Modernization Program (HPCMP) has implemented sustained systems performance (SSP) testing on high performance computing systems in use at DoD Supercomputing Resource Centers. The intent is to monitor performance improvements from updates to the operating system, compiler suites, and numerical and communications libraries, and to monitor penalties arising from security patches. In practice, each system's workload is simulated by appropriate choices of user application codes representative of the HPCMP computational technical areas. Past successes include surfacing an imminent failure of an OST in a Cray XT3, an incomplete configuration of a scheduler update on an SGI Altix 4700, performance issues associated with a communications library update for a Linux Networx Advanced Technology Cluster, and intermittent resetting of Intel Nehalem cores to standard mode from turbo mode. This history demonstrates that SSP testing is critical to delivering the highest quality of service to HPCMP users.
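
The abstract does not say how timings are compared, so the following is only a minimal sketch of one way the monitoring idea could work: keep baseline run times for each representative application code, rerun the codes after a system update, and flag any code whose median run time has grown beyond a tolerance. The function name `flag_regressions`, the 5% tolerance, the application names, and all timing values are illustrative assumptions, not details taken from the paper.

```python
from statistics import median


def flag_regressions(baseline, current, tolerance=0.05):
    """Return {code: relative_change} for codes whose median run time
    grew by more than `tolerance` relative to the baseline median.

    Hypothetical helper for illustration; not the HPCMP implementation.
    """
    flagged = {}
    for code, base_times in baseline.items():
        new_times = current.get(code)
        if not base_times or not new_times:
            continue  # nothing to compare for this code
        base = median(base_times)
        change = (median(new_times) - base) / base
        if change > tolerance:
            flagged[code] = change
    return flagged


# Made-up wall-clock times in seconds; not data from the paper.
baseline = {
    "app_a": [100.0, 102.0, 101.0],
    "app_b": [250.0, 248.0, 252.0],
}
after_update = {
    "app_a": [100.5, 101.5, 101.0],
    "app_b": [280.0, 279.0, 283.0],
}

regressions = flag_regressions(baseline, after_update)
for code, change in regressions.items():
    print(f"{code}: median run time up {change:.1%}")
```

Comparing medians rather than single runs is one simple way to reduce sensitivity to run-to-run noise; a real monitoring setup would likely need per-code tolerances and longer run histories.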