Reducing Fragmentation on Torus-Connected Supercomputers

Wei Tang, Z. Lan, N. Desai, Daniel Buettner, Yongen Yu
2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS), published 2011-05-16. DOI: 10.1109/IPDPS.2011.82 (https://doi.org/10.1109/IPDPS.2011.82)
Citations: 35

Abstract

Torus-based networks are prevalent on leadership-class petascale systems, providing a good balance between network cost and performance. The major disadvantage of this network architecture is its susceptibility to fragmentation. Many studies have attempted to reduce resource fragmentation in this architecture. Although the proposed approaches can make good allocation decisions that reduce fragmentation at job start time, none of them considers a job's wall time; ignoring it can cause resource fragmentation when neighboring jobs do not complete at around the same time. In this paper, we propose a wall time-aware job allocation strategy that adjacently packs jobs finishing around the same time, in order to minimize the resource fragmentation caused by discrepancies in job length. Event-driven simulations using real job traces from a production Blue Gene/P system at Argonne National Laboratory demonstrate that our wall time-aware strategy can effectively reduce system fragmentation and improve overall system performance.
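The core idea above, placing a job next to neighbors that will finish around the same time, can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: a single ring of nodes (one torus dimension with wrap-around) instead of Blue Gene/P partitions, a job's end time estimated as `now + walltime`, and a cost function chosen only for illustration (the sum of absolute end-time differences to the two bordering nodes, with a free neighbor counted as finishing at `now`). The names `free_blocks`, `walltime_cost`, and `allocate` are hypothetical; this is not the authors' algorithm.

```python
from typing import List, Optional, Sequence

# Hypothetical model (an assumption, not the paper's partition model):
# one ring of nodes, i.e. a single torus dimension with wrap-around links.
# end_times[i] is the estimated finish time of the job on node i, or None if free.
Ring = Sequence[Optional[float]]


def free_blocks(end_times: Ring, size: int) -> List[int]:
    """Start indices of every contiguous run of `size` free nodes (wrap-around allowed)."""
    n = len(end_times)
    if not 0 < size <= n:
        return []
    return [
        start
        for start in range(n)
        if all(end_times[(start + k) % n] is None for k in range(size))
    ]


def walltime_cost(end_times: Ring, start: int, size: int,
                  job_end: float, now: float) -> float:
    """Sum of |job_end - neighbor_end| over the two nodes bordering the block.

    Hypothetical rule: a free neighbor is treated as having finished at `now`,
    so placing a job in the middle of idle space is penalized by its run length.
    """
    n = len(end_times)
    if size >= n:
        return 0.0  # the block covers the whole ring, so it has no neighbors
    left = end_times[(start - 1) % n]
    right = end_times[(start + size) % n]
    cost = 0.0
    for neighbor_end in (left, right):
        end = now if neighbor_end is None else neighbor_end
        cost += abs(job_end - end)
    return cost


def allocate(end_times: List[Optional[float]], size: int,
             walltime: float, now: float) -> Optional[int]:
    """Place a job of `size` nodes, preferring neighbors that finish close to it.

    Returns the chosen start index and marks those nodes busy, or returns None
    if no contiguous free block exists. Ties go to the lowest start index.
    """
    job_end = now + walltime
    candidates = free_blocks(end_times, size)
    if not candidates:
        return None
    best = min(candidates,
               key=lambda s: (walltime_cost(end_times, s, size, job_end, now), s))
    for k in range(size):
        end_times[(best + k) % len(end_times)] = job_end
    return best
```

For the ring `[10, 10, None, None, None, None, 100, 100]` at `now = 0`, a 2-node job with walltime 100 is placed on nodes 4 and 5, beside the job ending at 100, rather than on nodes 2 and 3 as first fit would choose; a following 2-node job with walltime 10 then fills nodes 2 and 3. At time 10 the freed nodes 0 to 3 form one contiguous block, whereas first-fit placement would leave two separate 2-node holes.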