Scheduling a 100,000 Core Supercomputer for Maximum Utilization and Capability

P. Andrews, P. Kovatch, Victor Hazlewood, Troy Baer
{"title":"Scheduling a 100,000 Core Supercomputer for Maximum Utilization and Capability","authors":"P. Andrews, P. Kovatch, Victor Hazlewood, Troy Baer","doi":"10.1109/ICPPW.2010.63","DOIUrl":null,"url":null,"abstract":"In late 2009, the National Institute for Computational Sciences placed in production the world’s fastest academic supercomputer (third overall), a Cray XT5 named Kraken, with almost 100,000 compute cores and a peak speed in excess of one Petaflop. Delivering over 50% of the total cycles available to the National Science Foundation users via the TeraGrid, Kraken has two missions that have historically proven difficult to simultaneously reconcile: providing the maximum number of total cycles to the community, while enabling full machine runs for “hero” users. Historically, this has been attempted by allowing schedulers to choose the correct time for the beginning of large jobs, with a concomitant reduction in utilization. At NICS, we used the results of a previous theoretical investigation to adopt a different approach, where the “clearing out” of the system is forced on a weekly basis, followed by consecutive full machine runs. As our previous simulation results suggested, this lead to a significant improvement in utilization, to over 90%. The difference in utilization between the traditional and adopted scheduling policies was the equivalent of a 300+ Teraflop supercomputer, or several million dollars of compute time per year.","PeriodicalId":415472,"journal":{"name":"2010 39th International Conference on Parallel Processing Workshops","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 39th International Conference on Parallel Processing Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPPW.2010.63","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

In late 2009, the National Institute for Computational Sciences placed in production the world’s fastest academic supercomputer (third overall), a Cray XT5 named Kraken, with almost 100,000 compute cores and a peak speed in excess of one Petaflop. Delivering over 50% of the total cycles available to the National Science Foundation users via the TeraGrid, Kraken has two missions that have historically proven difficult to simultaneously reconcile: providing the maximum number of total cycles to the community, while enabling full machine runs for “hero” users. Historically, this has been attempted by allowing schedulers to choose the correct time for the beginning of large jobs, with a concomitant reduction in utilization. At NICS, we used the results of a previous theoretical investigation to adopt a different approach, where the “clearing out” of the system is forced on a weekly basis, followed by consecutive full machine runs. As our previous simulation results suggested, this lead to a significant improvement in utilization, to over 90%. The difference in utilization between the traditional and adopted scheduling policies was the equivalent of a 300+ Teraflop supercomputer, or several million dollars of compute time per year.
调度100,000核超级计算机以获得最大的利用率和能力
2009年底,美国国家计算科学研究所(National Institute for Computational Sciences)投入生产了世界上最快的学术超级计算机(排名第三),名为Kraken的克雷XT5,拥有近10万个计算核心,峰值速度超过1千万亿次。Kraken通过TeraGrid向美国国家科学基金会用户提供超过50%的可用总周期,这两项任务在历史上被证明是难以同时协调的:为社区提供最大数量的总周期,同时为“英雄”用户提供完整的机器运行。从历史上看,通过允许调度器为大型作业的开始选择正确的时间来尝试这一点,同时降低了利用率。在NICS,我们使用先前理论研究的结果来采用一种不同的方法,即每周强制对系统进行“清理”,然后连续运行全机。正如我们之前的模拟结果所表明的那样,这将导致利用率的显著提高,达到90%以上。传统调度策略和采用调度策略之间的利用率差异相当于一台300+ Teraflop的超级计算机,或者每年数百万美元的计算时间。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信