Efficient Runtime Environment for Coupled Multi-physics Simulations: Dynamic Resource Allocation and Load-Balancing

S. Ko, Nayong Kim, Joohyun Kim, A. Thota, S. Jha
{"title":"Efficient Runtime Environment for Coupled Multi-physics Simulations: Dynamic Resource Allocation and Load-Balancing","authors":"S. Ko, Nayong Kim, Joohyun Kim, A. Thota, S. Jha","doi":"10.1109/CCGRID.2010.107","DOIUrl":null,"url":null,"abstract":"Coupled Multi-Physics simulations, such as hybrid CFD-MD simulations, represent an increasingly important class of scientific applications. Often the physical problems of interest demand the use of high-end computers, such as TeraGrid resources, which are often accessible only via batch-queues. Batch-queue systems are not developed to natively support the coordinated scheduling of jobs – which in turn is required to support the concurrent execution required by coupled multi-physics simulations. In this paper we develop and demonstrate a novel approach to overcome the lack of native support for coordinated job submission requirement associated with coupled runs. We establish the performance advantages arising from our solution, which is a generalization of the Pilot-Job concept – which in of itself is not new, but is being applied to coupled simulations for the first time. Our solution not only overcomes the initial co-scheduling problem, but also provides a dynamic resource allocation mechanism. Support for such dynamic resources is critical for a load balancing mechanism, which we develop and demonstrate to be effective at reducing the total time-to-solution of the problem. We establish that the performance advantage of using Big Jobs is invariant with the size of the machine as well as the size of the physical model under investigation. The Pilot-Job abstraction is developed using SAGA, which provides an infrastructure agnostic implementation, and which can seamlessly execute and utilize distributed resources.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"33","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGRID.2010.107","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 33

Abstract

Coupled Multi-Physics simulations, such as hybrid CFD-MD simulations, represent an increasingly important class of scientific applications. Often the physical problems of interest demand the use of high-end computers, such as TeraGrid resources, which are often accessible only via batch-queues. Batch-queue systems are not developed to natively support the coordinated scheduling of jobs – which in turn is required to support the concurrent execution required by coupled multi-physics simulations. In this paper we develop and demonstrate a novel approach to overcome the lack of native support for coordinated job submission requirement associated with coupled runs. We establish the performance advantages arising from our solution, which is a generalization of the Pilot-Job concept – which in of itself is not new, but is being applied to coupled simulations for the first time. Our solution not only overcomes the initial co-scheduling problem, but also provides a dynamic resource allocation mechanism. Support for such dynamic resources is critical for a load balancing mechanism, which we develop and demonstrate to be effective at reducing the total time-to-solution of the problem. We establish that the performance advantage of using Big Jobs is invariant with the size of the machine as well as the size of the physical model under investigation. The Pilot-Job abstraction is developed using SAGA, which provides an infrastructure agnostic implementation, and which can seamlessly execute and utilize distributed resources.
耦合多物理场仿真的高效运行环境:动态资源分配和负载平衡
耦合多物理场模拟,如混合CFD-MD模拟,代表了越来越重要的科学应用类别。通常,感兴趣的物理问题需要使用高端计算机,例如TeraGrid资源,这些资源通常只能通过批处理队列访问。批处理队列系统本身并不是为了支持作业的协调调度而开发的——这反过来又需要支持耦合多物理场模拟所需的并发执行。在本文中,我们开发并演示了一种新的方法来克服缺乏与耦合运行相关的协调作业提交需求的本机支持。我们建立了从我们的解决方案中产生的性能优势,该解决方案是Pilot-Job概念的概括- Pilot-Job本身并不新鲜,但首次应用于耦合模拟。我们的解决方案不仅克服了最初的协同调度问题,而且提供了一种动态的资源分配机制。对这种动态资源的支持对于负载平衡机制至关重要,我们开发并演示了负载平衡机制在减少解决问题的总时间方面是有效的。我们确定,使用Big Jobs的性能优势与机器的大小以及所研究的物理模型的大小是不变的。Pilot-Job抽象是使用SAGA开发的,它提供了一个与基础设施无关的实现,并且可以无缝地执行和利用分布式资源。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信