System support for many task computing

E. V. Hensbergen, R. Minnich
2008 Workshop on Many-Task Computing on Grids and Supercomputers, published 2008-11-01
DOI: https://doi.org/10.1109/MTAGS.2008.4777907
Citations: 5

Abstract

The popularity of large-scale systems such as Blue Gene has extended their reach beyond HPC into the realm of commercial computing. Both communities want to broaden the scope of these machines from tightly coupled scientific applications running on MPI frameworks to more general-purpose workloads. Our approach addresses issues of scale by leveraging the huge number of nodes to distribute operating system services and components across the machine, tightly coupling the operating system and the interconnects to take maximum advantage of the unique capabilities of the HPC system. We plan to provision nodes to provide workload execution, aggregation, and system services, and to dynamically re-provision nodes as necessary to accommodate changes, failure, and redundancy. By incorporating aggregation as a first-class system construct, we will provide dynamic hierarchical organization and management of all system resources. In this paper, we present the design principles of our approach, using file systems, workload distribution, and system monitoring as illustrative examples. Our end goal is to provide a cohesive distributed system that can broaden the class of applications for large-scale systems and make them more approachable for a larger class of developers and end users.