TDP-Shell: A Generic Framework to Improve Interoperability between Batch Queue Systems and Monitoring Tools

Vicente Ivars, M. A. Senar, E. Heymann
{"title":"TDP-Shell: A Generic Framework to Improve Interoperability between Batch Queue Systems and Monitoring Tools","authors":"Vicente Ivars, M. A. Senar, E. Heymann","doi":"10.1109/CLUSTER.2011.73","DOIUrl":null,"url":null,"abstract":"Nowadays distributed applications, including MPI implementations, are executed on computer clusters managed by a batch queue system. Users take advantage of monitoring tools to detect run-time problems on their applications running on those environments. But it is a challenge to use monitoring tools on a cluster controlled by a batch queue system. This is due to the fact that batch queue systems and monitoring tools do not coordinate the management of the resources they share, when executing a distributed application. We name this problem lack of interoperability and to solve it we have developed a framework called TDP-Shell. This framework supports different batch queue systems such as Condor and SGE, and different monitoring tools such as Paradyn, Gdb and Total view, without any changes on their source code. In this paper we describe how our basic design of TDP-Shell for sequential applications was re-designed to support the monitoring of MPI applications that are executed on a cluster controlled by a batch queue system.","PeriodicalId":200830,"journal":{"name":"2011 IEEE International Conference on Cluster Computing","volume":"199 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE International Conference on Cluster Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CLUSTER.2011.73","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Nowadays distributed applications, including MPI implementations, are executed on computer clusters managed by a batch queue system. Users take advantage of monitoring tools to detect run-time problems on their applications running on those environments. But it is a challenge to use monitoring tools on a cluster controlled by a batch queue system. This is due to the fact that batch queue systems and monitoring tools do not coordinate the management of the resources they share, when executing a distributed application. We name this problem lack of interoperability and to solve it we have developed a framework called TDP-Shell. This framework supports different batch queue systems such as Condor and SGE, and different monitoring tools such as Paradyn, Gdb and Total view, without any changes on their source code. In this paper we describe how our basic design of TDP-Shell for sequential applications was re-designed to support the monitoring of MPI applications that are executed on a cluster controlled by a batch queue system.
TDP-Shell:改进批处理队列系统和监控工具之间互操作性的通用框架
目前,包括MPI实现在内的分布式应用程序都在由批处理队列系统管理的计算机集群上执行。用户利用监视工具来检测在这些环境中运行的应用程序的运行时问题。但是在批处理队列系统控制的集群上使用监视工具是一个挑战。这是因为在执行分布式应用程序时,批处理队列系统和监视工具不能协调对它们共享的资源的管理。我们将这个问题命名为缺乏互操作性,为了解决这个问题,我们开发了一个名为TDP-Shell的框架。该框架支持不同的批处理队列系统(如Condor和SGE),以及不同的监控工具(如Paradyn、Gdb和Total view),而无需对其源代码进行任何更改。在本文中,我们描述了如何重新设计用于顺序应用程序的TDP-Shell的基本设计,以支持监视在批处理队列系统控制的集群上执行的MPI应用程序。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信