An architecture for integrated resource management of MPI jobs
S. Sistare, Jack A. Test, D. Plauger
Proceedings. IEEE International Conference on Cluster Computing, pp. 370-377, 2002-09-23
DOI: 10.1109/CLUSTR.2002.1137769 (https://doi.org/10.1109/CLUSTR.2002.1137769)
Citations: 2
Abstract
We present a new architecture for the integration of distributed resource management systems and parallel run-time environments such as MPI. The architecture solves the long-standing problem of achieving a tight integration between the two in a clean and robust manner that fully enables the functionality of both systems, including resource limit enforcement and accounting. We also present the user with a more uniform command interface, which simplifies the task of running parallel jobs and tools under a resource manager. The architecture is extensible and allows new systems to be incorporated. We describe the properties that a resource management system must have to work in this architecture, and find that these properties are ubiquitous in the resource management world. Using the Sun™ Cluster Runtime Environment, we show the generality of the approach by implementing tight integrations with PBS, LSF, and Sun Grid Engine software, and we demonstrate the advantages of a tight integration. No modifications or enhancements to these resource management systems were required, in marked contrast to ad hoc approaches, which typically require such changes.