MPICH-G-DM: An Enhanced MPICH-G with Supporting Dynamic Job Migration
Authors: Xiaohui Wei, Hongliang Li, Dexiong Li
Published in: 2009 Fourth ChinaGrid Annual Conference (2009-08-21)
DOI: 10.1109/ChinaGrid.2009.9 (https://doi.org/10.1109/ChinaGrid.2009.9)
Citations: 3
Abstract
Grid computing is attracting growing attention because of its massive computational capacity. Tools such as the Globus Toolkit and MPICH-G2 have been developed to support scientists in their research. As a Grid-enabled implementation of MPI, MPICH-G2 helps developers port parallel applications to cross-domain environments. Because today's computationally intensive parallel applications, especially long-running tasks, require a platform that offers high availability as well as high performance, dynamic job migration in Grid environments has become an essential issue. In this study, we present MPICH-G-DM, a version of MPICH-G2 that supports dynamic job migration. We use the Virtual Job Model (VJM) to reserve resources for migrating jobs in advance, which improves the efficiency of the system. We propose an Asynchronous Migration Protocol (AMP) that enables migrating sub-jobs to checkpoint/restart and update their new addresses concurrently, without global synchronization. To reduce the communication overhead of job migration, MPICH-G-DM reduces the number of control messages exchanged among domains to O(N). Experimental results show that MPICH-G-DM is effective and reliable.