Abdur Chowdhury, Lisa D. Nicklas, Sanjeev Setia, E. White
Title: Supporting dynamic space-sharing on clusters of non-dedicated workstations
Published in: Proceedings of 17th International Conference on Distributed Computing Systems
Publication date: 1997-05-27
DOI: https://doi.org/10.1109/ICDCS.1997.597902
Citations: 18
Abstract
Clusters of workstations are increasingly viewed as a cost-effective alternative to parallel supercomputers. However, resource management and scheduling on workstation clusters are complicated by the fact that the number of idle workstations available for executing parallel applications fluctuates constantly. We present a case for scheduling parallel applications on non-dedicated workstation clusters using dynamic space sharing, a policy under which the number of processors allocated to an application can be changed during its execution. We describe an approach that uses application-level checkpointing and data repartitioning to support dynamic space sharing and to handle the dynamic reconfiguration triggered when a failure or owner activity is detected on a workstation being used by a parallel application. The performance advantages of dynamic space sharing are quantified through a simulation study, and experimental results are presented for the overhead of dynamic reconfiguration of a grid-oriented data-parallel application using our approach.
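As a rough illustration of the idea in the abstract, the sketch below shows one way a grid-oriented data-parallel application could be reconfigured when the number of available workstations changes: each worker's rows are gathered into a single application-level checkpoint, and the grid is then repartitioned into contiguous row blocks for the new worker count. This is not the authors' implementation. The function names (`partition_rows`, `checkpoint`, `repartition`, `reconfigure`), the row-block decomposition, and the in-memory checkpoint are all assumptions made for illustration.

```python
from typing import List

Grid = List[List[float]]


def partition_rows(n_rows: int, n_workers: int) -> List[range]:
    """Split n_rows into contiguous blocks whose sizes differ by at most one."""
    if n_workers < 1:
        raise ValueError("need at least one worker")
    base, extra = divmod(n_rows, n_workers)
    blocks, start = [], 0
    for w in range(n_workers):
        # The first `extra` workers each take one additional row.
        size = base + (1 if w < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def checkpoint(local_blocks: List[Grid]) -> Grid:
    """Hypothetical application-level checkpoint: gather all rows in order."""
    return [row for block in local_blocks for row in block]


def repartition(grid: Grid, n_workers: int) -> List[Grid]:
    """Distribute a checkpointed grid over n_workers as row blocks."""
    return [[grid[i] for i in rows] for rows in partition_rows(len(grid), n_workers)]


def reconfigure(local_blocks: List[Grid], new_n_workers: int) -> List[Grid]:
    """Checkpoint the current state, then repartition it for a new worker count."""
    return repartition(checkpoint(local_blocks), new_n_workers)


if __name__ == "__main__":
    grid = [[float(r)] * 3 for r in range(10)]
    on_four = repartition(grid, 4)
    # One workstation is reclaimed by its owner, so shrink to three workers.
    on_three = reconfigure(on_four, 3)
    print([len(b) for b in on_four], [len(b) for b in on_three])
```

In a real cluster the checkpoint would be written to stable storage or exchanged over the network rather than held in one list, and the cost of that step is the reconfiguration overhead the abstract refers to.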