Subordination: Cluster management without distributed consensus
I. Gankevich, Y. Tipikin, V. Gaiduchok
2015 International Conference on High Performance Computing & Simulation (HPCS), published 2015-07-20
DOI: 10.1109/HPCSim.2015.7237106 (https://doi.org/10.1109/HPCSim.2015.7237106)
Citations: 7
Abstract
Many cluster management systems rely on distributed consensus algorithms to elect a leader that orchestrates subordinate nodes. In contrast, we propose a consensus-free algorithm that arranges cluster nodes into multiple levels of subordination. The algorithm structures the IP address range of the cluster network so that each node has a ranked list of candidates from which it chooses a leader. The results show that this approach scales easily to a large number of nodes because it is asynchronous, and that it enables fast recovery from node failures because each failure affects only one level of the hierarchy. Multiple levels of subordination are useful for efficiently collecting monitoring and accounting data from a large number of nodes, and for scheduling general-purpose tasks on a cluster.
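
The abstract does not spell out how the IP address range is structured, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that host addresses are laid out level by level in a tree with a fixed fan-out, and that a node ranks the nodes one level above it by their distance from its "natural" parent. The names `tree_position`, `ranked_candidates`, and the `fanout` parameter are hypothetical.

```python
import ipaddress


def tree_position(index, fanout):
    """Map a 0-based position in the address range to (level, offset).

    Assumption: addresses fill a tree level by level, so level 0 holds
    1 node, level 1 holds `fanout` nodes, level 2 holds fanout**2, etc.
    """
    level, first, width = 0, 0, 1
    while index >= first + width:
        first += width      # skip past the current level
        width *= fanout     # the next level is `fanout` times wider
        level += 1
    return level, index - first


def ranked_candidates(node_ip, network, fanout=2):
    """Return the leader candidates of `node_ip`, best candidate first.

    Every node can compute this list locally from its own address and
    the network range, without exchanging messages with other nodes.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    # Materialising the host list is fine for a small illustrative subnet.
    hosts = list(ipaddress.ip_network(network).hosts())
    index = hosts.index(ipaddress.ip_address(node_ip))
    level, offset = tree_position(index, fanout)
    if level == 0:
        return []  # the first address is the root and has no leader

    width_above = fanout ** (level - 1)
    # Index of the first node on the level above: 1 + f + ... + f**(level-2).
    first_above = (width_above - 1) // (fanout - 1)
    preferred = offset // fanout  # hypothetical "natural" parent offset

    # Rank the upper level by distance from the preferred parent.
    order = sorted(range(width_above), key=lambda o: (abs(o - preferred), o))
    return [hosts[first_above + o] for o in order]


if __name__ == "__main__":
    for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.7"):
        print(ip, [str(c) for c in ranked_candidates(ip, "10.0.0.0/28")])
```

In this sketch a node would try the candidates in order and subordinate itself to the first one that responds, so a failed leader only forces the nodes directly below it to move down their lists. That retry policy is likewise an assumption made for illustration.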