CloudMiner: A Systematic Failure Diagnosis Framework in Enterprise Cloud Environments
Authors: Ibrahim El-Shekeil, Amitangshu Pal, K. Kant
Venue: 2018 IEEE International Conference on Cloud Computing Technology and Science (CloudCom)
Publication date: 2018-12-01
DOI: 10.1109/CloudCom2018.2018.00028 (https://doi.org/10.1109/CloudCom2018.2018.00028)
Citations: 0
Abstract
Applications and network services in enterprise cloud environments have both direct and indirect dependencies, and their configuration varies with business needs. Accurate and complete documentation of that configuration, however, may not always exist, and such unknown or uncertain dependencies make failure diagnosis more complex. To cope with this, probing stations need to be installed at suitable locations in the network to provide full monitoring and diagnosis capability. In this paper we develop CloudMiner, a novel architecture for failure diagnosis in enterprise clouds. It consists of intelligent probing station selection and failure detection and diagnosis across network components, using a minimum set of network probes and taking into account the inter-dependencies among network services and components. Extensive simulation results show that CloudMiner always identifies the faulty components within a small set of suspected components, whose size is as low as ~3 for a network with 460 components.
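
The abstract does not spell out how probes are chosen or how suspects are narrowed down, so the following is only an illustrative sketch of the general idea, not CloudMiner's actual algorithm. It assumes each probe is described by the set of components it depends on, picks probes with a standard greedy set-cover heuristic, and localizes a failure under a single-fault assumption. The function names, probe names, and component names are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_probe_selection(probes: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover: pick probes until every component that
    appears in at least one probe is covered by a chosen probe."""
    uncovered: Set[str] = set().union(*probes.values()) if probes else set()
    chosen: List[str] = []
    while uncovered:
        # Probe covering the most still-uncovered components;
        # iterating in sorted order makes ties deterministic.
        best = max(sorted(probes), key=lambda p: len(probes[p] & uncovered))
        chosen.append(best)
        uncovered -= probes[best]
    return chosen


def suspects(probes: Dict[str, Set[str]], chosen: List[str],
             failed: Set[str]) -> Set[str]:
    """Single-fault assumption: the faulty component lies on every
    failed probe and on no probe that succeeded."""
    failed_sets = [probes[p] for p in chosen if p in failed]
    if not failed_sets:
        return set()
    candidates = set.intersection(*failed_sets)
    for p in chosen:
        if p not in failed:
            candidates -= probes[p]
    return candidates


# Hypothetical probes and the components each one traverses.
example_probes = {
    "p1": {"web", "app", "db"},
    "p2": {"web", "dns"},
    "p3": {"app", "db", "cache"},
    "p4": {"dns"},
}
selected = greedy_probe_selection(example_probes)
print(selected)
print(suspects(example_probes, selected, failed={"p1", "p3"}))
```

In this toy setup the suspect set cannot shrink below the two components that always appear together on the same probes, which mirrors the abstract's point that diagnosis yields a small list of suspected components, not necessarily a single one.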