Causality inference for failures in NFV
D. Kushnir, M. Goldstein
2016 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2016-04-10
DOI: 10.1109/INFCOMW.2016.7562212 (https://doi.org/10.1109/INFCOMW.2016.7562212)
Citations: 9
Abstract
In this paper we consider a root-cause analysis framework for NFV infrastructure. Now that monitoring machinery for NFV has matured, the next step is to leverage the collected data to automatically optimize failure detection, analysis, and overall resiliency. The complex architecture and dynamics of NFV pose significant challenges for causality inference. In particular, there is a strong need for an approach that does not depend on domain knowledge or human intervention. In this context we propose a step-wise, data-driven root-cause analysis approach based on correlation clustering and time sensitivity analysis of alarm data. Our approach recovers templates of causal relationships between network resource alarms, from which rules for performing root-cause analysis can be determined. We demonstrate our approach on real data collected from NFV, where our algorithm computes causality templates. These templates were verified by system experts: most were confirmed as already known, and the others were new.
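The abstract gives no algorithmic detail, so the following is only a rough sketch of the general idea, not the authors' method. It assumes alarms arrive as `(timestamp, alarm_type)` pairs. Plain thresholded pairwise correlation of windowed alarm presence stands in for correlation clustering, and a simple "which alarm tends to fire first" count stands in for time sensitivity analysis. The alarm names, the function names, and the `window`, `max_lag`, and `threshold` parameters are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import combinations
from math import sqrt


def correlation(a, b, n_windows):
    """Phi (binary Pearson) correlation of two alarm-presence series.

    a and b are sets of window indices in which each alarm occurred.
    """
    na, nb, nab = len(a), len(b), len(a & b)
    denom = sqrt(na * (n_windows - na) * nb * (n_windows - nb))
    if denom == 0:
        return 0.0
    return (n_windows * nab - na * nb) / denom


def count_precedes(times_a, times_b, max_lag):
    """Count occurrences of A that are followed by a B within max_lag."""
    times_b = sorted(times_b)
    count = 0
    for t in times_a:
        i = bisect_right(times_b, t)  # first B strictly after t
        if i < len(times_b) and times_b[i] - t <= max_lag:
            count += 1
    return count


def causality_templates(events, window=10, max_lag=5, threshold=0.8):
    """Return (cause, effect) pairs for strongly correlated alarm types.

    Assumed stand-in procedure: keep pairs whose windowed presence is
    correlated above `threshold`, then orient each pair by which alarm
    more often precedes the other within `max_lag` time units.
    """
    times = defaultdict(list)
    for ts, alarm in events:
        times[alarm].append(ts)
    if not times:
        return []
    bins = {a: {int(t // window) for t in ts} for a, ts in times.items()}
    n_windows = max(max(b) for b in bins.values()) + 1

    templates = []
    for a, b in combinations(sorted(times), 2):
        if correlation(bins[a], bins[b], n_windows) < threshold:
            continue
        ab = count_precedes(times[a], times[b], max_lag)
        ba = count_precedes(times[b], times[a], max_lag)
        if ab > ba:
            templates.append((a, b))
        elif ba > ab:
            templates.append((b, a))
    return templates


# Hypothetical alarm log: vm_unreachable follows link_down by 2 time units.
EXAMPLE_EVENTS = [
    (0, "link_down"), (2, "vm_unreachable"),
    (50, "disk_full"),
    (100, "link_down"), (102, "vm_unreachable"),
    (200, "link_down"), (202, "vm_unreachable"),
    (250, "disk_full"),
    (300, "link_down"), (302, "vm_unreachable"),
]

if __name__ == "__main__":
    print(causality_templates(EXAMPLE_EVENTS))
```

On this toy log the only template recovered is `link_down` causing `vm_unreachable`. `disk_full` never shares a window with either alarm and is left out. A real deployment would need a principled choice of window size and lag, which this sketch does not attempt.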