Title: Automating anomaly detection for exploratory data analytics
Author: Karun Thankachan
Venue: 2017 International Conference on Inventive Computing and Informatics (ICICI)
Publication date: 2017-11-01
DOI: https://doi.org/10.1109/ICICI.2017.8365228
Citation count: 3
Abstract
This paper presents a design for automating exploratory data analysis, with an emphasis on outlier and anomaly detection. It discusses the domain of exploratory data analysis, the complexity involved in automating it, and a solution that leverages recent advances in computing to address that complexity. The solution is a framework that accepts data, infers the structure and types of its variables, extracts the important variables, and detects outliers or anomalies that help explain process bottlenecks. It uses big-data technologies and distributed computing (Hadoop and Spark) to make it feasible to pursue multiple lines of analysis and to use intermediate results to steer the analysis towards the desired goal. Statistical methods and visual data analytics form the core of the framework, helping to automate exploratory data analysis, reduce analysis time, and focus attention on the most valuable areas of concern in the data.
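
The abstract does not name the specific statistical methods the framework uses, so the following is only a minimal single-machine sketch of the "infer variable types, then flag outliers" steps. It assumes Tukey's interquartile-range fences as a stand-in outlier test. The function names (`infer_type`, `iqr_outliers`, `scan`) and the sample column names are hypothetical, and the sketch does not attempt the Hadoop/Spark distribution the paper describes.

```python
from statistics import quantiles


def infer_type(values):
    """Classify a column as 'numeric' or 'categorical' (hypothetical rule:
    a column is numeric only if every value converts to float)."""
    try:
        for v in values:
            float(v)
        return "numeric"
    except (TypeError, ValueError):
        return "categorical"


def iqr_outliers(values, k=1.5):
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    This is an assumed stand-in test, not the paper's stated method.
    """
    nums = [float(v) for v in values]
    q1, _, q3 = quantiles(nums, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in nums if x < low or x > high]


def scan(table):
    """Map each numeric column name to the outliers flagged in it."""
    return {
        name: iqr_outliers(column)
        for name, column in table.items()
        if infer_type(column) == "numeric"
    }


# Hypothetical sample data: one numeric column, one categorical column.
table = {
    "latency_ms": [10, 12, 11, 13, 12, 11, 250],
    "stage": ["a", "b", "a", "c", "b", "a", "c"],
}
result = scan(table)
print(result)  # {'latency_ms': [250.0]}
```

In this sketch the categorical column is skipped and only the numeric column is tested. A fuller pipeline would presumably also rank variables by importance before testing them, which is omitted here.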