S. E. Hajjami, Jamal Malki, M. Berrada, Bouziane Fourka
{"title":"Machine Learning for anomaly detection. Performance study considering anomaly distribution in an imbalanced dataset","authors":"S. E. Hajjami, Jamal Malki, M. Berrada, Bouziane Fourka","doi":"10.1109/CloudTech49835.2020.9365887","DOIUrl":null,"url":null,"abstract":"The continuous dematerialization of real-world data greatly contributes to the important growing of the exchanged data. In this case, anomaly detection is increasingly becoming an important task of data analysis in order to detect abnormal data, which is of particular interest and may require action. Recent advances in artificial intelligence approaches, such as machine learning, are making an important breakthrough in this area. Typically, these techniques have been designed for balanced data sets or that have certain assumptions about the distribution of data. However, the real applications are rather confronted with an imbalanced data distribution, where normal data are present in large quantities and abnormal cases are generally very few. This makes anomaly detection similar to looking for the needle in a haystack. In this article, we develop an experimental setup for comparative analysis of two types of machine learning techniques in their application to anomaly detection systems. We study their performance taking into account anomaly distribution in an imbalanced dataset.","PeriodicalId":272860,"journal":{"name":"2020 5th International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications (CloudTech)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 5th International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications (CloudTech)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CloudTech49835.2020.9365887","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
Abstract
The continuous dematerialization of real-world data greatly contributes to the important growing of the exchanged data. In this case, anomaly detection is increasingly becoming an important task of data analysis in order to detect abnormal data, which is of particular interest and may require action. Recent advances in artificial intelligence approaches, such as machine learning, are making an important breakthrough in this area. Typically, these techniques have been designed for balanced data sets or that have certain assumptions about the distribution of data. However, the real applications are rather confronted with an imbalanced data distribution, where normal data are present in large quantities and abnormal cases are generally very few. This makes anomaly detection similar to looking for the needle in a haystack. In this article, we develop an experimental setup for comparative analysis of two types of machine learning techniques in their application to anomaly detection systems. We study their performance taking into account anomaly distribution in an imbalanced dataset.