IRFD: A Feature Engineering based Ensemble Classification for Detecting Electricity Fraud in Traditional Meters
Md. Zesun Ahmed Mia, Md. Moinul Islam, Monjurul Haque, S. Islam, Sajidur Rahman
2021 24th International Conference on Computer and Information Technology (ICCIT). Published: 2021-12-18. DOI: 10.1109/ICCIT54785.2021.9689842 (https://doi.org/10.1109/ICCIT54785.2021.9689842)
Abstract
Non-technical electrical losses caused by power fraud have inflicted significant economic losses on nations. Electricity theft is the criminal act of stealing electricity through mechanisms such as unauthorized tapping of power lines and bypassing the smart meter. It is a significant concern for developing and developed countries alike. For most developing countries, however, the consequences are catastrophic, given that their usage is always less than their demand. To be mitigated, electricity theft must be detected precisely and quickly. In this study, we propose IRFD, a predictive ensemble machine learning method that uses a novel combination of feature distinction methods to detect electricity theft. The proposed model combines a feature selection technique, Recursive Feature Elimination with stratified 10-fold cross-validation (RFECV), with Isolation Forest (IF) for identifying and removing outliers, together with several machine learning classifiers to predict electricity theft. The study additionally improves the handling of highly imbalanced fraud data using Borderline-SMOTE with SVM (SVMSMOTE) and applies feature scaling with StandardScaler. In our experiments, the Random Forest classifier achieved higher accuracy (97.06%), with higher precision, recall, and F1-score. To evaluate the efficacy of the proposed model, we also compare the classification metrics of several machine learning classifiers (Logistic Regression, Gradient Boosting, XGBoost, AdaBoost, KNN, ANN, and Random Forest) before and after applying the proposed feature engineering techniques.