Title: Dataset Outlier Detection Method Based on Random Forest Algorithm
Author: Ying-gang Zheng
Venue: 2022 4th International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM)
Published: 2022-10-01
DOI: 10.1109/AIAM57466.2022.00111
URL: https://doi.org/10.1109/AIAM57466.2022.00111
Citations: 0
Abstract
Outlier detection matters in many real-world applications and remains an active area of research. This paper studies outlier detection methods for datasets based on the random forest algorithm. It briefly reviews the background and significance of outlier detection, the state of research domestically and internationally, applications of outlier detection in real-world scenarios, and open problems that still need to be solved. It then summarizes the concept of an outlier and briefly introduces random forests and locality-sensitive hashing. The RHSForest algorithm is proposed, and its underlying idea, procedure, parameter settings, and evaluation metrics are discussed in detail. The algorithm is then evaluated experimentally. On 5 benchmark datasets, RHSForest reaches an AUC of up to 95% on the Glass dataset, and the results show consistent performance improvements in outlier detection.
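The abstract reports results as AUC values. As a rough illustration of what that metric measures for an outlier detector, the sketch below computes AUC as the fraction of (outlier, inlier) pairs in which the outlier receives the higher anomaly score, with ties counted as half. The function name `roc_auc`, the labels, and the scores are hypothetical toy values chosen for illustration; they are not taken from the paper and do not reproduce RHSForest or its experiments.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: probability that a random outlier (label 1)
    gets a higher anomaly score than a random inlier (label 0).
    Tied scores count as half a win."""
    outlier_scores = [s for y, s in zip(labels, scores) if y == 1]
    inlier_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not outlier_scores or not inlier_scores:
        raise ValueError("need at least one outlier and one inlier")

    wins = 0.0
    for o in outlier_scores:
        for i in inlier_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(outlier_scores) * len(inlier_scores))


# Hypothetical toy data (assumption, not from the paper):
# 1 marks a true outlier, scores are anomaly scores from some detector.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.10, 0.20, 0.35, 0.80, 0.70, 0.90]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # prints "AUC = 0.875"
```

Here 7 of the 8 outlier/inlier pairs are ranked correctly, which gives 0.875. An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so a reported value of 95% would mean that roughly 95 out of 100 such pairs are ordered correctly.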