Distance-Based Unsupervised Local Outlier Detection: Based Values Analysis to Improve Outlier Detection Using Machine Learning
Authors: Atul Kumar Gupta, Rahul Kumar, Jhankar Moolchandani, Vikas Thada, Mohd Asif Shah, Anoop Kumar Tiwari
Journal: IET Communications, 19(1)
DOI: 10.1049/cmu2.70060
Publication date: 2025-07-04
Publication type: Journal Article
Impact factor: 1.6
JCR: Q3 (ENGINEERING, ELECTRICAL & ELECTRONIC)
Region: 4 (Computer Science)
Source: Semanticscholar
Article page: https://onlinelibrary.wiley.com/doi/10.1049/cmu2.70060
PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cmu2.70060
Citations: 0
Abstract
Detecting outliers remains a challenge for machine learning, especially in high-dimensional datasets. Data quality is crucial for good results, and many algorithms identify outliers by analysing the outlying aspects of individual data objects relative to the other objects in the dataset. The proposed Advanced Distance-Based Unsupervised Local Outlier Detection (DU-LOD) method improves this process by continuously evaluating and identifying outliers using unsupervised learning and distance-based calculations. DU-LOD identifies outliers by comparing each data object with its neighbours, making it the first method to combine unsupervised local outlier detection with nearest cluster point identification. In experiments, DU-LOD achieves an accuracy of 96.12%, a detection rate of 41.89%, a precision of 56.12%, and a recall of 1.79%, and our model performs best on these measures among the existing algorithms compared. Therefore, measures such as area under the ROC curve (AUC), precision and recall are more appropriate in such a scenario.
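As a rough illustration of the general idea of distance-based outlier scoring, and not the DU-LOD algorithm itself (the abstract does not give its details), the sketch below scores each point by its mean distance to its k nearest neighbours, flags the highest-scoring points, and then computes accuracy, precision and recall. The toy data, the function names (`knn_scores`, `flag_outliers`, `evaluate`) and the parameters `k` and `n_flag` are all assumptions made for this example.

```python
import math


def knn_scores(points, k):
    """Mean Euclidean distance from each point to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_outliers(scores, n_flag):
    """Flag the n_flag points with the largest scores (hypothetical rule)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_flag])
    return [i in flagged for i in range(len(scores))]


def evaluate(predicted, actual):
    """Return (accuracy, precision, recall) for boolean outlier labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Toy data (assumed): a 3x3 grid of inliers plus two distant outliers.
points = [(float(x), float(y)) for x in range(3) for y in range(3)]
points += [(10.0, 10.0), (-8.0, 7.0)]
labels = [False] * 9 + [True] * 2

scores = knn_scores(points, k=3)
flags = flag_outliers(scores, n_flag=2)
detector_metrics = evaluate(flags, labels)

# Baseline that never flags anything, for comparison.
baseline_metrics = evaluate([False] * len(points), labels)

print("detector (accuracy, precision, recall):", detector_metrics)
print("baseline (accuracy, precision, recall):", baseline_metrics)
```

On this toy data the baseline that flags nothing still reaches an accuracy of 9/11 with a recall of zero, which is one reason precision, recall and AUC are often preferred over accuracy when outliers are rare.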
About the journal:
IET Communications covers fundamental and generic research that improves the understanding of communication technologies and harnesses signals to build better-performing communication systems over wired and/or wireless media. The journal is particularly interested in research papers reporting novel solutions to the dominant problems of noise, interference, timing and errors, in order to reduce system deficiencies such as the waste of scarce resources like spectrum, energy and bandwidth.
Topics include, but are not limited to:
Coding and Communication Theory;
Modulation and Signal Design;
Wired, Wireless and Optical Communication;
Communication System
Special Issues. Current Call for Papers:
Cognitive and AI-enabled Wireless and Mobile - https://digital-library.theiet.org/files/IET_COM_CFP_CAWM.pdf
UAV-Enabled Mobile Edge Computing - https://digital-library.theiet.org/files/IET_COM_CFP_UAV.pdf