IFDRF: Advancing Anomaly Detection with a Hybrid Machine Learning Model

Impact factor: 0.8 · JCR: Q4 (Optics)

Optical Memory and Neural Networks · Pub Date: 2025-02-03 · DOI: 10.3103/S1060992X24700474

Hariharan Ramesh, Faridoddin Shariaty, Sanjiban Sekhar Roy

Volume 33, Issue 4, pp. 385-400 · https://link.springer.com/article/10.3103/S1060992X24700474

Citations: 0

Abstract


Anomaly detection is the identification of aberrations in a dataset using statistical methods or machine learning algorithms. It is widely performed with unsupervised learning algorithms because labelling data manually can be expensive. While unsupervised anomaly detection is sufficient for data cleaning, it falls short in real-world applications, where accuracy is of the utmost importance. For example, it would be unacceptable to misdiagnose a patient as not having breast cancer, and so withhold treatment, because the model failed to recognize the case as an anomaly. In this paper, we propose an optimized model, IFDRF (Isolation Forest, DBSCAN, and Random Forest), that incorporates feedback (corrections) into the unsupervised detection model. IFDRF is a novel hybrid model that combines an unsupervised learning model at the first layer, a clustering model at the second layer, and a supervised learning model at the end. The proposed model first tunes the unsupervised learning model and then fits a model with the help of the feedback mechanism. It obviates the need to label the entire dataset and thus widens the scope of anomaly detection applications. We compared the proposed model with existing state-of-the-art anomaly detection baselines to show its efficacy. The proposed model performed significantly better than the other algorithms (\(p\text{-value} < 2.2 \times 10^{-16}\)), with an AUC score of 0.875.
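The three-layer idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the layers are wired together, how feedback is collected, or how the Isolation Forest is tuned, so everything below is an assumption: the toy data, the helper names `make_toy_data`, `ifdrf_sketch`, and `ask_label`, the `budget` of labelled rows, and all hyperparameters are hypothetical, and the tuning step is omitted. Only the scikit-learn classes themselves (`IsolationForest`, `DBSCAN`, `RandomForestClassifier`) are real.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.metrics import roc_auc_score


def make_toy_data(seed=0):
    """Hypothetical data: one dense 'normal' blob plus scattered anomalies."""
    rng = np.random.default_rng(seed)
    normal = rng.normal(0.0, 1.0, size=(500, 2))
    anomalies = rng.uniform(-8.0, 8.0, size=(25, 2))
    X = np.vstack([normal, anomalies])
    y = np.concatenate([np.zeros(500, dtype=int), np.ones(25, dtype=int)])
    return X, y


def ifdrf_sketch(X, ask_label, budget=60, seed=0):
    """Return (anomaly probability per row, indices that were labelled)."""
    # Layer 1 (unsupervised): Isolation Forest. score_samples is lower for
    # more abnormal points, so negate it to get "higher = more anomalous".
    iso = IsolationForest(n_estimators=100, random_state=seed).fit(X)
    iso_score = -iso.score_samples(X)

    # Layer 2 (clustering): DBSCAN assigns the label -1 to noise points.
    cluster = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
    is_noise = (cluster == -1).astype(float)

    # Feedback (assumed scheme): request labels only for the `budget`
    # most suspicious rows instead of labelling the whole dataset.
    queried = np.argsort(iso_score)[::-1][:budget]
    y_feedback = np.array([ask_label(i) for i in queried])
    if np.unique(y_feedback).size < 2:
        raise ValueError("feedback must contain both classes")

    # Layer 3 (supervised): Random Forest trained on the corrected subset,
    # using the raw features plus the outputs of the first two layers.
    features = np.column_stack([X, iso_score, is_noise])
    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    rf.fit(features[queried], y_feedback)
    proba = rf.predict_proba(features)[:, 1]
    return proba, queried


X, y_true = make_toy_data()
# Stand-in for a human annotator: looks up the ground-truth label.
proba, queried = ifdrf_sketch(X, ask_label=lambda i: int(y_true[i]))
print(f"labelled {queried.size} of {len(X)} rows")
# Scored on all rows, including the labelled ones, so this is optimistic.
print(f"AUC on toy data: {roc_auc_score(y_true, proba):.3f}")
```

In this sketch only 60 of the 525 rows are ever labelled, which mirrors the abstract's point that feedback avoids labelling the entire dataset. The AUC it prints describes the toy data only and is not comparable to the 0.875 reported in the paper.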


Source journal

Optical Memory and Neural Networks (Optics)

CiteScore: 1.50

Self-citation rate: 11.10%

About the journal: The journal covers a wide range of issues in information optics such as optical memory, mechanisms for optical data recording and processing, photosensitive materials, optical, optoelectronic and holographic nanostructures, and many other related topics. Papers on memory systems using holographic and biological structures and concepts of brain operation are also included. The journal pays particular attention to research in the field of neural net systems that may lead to a new generation of computational technologies by endowing them with intelligence.