Detecting Outliers Using Modified Recursive PCA Algorithm For Dynamic Streaming Data

Mendel, Pub Date: 2023-12-20, DOI: 10.13164/mendel.2023.2.237
Yasi Dani, Agus Yodi Gunawan, M. L. Khodra, S. Indratno

Abstract

Outlier analysis has been widely studied and has produced many methods. However, methods for detecting outliers in dynamically streaming batch data (online learning) remain rare. In the present research, a novel online algorithm to detect outliers in such datasets is proposed. Data points are processed by applying a modified recursive PCA to sequentially predict the parameters of the model; the eigenvalues and eigenvectors of the statistical detection model are recursively updated using approximate values obtained by perturbation methods. More specifically, the recursive eigenstructure is obtained from the derivation of the covariance matrix using the first-order perturbation technique. The Mahalanobis distance is then used as an outlier score. The algorithm's performance is evaluated using several metrics, namely accuracy, precision, recall, F1-score, AUC-PR, and execution time. Results show that the proposed online outlier detection is computationally efficient in time and that its effectiveness is comparable to that of the offline outlier detection algorithm via classical PCA.
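The abstract does not give the update equations, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a rank-one recursive update of the population covariance, the textbook first-order perturbation formulas for eigenvalues and eigenvectors, and a QR re-orthonormalization step. The class name `RecursivePCA`, the tolerance `tol`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


class RecursivePCA:
    """Illustrative recursive PCA with first-order perturbation updates.

    Assumption: this is a generic textbook-style sketch, not the modified
    algorithm from the paper.
    """

    def __init__(self, X_init, tol=1e-8):
        X_init = np.asarray(X_init, dtype=float)
        self.n = X_init.shape[0]
        self.mean = X_init.mean(axis=0)
        # Population covariance (divide by n) of the initial batch.
        cov = np.cov(X_init, rowvar=False, bias=True)
        self.eigvals, self.eigvecs = np.linalg.eigh(cov)
        self.tol = tol  # hypothetical guard for near-equal eigenvalues

    def score(self, x):
        """Mahalanobis distance of x, computed in the current eigenbasis."""
        z = self.eigvecs.T @ (np.asarray(x, dtype=float) - self.mean)
        return float(np.sqrt(np.sum(z**2 / np.maximum(self.eigvals, 1e-12))))

    def update(self, x):
        """Absorb one sample without recomputing a full eigendecomposition."""
        n = self.n
        d = np.asarray(x, dtype=float) - self.mean
        z = self.eigvecs.T @ d          # sample in the current eigenbasis
        eps = 1.0 / (n + 1)             # size of the rank-one perturbation
        lam = self.eigvals

        # Exact recursion: C_new = n/(n+1) * (C + eps * d d^T).
        # In the eigenbasis this is Lambda + eps * z z^T, treated to first order.
        gap = lam[None, :] - lam[:, None]            # gap[j, i] = lam_i - lam_j
        safe = np.where(np.abs(gap) < self.tol, np.inf, gap)
        B = np.outer(z, z) / safe                    # zero on the diagonal
        V_new = self.eigvecs @ (np.eye(lam.size) + eps * B)

        # Re-orthonormalize so the eigenvector matrix stays orthogonal.
        Q, R = np.linalg.qr(V_new)
        self.eigvecs = Q * np.sign(np.diag(R))

        self.eigvals = (n / (n + 1)) * (lam + eps * z**2)
        self.mean = self.mean + d / (n + 1)
        self.n = n + 1


# Synthetic demonstration data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([3.0, 2.0, 1.0])

model = RecursivePCA(X[:200])
for x in X[200:]:
    model.update(x)

inlier_score = model.score(X[0])
outlier_score = model.score(np.array([30.0, 30.0, 30.0]))
```

In this sketch the running mean and the sum of the eigenvalues track the exact batch values, while the individual eigenvalues and eigenvectors are only approximate. A point would be flagged as an outlier when its score exceeds a threshold, which has to be chosen separately and is not specified here.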