A robust distance-based approach for detecting multidimensional outliers.

Impact factor 1.2; Zone 4 (Mathematics); JCR Q2 (Statistics & Probability)
Journal of Applied Statistics. Pub Date: 2024-11-07. eCollection Date: 2025-01-01. DOI: 10.1080/02664763.2024.2422403
R Lakshmi, T A Sajesh
Journal of Applied Statistics, volume 52, issue 6, pp. 1278-1298. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12035934/pdf/

Abstract

Identifying outliers is a critical task in data analysis, because outliers can significantly influence the results and conclusions drawn from a dataset. This study explores the use of the Mahalanobis distance for detecting outliers in multivariate data, focusing on a novel approach inspired by the work of M. Falk [On MAD and comedians, Ann. Inst. Stat. Math. 49 (1997), pp. 615-644]. In extensive simulations, the proposed method achieves high true positive rates (TPR) and low false positive rates (FPR) compared with existing outlier detection techniques. The same simulations are used to evaluate empirically the affine equivariance and breakdown properties of the proposed robust distance measure. Applied to seven different datasets, the method showed promising TPR and FPR and outperformed several well-known outlier identification approaches. The proposed distance measure can therefore be used effectively in fields that require outlier detection.
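The abstract does not spell out the estimator, so the following is only a rough sketch of the general idea and not the authors' method. It replaces the mean and covariance in the Mahalanobis distance with the coordinatewise median and a comedian matrix in the sense of Falk (1997), flags points whose squared distance exceeds a chi-square cutoff, and computes TPR and FPR on synthetic data. The function names, the 1.4826 scaling constant, the 0.975 cutoff level, and the simulated data are all assumptions made for illustration. The raw comedian matrix is not guaranteed to be positive definite; the sketch only checks for that and does not repair it.

```python
import math
import random
from statistics import median

MAD_SCALE = 1.4826  # assumption: usual normal-consistency factor for the MAD
CUTOFF_2D = -2.0 * math.log(1.0 - 0.975)  # 0.975 quantile of chi-square, 2 df


def comedian_matrix(rows):
    """Return coordinatewise medians and the scaled comedian matrix.

    COM(X_j, X_k) = med((X_j - med X_j) * (X_k - med X_k)); the diagonal
    entries are essentially squared MADs.
    """
    p = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(p)]
    meds = [median(c) for c in cols]
    devs = [[x - m for x in c] for c, m in zip(cols, meds)]
    com = [[MAD_SCALE ** 2 * median([a * b for a, b in zip(devs[j], devs[k])])
            for k in range(p)] for j in range(p)]
    return meds, com


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(m[r][i]))
        if abs(m[piv][i]) < 1e-12:
            raise ValueError("scatter matrix is singular")
        m[i], m[piv] = m[piv], m[i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n + 1):
                m[r][c] -= f * m[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def robust_sq_distances(rows):
    """Squared Mahalanobis-type distances using medians and the comedian matrix."""
    meds, com = comedian_matrix(rows)
    out = []
    for r in rows:
        d = [x - m for x, m in zip(r, meds)]
        z = solve(com, d)
        out.append(sum(di * zi for di, zi in zip(d, z)))
    if min(out) < 0:
        raise ValueError("comedian matrix is not positive definite")
    return out


def rates(flags, truth):
    """Return (TPR, FPR) given predicted flags and true outlier labels."""
    tp = sum(f and t for f, t in zip(flags, truth))
    fp = sum(f and not t for f, t in zip(flags, truth))
    pos = sum(truth)
    return tp / pos, fp / (len(truth) - pos)


# Hypothetical data: 100 correlated bivariate normal points plus 2 planted outliers.
random.seed(0)
clean = []
for _ in range(100):
    x = random.gauss(0, 1)
    clean.append((x, 0.6 * x + 0.8 * random.gauss(0, 1)))
data = clean + [(6.0, -6.0), (-7.0, 7.0)]
truth = [False] * 100 + [True] * 2

flags = [d > CUTOFF_2D for d in robust_sq_distances(data)]
tpr, fpr = rates(flags, truth)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

The cutoff is written in closed form because the chi-square quantile with 2 degrees of freedom is -2 ln(1 - q); for other dimensions a chi-square quantile function from a statistics library would be needed.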

Source journal
Journal of Applied Statistics (Mathematics: Statistics & Probability)
CiteScore: 3.40
Self-citation rate: 0.00%
Articles published: 126
Review time: 6 months
About the journal: Journal of Applied Statistics provides a forum for communication between applied statisticians and users of applied statistical techniques across a wide range of disciplines. These areas include business, computing, economics, ecology, education, management, medicine, operational research and sociology, but papers from other areas are also considered. The editorial policy is to publish rigorous but clear and accessible papers on applied techniques. Purely theoretical papers are avoided, but those on theoretical developments that clearly demonstrate significant applied potential are welcomed. Each paper is submitted to at least two independent referees.