A New Single Linkage Robust Clustering Outlier Detection Procedures for Multivariate Data

IF 0.7 | Zone 4 | Multidisciplinary journals | JCR Q3 MULTIDISCIPLINARY SCIENCES
Sharifah Sakinah Syed Abd Mutalib, Siti Zanariah Satari, Wan Nur Syahidah Wan Yusoff
DOI: 10.17576/jsm-2023-5208-19 (https://doi.org/10.17576/jsm-2023-5208-19)
Journal: Sains Malaysiana
Publication date: 2023-08-31
Publication type: Journal Article
Citations: 0

Abstract

Outliers are abnormal observations, and detecting them in multivariate data has long been of interest. Unlike in the univariate case, visual inspection alone is insufficient for detecting outliers in multivariate data. In this study, we develop a new single linkage robust clustering outlier detection procedure for multivariate data. A robust estimator, Test on Covariance (TOC), is used to robustify the similarity distance measure, producing a robust single linkage clustering. The performance of the new procedure is investigated via a simulation study with three outlier scenarios, and historical multivariate datasets serve as illustrative examples. Three performance measures are used: pout, pmask, and pswamp. The new procedure is also compared with single linkage clustering using Euclidean and Mahalanobis distances as similarity distance measures, as well as with TOC. The new procedure performs well in Outlier Scenario 3, in which the mean and covariance matrix are shifted. It also performs well on the datasets: in two out of five it detects all outliers with no masking effect, and it shows no swamping effect in any dataset. In conclusion, the new single linkage robust clustering outlier detection procedure is a practical and promising approach for simultaneously identifying multiple outliers in multivariate data.
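The general idea of clustering-based outlier detection can be illustrated with a minimal Python sketch. It is not the authors' procedure: the TOC-robustified distance is not reproduced here, and plain Euclidean distance (one of the comparison baselines named in the abstract) stands in for it through the pluggable `dist` argument. The function names, the `cut_height` parameter, the rule "largest cluster = inliers", and the toy data are all assumptions made for illustration. The per-dataset measures follow a common reading of the terms (masking: planted outliers that go undetected; swamping: inliers wrongly flagged), which the abstract itself does not define.

```python
import math
from itertools import combinations

def single_linkage_labels(points, cut_height, dist=math.dist):
    """Cluster labels obtained by cutting a single-linkage tree at cut_height.

    Cutting a single-linkage dendrogram at height h yields the connected
    components of the graph joining every pair with distance <= h,
    so a union-find pass over all pairs is sufficient.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= cut_height:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]

def flag_outliers(points, cut_height, dist=math.dist):
    """Assumed rule: the largest cluster is the inlier group, the rest are flagged."""
    labels = single_linkage_labels(points, cut_height, dist)
    largest = max(set(labels), key=labels.count)
    return {i for i, lab in enumerate(labels) if lab != largest}

def run_measures(flagged, planted, n):
    """Per-dataset ingredients of pout, pmask and pswamp (assumed definitions)."""
    all_found = planted <= flagged                           # every planted outlier detected
    masking = len(planted - flagged) / len(planted)          # missed outliers
    swamping = len(flagged - planted) / (n - len(planted))   # inliers wrongly flagged
    return all_found, masking, swamping

# Toy data: a tight group near the origin plus two planted outliers (indices 5 and 6).
data = [(0.0, 0.0), (0.5, 0.2), (0.3, 0.8), (0.9, 0.4), (0.6, 0.9),
        (6.0, 6.0), (6.4, 6.3)]
flagged = flag_outliers(data, cut_height=1.5)
print(sorted(flagged))                                       # [5, 6]
print(run_measures(flagged, planted={5, 6}, n=len(data)))    # (True, 0.0, 0.0)
```

Averaging these per-dataset values over many simulated datasets would give simulation-level estimates in the spirit of pout, pmask, and pswamp. A robust variant would pass a Mahalanobis-type distance built from robust location and scatter estimates as `dist`.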
Source journal
Sains Malaysiana (MULTIDISCIPLINARY SCIENCES)
CiteScore: 1.60
Self-citation rate: 12.50%
Articles published: 196
Review time: 3-6 weeks
Journal description: Sains Malaysiana is a refereed journal committed to the advancement of scholarly knowledge and research findings in the several branches of science and technology. It contains articles on Earth Sciences, Health Sciences, Life Sciences, Mathematical Sciences and Physical Sciences. The journal publishes articles, reviews, and research notes whose content and approach are of interest to a wide range of scholars. Sains Malaysiana is published by the UKM Press, and its autonomous Editorial Board is drawn from the Faculty of Science and Technology, Universiti Kebangsaan Malaysia. In addition, distinguished scholars from local and foreign universities are appointed to serve as advisory board members and referees.