A New Single Linkage Robust Clustering Outlier Detection Procedures for Multivariate Data
Sharifah Sakinah Syed Abd Mutalib, Siti Zanariah Satari, Wan Nur Syahidah Wan Yusoff
Sains Malaysiana, published 2023-08-31 (journal impact factor 0.7; JCR Q3, Multidisciplinary Sciences)
DOI: https://doi.org/10.17576/jsm-2023-5208-19
Abstract
Outliers are abnormal observations, and detecting them in multivariate data has long been of interest. Unlike in the univariate case, visual inspection alone is insufficient for detecting outliers in multivariate data. In this study, we develop a new single linkage robust clustering outlier detection procedure for multivariate data. A robust estimator, Test on Covariance (TOC), is used to robustify the similarity distance measure, producing a robust single linkage clustering. The performance of the new procedure is investigated via a simulation study using three outlier scenarios, with historical multivariate datasets as illustrative examples. Three performance measures are used: pout, pmask, and pswamp. The new procedure is also compared with single linkage clustering using Euclidean and Mahalanobis distances as similarity distance measures, as well as with TOC. The new procedure performs well in Outlier Scenario 3, in which both the mean and the covariance matrix are shifted. On the historical datasets, it successfully detects all outliers, with no masking effect, in two of the five datasets, and it shows no swamping effect in any dataset. In conclusion, the new single linkage robust clustering outlier detection procedure is a practical and promising approach for simultaneously identifying multiple outliers in multivariate data.
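The general idea described in the abstract (single linkage clustering on a distance measure built from a robust scatter estimate) can be illustrated with a minimal sketch. Several parts are assumptions made only for illustration and are not taken from the paper: the TOC estimator is not reproduced here, so a simple iteratively trimmed mean and covariance (`trimmed_covariance`, a hypothetical helper) stands in for it; the cut height `height` is a user-supplied parameter; and the rule "the largest cluster holds the inliers, everything else is flagged" is an assumed decision rule.

```python
import numpy as np

def trimmed_covariance(X, keep=0.75, n_iter=10):
    """Stand-in robust estimate (NOT the paper's TOC estimator).

    Repeatedly keeps the `keep` fraction of points closest to the current
    center in Mahalanobis distance and re-estimates mean and covariance.
    """
    n = X.shape[0]
    h = int(np.ceil(keep * n))
    center = np.median(X, axis=0)
    cov = np.cov(X, rowvar=False)
    for _ in range(n_iter):
        diff = X - center
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        subset = np.argsort(d2)[:h]
        center = X[subset].mean(axis=0)
        cov = np.cov(X[subset], rowvar=False)
    return center, cov

def pairwise_mahalanobis(X, cov):
    """Matrix of Mahalanobis distances between all pairs of rows of X."""
    VI = np.linalg.inv(cov)
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, VI, diff)
    return np.sqrt(np.maximum(d2, 0.0))

def single_linkage_labels(D, height):
    """Cluster labels from cutting a single linkage tree at `height`.

    Cutting a single linkage dendrogram at a given height is equivalent to
    taking connected components of the graph that joins every pair of points
    whose distance is <= height, so a union-find suffices.
    """
    n = D.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if D[i, j] <= height:
                parent[find(i)] = find(j)
    return np.array([find(i) for i in range(n)])

def flag_outliers(X, height):
    """Assumed rule: the largest cluster is the inlier group."""
    _, cov = trimmed_covariance(X)
    D = pairwise_mahalanobis(X, cov)
    labels = single_linkage_labels(D, height)
    ids, counts = np.unique(labels, return_counts=True)
    main = ids[np.argmax(counts)]
    return labels != main

# Toy data: a 6 x 6 grid of inliers plus three distant points.
grid = np.array([(0.5 * i, 0.5 * j) for i in range(6) for j in range(6)])
far = np.array([(20.0, 20.0), (20.3, 20.0), (20.0, 20.3)])
X = np.vstack([grid, far])
print(np.flatnonzero(flag_outliers(X, height=3.0)))
```

Replacing `trimmed_covariance` with the classical sample covariance gives the non-robust Mahalanobis variant that the abstract uses as a comparison, and replacing the distance matrix with plain Euclidean distances gives the other baseline.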
About the journal:
Sains Malaysiana is a refereed journal committed to the advancement of scholarly knowledge and research findings in the several branches of science and technology. It contains articles on Earth Sciences, Health Sciences, Life Sciences, Mathematical Sciences and Physical Sciences. The journal publishes articles, reviews, and research notes whose content and approach are of interest to a wide range of scholars. Sains Malaysiana is published by UKM Press, and the members of its autonomous Editorial Board are drawn from the Faculty of Science and Technology, Universiti Kebangsaan Malaysia. In addition, distinguished scholars from local and foreign universities are appointed to serve as advisory board members and referees.