A New Procedure of Clustering Based on Multivariate Outlier Detection

Grégory David, S. Jayakumar, B. Thomas
Journal: Journal of Data Science (JDS)
Publication type: Journal Article
DOI: 10.6339/JDS.2013.11(1).1091 (https://doi.org/10.6339/JDS.2013.11(1).1091)
Publication date (as recorded in the source metadata): 2021-07-30
Citations: 22

Abstract

Clustering is an important task in a wide variety of application domains, especially in management and social science research. This paper proposes an iterative clustering procedure based on multivariate outlier detection using the Mahalanobis distance. First, the Mahalanobis distance is calculated for the entire sample, and the T²-statistic is used to fix an upper control limit (UCL). Observations above the UCL are treated as outliers and grouped into an outlier cluster, and the same procedure is repeated for the remaining inliers until the variance-covariance matrix of the variables in the last cluster becomes singular. At each iteration, a multivariate test of means is used to check the discrimination between the outlier cluster and the inliers. Multivariate control charts are also used to visualize the iterations and the outlier clustering process graphically. Finally, the multivariate test of means helps to firmly establish cluster discrimination and validity. The procedure is applied to cluster 275 customers of a well-known two-wheeler in India based on 19 different attributes of the two-wheeler and its company. The results confirm that there exist 5 and 7 outlier clusters of customers in the entire sample at the 5% and 1% significance levels, respectively.
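The iterative loop described in the abstract can be sketched roughly as follows, in Python with NumPy and SciPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names `t2_ucl` and `outlier_clusters` are hypothetical, the beta-based Phase I control limit is one common choice for individual T² values (the abstract does not say which form of the UCL is used), and the extra stop when no observation exceeds the UCL is added so the loop always terminates. The multivariate test of means and the control charts are left out.

```python
import numpy as np
from scipy import stats


def t2_ucl(n, p, alpha):
    """UCL for individual T^2 values using the beta form.

    Assumption: the Phase I limit ((n-1)^2 / n) * Beta(p/2, (n-p-1)/2);
    the paper may fix its UCL differently.
    """
    return (n - 1) ** 2 / n * stats.beta.ppf(1 - alpha, p / 2, (n - p - 1) / 2)


def outlier_clusters(X, alpha=0.05, max_iter=50):
    """Peel off outlier clusters until the covariance matrix is singular.

    Returns one label per row: 1, 2, ... for successive outlier clusters,
    and 0 for the rows left in the final inlier group.
    """
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    labels = np.zeros(len(X), dtype=int)
    remaining = np.arange(len(X))
    for k in range(1, max_iter + 1):
        n = len(remaining)
        if n <= p + 1:  # too few rows for a usable covariance matrix
            break
        sub = X[remaining]
        cov = np.atleast_2d(np.cov(sub, rowvar=False))
        if np.linalg.matrix_rank(cov) < p:  # stopping rule from the abstract
            break
        diff = sub - sub.mean(axis=0)
        # Squared Mahalanobis distance of each row from the current mean.
        t2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        above = t2 > t2_ucl(n, p, alpha)
        if not above.any():  # assumed extra stop: nothing exceeds the UCL
            break
        labels[remaining[above]] = k
        remaining = remaining[~above]
    return labels


if __name__ == "__main__":
    # Synthetic data only: 200 standard-normal rows plus 5 distant rows.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(size=(200, 3)), np.full((5, 3), 10.0)])
    print(np.bincount(outlier_clusters(X, alpha=0.05)))
```

Running the script prints the size of the final inlier group followed by the sizes of the successive outlier clusters. Passing `alpha=0.01` raises the UCL, so each pass flags fewer observations.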