A fresh look at mean-shift based modal clustering

IF 1.3 4区计算机科学 Q2 STATISTICS & PROBABILITY

Advances in Data Analysis and Classification Pub Date : 2023-12-14 DOI:10.1007/s11634-023-00575-1

Jose Ameijeiras-Alonso, Jochen Einbeck

{"title":"A fresh look at mean-shift based modal clustering","authors":"Jose Ameijeiras-Alonso, Jochen Einbeck","doi":"10.1007/s11634-023-00575-1","DOIUrl":null,"url":null,"abstract":"<div><p>Modal clustering is an unsupervised learning technique where cluster centers are identified as the local maxima of nonparametric probability density estimates. A natural algorithmic engine for the computation of these maxima is the <i>mean shift procedure</i>, which is essentially an iteratively computed chain of local means. We revisit this technique, focusing on its link to kernel density gradient estimation, in this course proposing a novel concept for bandwidth selection based on the concept of a critical bandwidth. Furthermore, in the one-dimensional case, an inverse version of the mean shift is developed to provide a novel approach for the estimation of antimodes, which is then used to identify cluster boundaries. A simulation study is provided which assesses, in the univariate case, the classification accuracy of the mean-shift based clustering approach. Three (univariate and multivariate) examples from the fields of philately, engineering, and imaging, illustrate how modal clusterings identified through mean shift based methods relate directly and naturally to physical properties of the data-generating system. Solutions are proposed to deal computationally efficiently with large data sets.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"18 4","pages":"1067 - 1095"},"PeriodicalIF":1.3000,"publicationDate":"2023-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in Data Analysis and Classification","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s11634-023-00575-1","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}

引用次数: 0

Abstract

Modal clustering is an unsupervised learning technique where cluster centers are identified as the local maxima of nonparametric probability density estimates. A natural algorithmic engine for the computation of these maxima is the mean shift procedure, which is essentially an iteratively computed chain of local means. We revisit this technique, focusing on its link to kernel density gradient estimation, in this course proposing a novel concept for bandwidth selection based on the concept of a critical bandwidth. Furthermore, in the one-dimensional case, an inverse version of the mean shift is developed to provide a novel approach for the estimation of antimodes, which is then used to identify cluster boundaries. A simulation study is provided which assesses, in the univariate case, the classification accuracy of the mean-shift based clustering approach. Three (univariate and multivariate) examples from the fields of philately, engineering, and imaging, illustrate how modal clusterings identified through mean shift based methods relate directly and naturally to physical properties of the data-generating system. Solutions are proposed to deal computationally efficiently with large data sets.

Abstract Image

查看原文本刊更多论文

重新审视基于均值移动的模态聚类

模态聚类是一种无监督学习技术，聚类中心被识别为非参数概率密度估计的局部最大值。计算这些最大值的自然算法引擎是均值移动程序，它本质上是一个迭代计算的局部均值链。在本课程中，我们重温了这一技术，重点关注其与核密度梯度估计的联系，并根据临界带宽的概念提出了带宽选择的新概念。此外，在一维情况下，还开发了均值移动的逆版本，为估计反节点提供了一种新方法，然后用于识别聚类边界。模拟研究评估了基于均值偏移的聚类方法在单变量情况下的分类准确性。来自集邮、工程和成像领域的三个（单变量和多变量）实例说明了通过基于均值偏移的方法确定的模态聚类如何直接、自然地与数据生成系统的物理特性相关联。此外，还提出了高效计算大型数据集的解决方案。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Advances in Data Analysis and Classification STATISTICS & PROBABILITY-

CiteScore

3.40

自引率

6.20%

发文量

审稿时长

>12 weeks

期刊介绍： The international journal Advances in Data Analysis and Classification (ADAC) is designed as a forum for high standard publications on research and applications concerning the extraction of knowable aspects from many types of data. It publishes articles on such topics as structural, quantitative, or statistical approaches for the analysis of data; advances in classification, clustering, and pattern recognition methods; strategies for modeling complex data and mining large data sets; methods for the extraction of knowledge from data, and applications of advanced methods in specific domains of practice. Articles illustrate how new domain-specific knowledge can be made available from data by skillful use of data analysis methods. The journal also publishes survey papers that outline, and illuminate the basic ideas and techniques of special approaches.