K-Plus anticlustering: An improved k-means criterion for maximizing between-group similarity
Martin Papenberg
British Journal of Mathematical & Statistical Psychology, published 2023-07-11
DOI: 10.1111/bmsp.12315
https://onlinelibrary.wiley.com/doi/10.1111/bmsp.12315
Citation count: 0
Abstract
Anticlustering refers to the process of partitioning elements into disjoint groups with the goal of obtaining high between-group similarity and high within-group heterogeneity. Anticlustering thereby reverses the logic of its better-known twin, cluster analysis, and is usually approached by maximizing instead of minimizing a clustering objective function. This paper presents k-plus, an extension of the classical k-means objective for maximizing between-group similarity in anticlustering applications. K-plus represents between-group similarity as discrepancy in distribution moments (means, variances, and higher-order moments), whereas the k-means criterion only reflects group differences with regard to means. Although k-plus constitutes a new criterion for anticlustering, it can be implemented by optimizing the original k-means criterion after the input data have been augmented with additional variables. A computer simulation and practical examples show that k-plus anticlustering achieves high between-group similarity with regard to multiple objectives. In particular, optimizing between-group similarity with regard to variances usually does not compromise similarity with regard to means; the k-plus extension is therefore generally preferred over classical k-means anticlustering. Examples are given on how k-plus anticlustering can be applied to real norming data using the open source R package anticlust, which is freely available via CRAN.
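The augmentation idea in the abstract can be illustrated with a minimal Python sketch. It is not the anticlust implementation. It assumes that the additional variables are centered powers of each input variable, (x - mean)^p for p = 2 up to the requested moment, that all columns are standardized afterwards, and that the k-means criterion is maximized with a simple pairwise exchange heuristic. The function names (`kplus_augment`, `standardize`, `kmeans_criterion`, `anticluster`) are hypothetical and chosen for illustration only.

```python
import random


def kplus_augment(data, moments=2):
    """Append (x - mean)^p for p = 2..moments for every column (assumed augmentation)."""
    n, n_cols = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(n_cols)]
    augmented = []
    for row in data:
        extra = [(row[j] - means[j]) ** p
                 for p in range(2, moments + 1)
                 for j in range(n_cols)]
        augmented.append(list(row) + extra)
    return augmented


def standardize(data):
    """Scale each column to mean 0 and standard deviation 1 (assumption, for comparable scales)."""
    n, n_cols = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(n_cols)]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    sds = [(sum((row[j] - means[j]) ** 2 for row in data) / n) ** 0.5 or 1.0
           for j in range(n_cols)]
    return [[(row[j] - means[j]) / sds[j] for j in range(n_cols)] for row in data]


def kmeans_criterion(data, groups, k):
    """Sum of squared Euclidean distances between each row and its group centroid."""
    n_cols = len(data[0])
    total = 0.0
    for g in range(k):
        members = [row for row, label in zip(data, groups) if label == g]
        centroid = [sum(r[j] for r in members) / len(members) for j in range(n_cols)]
        total += sum((r[j] - centroid[j]) ** 2 for r in members for j in range(n_cols))
    return total


def anticluster(data, k, seed=0):
    """Maximize the k-means criterion by swapping pairs of elements between groups."""
    rng = random.Random(seed)
    n = len(data)
    groups = [i % k for i in range(n)]  # balanced start; swaps keep group sizes fixed
    rng.shuffle(groups)
    best = kmeans_criterion(data, groups, k)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            for j in range(i + 1, n):
                if groups[i] == groups[j]:
                    continue
                groups[i], groups[j] = groups[j], groups[i]
                value = kmeans_criterion(data, groups, k)
                if value > best + 1e-12:
                    best, improved = value, True  # keep the swap
                else:
                    groups[i], groups[j] = groups[j], groups[i]  # undo the swap
    return groups


# Usage: split 24 simulated scores into 3 groups with similar means and variances.
demo_rng = random.Random(1)
scores = [[demo_rng.gauss(0, 1)] for _ in range(24)]
features = standardize(kplus_augment(scores, moments=2))
assignment = anticluster(features, k=3)
```

Because the squared-deviation column enters the k-means criterion like any other variable, making the group centroids similar on that column pulls the group variances together, which is the mechanism the abstract describes. The exchange heuristic only finds a local optimum; it is used here for brevity.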
About the journal:
The British Journal of Mathematical and Statistical Psychology publishes articles relating to areas of psychology which have a greater mathematical or statistical aspect of their argument than is usually acceptable to other journals, including:
• mathematical psychology
• statistics
• psychometrics
• decision making
• psychophysics
• classification
• relevant areas of mathematics, computing and computer software
These include articles that address substantive psychological issues or that develop and extend techniques useful to psychologists. New models for psychological processes, new approaches to existing data, critiques of existing models and improved algorithms for estimating the parameters of a model are examples of articles which may be favoured.