How to Use Model-Based Cluster Analysis Efficiently in Person-Oriented Research.

Q2 Psychology
Journal for Person-Oriented Research. Pub Date: 2021-08-26; eCollection Date: 2021-01-01; DOI: 10.17505/jpor.2021.23449
Bence Gergely, András Vargha
Volume 7, Issue 1, pp. 22-35
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8411881/pdf/
Citations: 1

Abstract

Model-based cluster analysis (MBCA) was created to automate the often subjective model-selection procedure of traditional exploratory clustering methods. It is a type of finite mixture modelling that assumes the data come from a mixture of subpopulations, each following a given distribution, typically multivariate normal. In that case, cluster analysis amounts to exploring the underlying mixture structure. In MBCA, finding the number of clusters and the best clustering model is a statistical model-selection problem in which models differing in the number and type of component distributions are compared. To evaluate each fitted model, MBCA uses the likelihood-based Bayesian Information Criterion (BIC), and the model with the highest BIC value is accepted as the final solution. The aim of the present study is to investigate the adequacy of automatic model selection in MBCA using BIC, as well as of suggested alternatives such as the Integrated Completed Likelihood (ICL) criterion or Baudry's method. A further aim is to refine these procedures by using so-called quality coefficients (QCs), borrowed from methodological advances in exploratory cluster analysis, to help choose an appropriate cluster structure (CLS), and to compare the efficiency of MBCA in identifying a theoretical CLS with that of various other clustering methods. The analyses are restricted to two classification situations typical in person-oriented studies: (1) three data sets generated from an example data set with a perfect theoretical CLS of seven types (seven completely homogeneous clusters) by adding varying degrees of measurement error to the original values, and (2) three additional data sets based on another perfect theoretical CLS with four types. It was found that the automatic decision rarely led to an optimal solution.
However, optimal solutions were achieved for both classification situations by dropping solutions with irregular BIC curves and by using different QCs to help choose among the solutions generated by MBCA and by fusing close clusters. With this refined procedure, the cluster solutions revealed by MBCA often proved to be at least as good as those of various hierarchical and k-center clustering methods. MBCA was clearly superior in identifying four-type CLS models. In identifying seven-type CLS models, MBCA performed as well as the best of the other clustering methods (such as k-means) only when the reliability of the input variables was high or moderate; otherwise it was slightly less efficient.
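The selection quantities named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code, and every function name in it is hypothetical. Assumptions: a mixture model has already been fitted elsewhere and supplies its maximised log-likelihood, its number of free parameters, and its posterior membership matrix; BIC follows the "higher is better" sign convention used by the R package mclust (scikit-learn's `GaussianMixture.bic` uses the opposite sign); ICL is computed in a MAP-based form; the merging step applies the entropy criterion associated with Baudry's method; and EESS% (explained error sum of squares percentage), defined here as the share of the total sum of squares explained by a partition, stands in as one example of a QC.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # rows = observations, columns = components or variables


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """BIC on the 'higher is better' scale (assumed mclust convention): 2*logL - p*log(n)."""
    return 2.0 * log_lik - n_params * math.log(n_obs)


def icl(log_lik: float, n_params: int, posterior: Matrix) -> float:
    """Assumed MAP-based ICL: BIC plus 2 * sum of log posteriors of the MAP assignments.

    Every added term is <= 0, so ICL <= BIC; the gap widens as the
    classification becomes fuzzier, which favours well-separated clusters.
    """
    map_log_post = sum(math.log(max(row)) for row in posterior)
    return bic(log_lik, n_params, len(posterior)) + 2.0 * map_log_post


def entropy(posterior: Matrix) -> float:
    """Classification entropy -sum z*log(z), with 0*log(0) taken as 0."""
    return -sum(z * math.log(z) for row in posterior for z in row if z > 0)


def merge_step(posterior: Matrix) -> Tuple[float, int, int, List[List[float]]]:
    """Hypothetical helper: fuse the pair of components whose merger lowers the
    entropy most (the criterion behind Baudry's method). Needs >= 2 components.
    Returns (entropy drop, a, b, merged posterior with the fused column last)."""
    k = len(posterior[0])
    base = entropy(posterior)
    best = None
    for a in range(k):
        for b in range(a + 1, k):
            merged = [[z for j, z in enumerate(row) if j not in (a, b)] + [row[a] + row[b]]
                      for row in posterior]
            drop = base - entropy(merged)
            if best is None or drop > best[0]:
                best = (drop, a, b, merged)
    return best


def eess_percent(data: Matrix, labels: Sequence[int]) -> float:
    """Assumed EESS%: percentage of the total sum of squares explained by the partition."""
    n, p = len(data), len(data[0])
    grand = [sum(row[j] for row in data) / n for j in range(p)]
    sst = sum((row[j] - grand[j]) ** 2 for row in data for j in range(p))
    if sst == 0:
        raise ValueError("EESS% is undefined when all observations are identical")
    groups: Dict[int, List[Sequence[float]]] = {}
    for row, lab in zip(data, labels):
        groups.setdefault(lab, []).append(row)
    ssw = 0.0  # within-cluster sum of squares around each cluster centroid
    for members in groups.values():
        cen = [sum(r[j] for r in members) / len(members) for j in range(p)]
        ssw += sum((r[j] - cen[j]) ** 2 for r in members for j in range(p))
    return 100.0 * (1.0 - ssw / sst)


if __name__ == "__main__":
    # Hypothetical posterior from a 3-component fit in which components 0 and 1 overlap.
    z = [[0.5, 0.5, 0.0], [0.6, 0.4, 0.0], [0.0, 0.0, 1.0], [0.1, 0.0, 0.9]]
    drop, a, b, _ = merge_step(z)
    print(f"fuse components {a} and {b}; entropy drops by {drop:.3f}")

    # Hypothetical two-variable data: EESS% for a sensible vs. a scrambled partition.
    x = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    print(eess_percent(x, [0, 0, 1, 1]), eess_percent(x, [0, 1, 0, 1]))
```

Because splitting a cluster can never increase the within-cluster sum of squares, a coefficient like EESS% tends to grow with the number of clusters, so in a procedure of this kind it would be read alongside the cluster count and other QCs rather than maximised on its own.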

Source journal: Journal for Person-Oriented Research (Psychology: Psychology (miscellaneous))
CiteScore: 2.90
Self-citation rate: 0.00%
Articles published: 9
Review time: 23 weeks