A Tensor-EM Method for Large-Scale Latent Class Analysis with Binary Responses
Zhenghao Zeng, Yuqi Gu, Gongjun Xu
Psychometrika, vol. 88, no. 2, pp. 580–612, June 2023 (published online 1 October 2022)
DOI: 10.1007/s11336-022-09887-1
Abstract
Latent class models are powerful statistical modeling tools widely used in psychological, behavioral, and social sciences. In the modern era of data science, researchers often have access to response data collected from large-scale surveys or assessments, featuring many items (large J) and many subjects (large N). This is in contrast to the traditional regime with fixed J and large N. To analyze such large-scale data, it is important to develop methods that are both computationally efficient and theoretically valid. In terms of computation, the conventional EM algorithm for latent class models tends to converge slowly on large-scale data and may converge to a local optimum instead of the maximum likelihood estimator (MLE). Motivated by this, we introduce the tensor decomposition perspective into latent class analysis with binary responses. Methodologically, we propose a two-step procedure: a moment-based tensor power method in the first step, whose estimates then serve as the initialization for the EM algorithm in the second step. Theoretically, we establish the clustering consistency of the MLE in assigning subjects to latent classes when N and J both go to infinity. Simulation studies suggest that the proposed tensor-EM pipeline enjoys both good accuracy and computational efficiency for large-scale data with binary responses. We also apply the proposed method to an educational assessment dataset as an illustration.
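To make the second step of the pipeline concrete, here is a minimal pure-Python sketch of EM for a latent class model with binary responses. It assumes the standard formulation: class proportions π_k, item parameters θ_jk = P(R_ij = 1 | subject i in class k), and responses conditionally independent across items given the class. Under that assumption, E[R_j R_j' R_j''] = Σ_k π_k θ_jk θ_j'k θ_j''k for distinct items j, j', j'', a rank-K third-order tensor, which is the kind of structure moment-based tensor methods exploit. The first-step tensor power method itself is not reproduced here; the starting values `pi` and `theta` are simply arguments that such an estimator could supply. All function names (`simulate`, `em_latent_class`, `assign_classes`) are hypothetical, and this is not the authors' implementation. Subjects are assigned to the class with the largest posterior probability, a natural plug-in clustering rule.

```python
import math
import random

# Hypothetical helpers: a sketch of EM for binary latent class analysis,
# not the authors' implementation of the tensor-EM pipeline.


def simulate(N, pi, theta, seed=1):
    """Draw N binary response vectors from a latent class model (illustrative).

    pi[k] is the proportion of class k; theta[j][k] = P(R_j = 1 | class k).
    Returns the responses R (N lists of J zeros/ones) and the true classes z.
    """
    rng = random.Random(seed)
    K, J = len(pi), len(theta)
    z = rng.choices(range(K), weights=pi, k=N)
    R = [[1 if rng.random() < theta[j][k] else 0 for j in range(J)] for k in z]
    return R, z


def _e_step(R, pi, theta, eps=1e-12):
    """Posterior class probabilities and observed-data log-likelihood."""
    K = len(pi)
    post, loglik = [], 0.0
    for r in R:
        logs = []
        for k in range(K):
            s = math.log(max(pi[k], eps))
            for j, rj in enumerate(r):
                p = min(max(theta[j][k], eps), 1.0 - eps)  # keep logs finite
                s += math.log(p) if rj else math.log(1.0 - p)
            logs.append(s)
        m = max(logs)  # log-sum-exp for numerical stability
        w = [math.exp(v - m) for v in logs]
        total = sum(w)
        loglik += m + math.log(total)
        post.append([x / total for x in w])
    return post, loglik


def em_latent_class(R, K, pi=None, theta=None, max_iter=500, tol=1e-8, seed=0):
    """EM for a K-class latent class model with binary responses.

    pi, theta are starting values; in a two-step pipeline they would come
    from a first-step estimator. If omitted, a random start is used, which
    may lead EM to a local optimum rather than the MLE.
    Returns (pi, theta, posterior, loglik), all at the final parameters.
    """
    N, J = len(R), len(R[0])
    rng = random.Random(seed)
    if pi is None:
        pi = [1.0 / K] * K
    if theta is None:
        theta = [[rng.uniform(0.2, 0.8) for _ in range(K)] for _ in range(J)]
    post, loglik = _e_step(R, pi, theta)
    for _ in range(max_iter):
        # M-step: closed-form posterior-weighted proportions.
        nk = [sum(p[k] for p in post) for k in range(K)]
        pi = [n / N for n in nk]
        theta = [[sum(p[k] * r[j] for p, r in zip(post, R)) / max(nk[k], 1e-12)
                  for k in range(K)] for j in range(J)]
        # E-step at the updated parameters; stop when the gain is negligible.
        post, new_loglik = _e_step(R, pi, theta)
        converged = new_loglik - loglik < tol
        loglik = new_loglik
        if converged:
            break
    return pi, theta, post, loglik


def assign_classes(post):
    """Assign each subject to its most probable latent class."""
    return [max(range(len(p)), key=p.__getitem__) for p in post]
```

For example, data from `simulate(300, [0.5, 0.5], [[0.2, 0.8] for _ in range(20)])` with EM started at θ_j = (0.4, 0.6) for every item should recover the classes up to label switching. The random default start is included only for convenience; as the abstract notes, EM from a poor start may end at a local optimum, which is what a good first-step initialization is meant to avoid.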
About the journal:
The journal Psychometrika is devoted to the advancement of theory and methodology for behavioral data in psychology, education and the social and behavioral sciences generally. Its coverage is offered in two sections: Theory and Methods (T&M), and Application Reviews and Case Studies (ARCS). T&M articles present original research and reviews on the development of quantitative models, statistical methods, and mathematical techniques for evaluating data from psychology, the social and behavioral sciences and related fields. Application Reviews can be integrative, drawing together disparate methodologies for applications, or comparative and evaluative, discussing advantages and disadvantages of one or more methodologies in applications. Case Studies highlight methodology that deepens understanding of substantive phenomena through more informative data analysis, or more elegant data description.