Correcting for Sampling Error in between-Cluster Effects: An Empirical Bayes Cluster-Mean Approach with Finite Population Corrections.

Impact factor 5.3; Region 3 (Psychology); JCR Q1 (Mathematics, Interdisciplinary Applications)

Multivariate Behavioral Research, pp. 584-598. Pub Date: 2024-05-01. Epub Date: 2024-02-13. DOI: 10.1080/00273171.2024.2307034

Mark H C Lai, Yichi Zhang, Feng Ji


Abstract

With clustered data, such as students nested within schools or employees nested within organizations, it is often of interest to estimate and compare associations among variables separately at each level. Researchers routinely estimate between-cluster effects using the sample cluster means of a predictor, but previous research has shown that this practice leads to biased estimates of between-level coefficients, and recent research has recommended using latent cluster means within the multilevel structural equation modeling framework. However, the latent cluster-mean approach may not always be the best choice, as it (a) relies on the assumption that the population cluster sizes are close to infinite, (b) requires a relatively large number of clusters, and (c) is currently implemented only in specialized software such as Mplus. In this paper, we show that using empirical Bayes estimates of the cluster means can also lead to consistent estimates of between-level coefficients, and illustrate how the empirical Bayes estimate can incorporate finite population corrections when information on population cluster sizes is available. Through a series of Monte Carlo simulation studies, we show that the empirical Bayes cluster-mean (EBM) approach performs similarly to the latent cluster-mean approach for estimating the between-cluster coefficients in most conditions when the infinite-population assumption holds, and that applying the finite population correction provides reasonable point and interval estimates when the population is finite. The performance of EBM can be further improved with restricted maximum likelihood estimation and likelihood-based confidence intervals. We also provide an R function that implements the EBM approach and illustrate it using data from the classic High School and Beyond Study.
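The shrinkage idea behind empirical Bayes cluster means can be sketched in a few lines. The Python below is a minimal illustration using the textbook formula, not the authors' R function: each sample cluster mean is pulled toward a precision-weighted grand mean by its reliability, and an optional finite population correction scales the sampling variance of the mean by (1 - n_j / N_j). The function name `eb_cluster_means`, its parameters, and the choice to treat the within-cluster variance `sigma2` and between-cluster variance `tau2` as known inputs are assumptions made for illustration; in practice these would be estimated, for example by restricted maximum likelihood.

```python
from statistics import fmean


def eb_cluster_means(clusters, sigma2, tau2, pop_sizes=None):
    """Return shrunken (empirical Bayes style) cluster means.

    clusters  : list of lists, the sampled observations in each cluster
    sigma2    : within-cluster variance (assumed known for this sketch)
    tau2      : between-cluster variance of the true cluster means (> 0)
    pop_sizes : optional list of population cluster sizes N_j; when given,
                a finite population correction (1 - n_j / N_j) is applied
    """
    if tau2 <= 0:
        raise ValueError("tau2 must be positive in this sketch")

    means = [fmean(c) for c in clusters]

    # Sampling variance of each sample cluster mean.
    err_vars = []
    for j, c in enumerate(clusters):
        n = len(c)
        v = sigma2 / n
        if pop_sizes is not None:
            if n > pop_sizes[j]:
                raise ValueError("sample size exceeds population size")
            v *= 1 - n / pop_sizes[j]  # finite population correction
        err_vars.append(v)

    # Precision-weighted grand mean of the cluster means.
    weights = [1 / (tau2 + v) for v in err_vars]
    grand = sum(w * m for w, m in zip(weights, means)) / sum(weights)

    # Reliability of each cluster mean; 1 means no shrinkage.
    reliabilities = [tau2 / (tau2 + v) for v in err_vars]
    return [grand + r * (m - grand) for r, m in zip(reliabilities, means)]


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5, 6, 7]]
    print(eb_cluster_means(data, sigma2=1.0, tau2=1.0))
    # With every member of each cluster sampled, no shrinkage is applied.
    print(eb_cluster_means(data, sigma2=1.0, tau2=1.0, pop_sizes=[3, 4]))
```

Under this formulation, a cluster sampled in full (n_j = N_j) has zero sampling error, so its mean is left unshrunk, while very large N_j recovers the usual infinite-population shrinkage. The shrunken means would then replace the raw sample means as the between-level predictor.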


Source journal: Multivariate Behavioral Research (Mathematics, Interdisciplinary Applications)

CiteScore: 7.60

Self-citation rate: 2.60%

Review time: >12 weeks

About the journal: Multivariate Behavioral Research (MBR) publishes a variety of substantive, methodological, and theoretical articles in all areas of the social and behavioral sciences. Most MBR articles fall into one of two categories. Substantive articles report on applications of sophisticated multivariate research methods to study topics of substantive interest in personality, health, intelligence, industrial/organizational, and other behavioral science areas. Methodological articles present and/or evaluate new developments in multivariate methods, or address methodological issues in current research. We also encourage submission of integrative articles related to pedagogy involving multivariate research methods, and to historical treatments of interest and relevance to multivariate research methods.