A Maximum Likelihood Method for High-Dimensional Structural Equation Modeling.

IF 1.8 · Zone 4 (Medicine) · JCR Q3 · MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics in Medicine · Pub Date: 2025-07-01 · DOI: 10.1002/sim.70171

Alexander Quinter, Xianming Tan, Donglin Zeng, Joseph G Ibrahim

Statistics in Medicine, vol. 44, no. 15-17, article e70171.

Citations: 0

Abstract

Factor analysis provides an intuitive approach to dimension reduction when working with big data, allowing researchers to represent a large number of correlated variables via a smaller set of underlying latent factors. Traditional methods of factor analysis, such as Structural Equation Modeling (SEM) and factor regression, lack properties desirable for analyzing big data, such as the ability to handle high dimensionality or to enforce sparsity on the estimates of the factor loading matrices. These methods also assume that the number of latent constructs is known beforehand, a problem unique to factor analysis that often goes unaddressed or overlooked, with ad hoc methods being the most common way to deal with such a fundamental question. Although recent developments in the literature have attempted to remedy these issues, particularly by extending SEM to high-dimensional and sparse applications, few such methods are grounded in likelihood theory. To rectify this shortcoming, we propose a new SEM-based estimation method that uses maximum likelihood theory while simultaneously addressing some of the most common problems associated with big data. We substantiate our method through simulation studies, which indicate that the proposed method can correctly identify the latent factors underlying the independent and dependent sets of variables, while also accurately estimating the entries of the factor loading matrices and enforcing sparsity on their estimates. We apply this method to the COVIDiSTRESS Global Survey dataset, a global survey collected to further our understanding of how the COVID-19 pandemic affected the human experience. Doing so demonstrates the performance of the model while simultaneously identifying the latent constructs intrinsic to the data.
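The abstract does not spell out the estimation algorithm. As a point of reference only, the sketch below shows a conventional baseline of the kind the abstract contrasts with: fitting an unpenalized Gaussian factor model, Sigma = Lambda Lambda^T + Psi, by maximum likelihood using the EM algorithm of Rubin and Thayer, then choosing the number of factors by BIC. It is not the authors' method: it has no sparsity penalty and no SEM structural part. The function names (`fit_fa_em`, `select_k_bic`), the simulated two-block loading matrix, and the sample size are illustrative assumptions.

```python
import numpy as np


def fa_loglik(S, Lam, psi, n):
    """Gaussian log-likelihood of centered data with sample covariance S
    under the factor model Sigma = Lam @ Lam.T + diag(psi)."""
    p = S.shape[0]
    Sigma = Lam @ Lam.T + np.diag(psi)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * n * (p * np.log(2 * np.pi) + logdet
                       + np.trace(np.linalg.solve(Sigma, S)))


def fit_fa_em(S, k, n_iter=1000):
    """Maximum likelihood k-factor fit to covariance S via EM (Rubin-Thayer)."""
    # Initialize loadings from the leading k principal components of S.
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1][:k]
    Lam = evecs[:, order] * np.sqrt(evals[order])
    psi = np.maximum(np.diag(S) - np.sum(Lam**2, axis=1), 1e-3)
    for _ in range(n_iter):
        Sigma = Lam @ Lam.T + np.diag(psi)
        # E-step: beta = Lam^T Sigma^{-1} gives posterior factor means beta @ x.
        beta = np.linalg.solve(Sigma, Lam).T
        # Average posterior second moment E[f f^T | x] over the sample.
        Ezz = np.eye(k) - beta @ Lam + beta @ S @ beta.T
        # M-step: update loadings, then the diagonal uniquenesses.
        Lam = S @ beta.T @ np.linalg.inv(Ezz)
        psi = np.maximum(np.diag(S - Lam @ beta @ S), 1e-6)
    return Lam, psi


def n_params(p, k):
    """Free parameters: loadings (up to rotation) plus p uniquenesses."""
    return p * k - k * (k - 1) // 2 + p


def select_k_bic(X, k_max):
    """Fit k = 1..k_max factors and return the k with the smallest BIC."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    bics = {}
    for k in range(1, k_max + 1):
        Lam, psi = fit_fa_em(S, k)
        bics[k] = -2 * fa_loglik(S, Lam, psi, n) + n_params(p, k) * np.log(n)
    return min(bics, key=bics.get), bics


# Hypothetical simulation: 10 variables, 2 factors, each loading on one block.
rng = np.random.default_rng(0)
n, p = 2000, 10
Lam_true = np.zeros((p, 2))
Lam_true[:5, 0] = 0.8   # variables 1-5 load on factor 1 only
Lam_true[5:, 1] = 0.8   # variables 6-10 load on factor 2 only
F = rng.standard_normal((n, 2))
E = rng.standard_normal((n, p)) * 0.6   # unique variance 0.36
X = F @ Lam_true.T + E

best_k, bics = select_k_bic(X, k_max=4)
print(best_k, bics)
```

This baseline refits the model for every candidate number of factors and leaves every loading nonzero. The abstract's method is described as handling both the choice of latent constructs and sparsity in the loading matrices, which this sketch does not attempt.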


Source journal

Statistics in Medicine (Medicine: Public, Environmental & Occupational Health)

CiteScore: 3.40

Self-citation rate: 10.00%

Articles published: 334

Review time: 2-4 weeks

Journal description: The journal aims to influence practice in medicine and its associated sciences through the publication of papers on statistical and other quantitative methods. Papers will explain new methods and demonstrate their application, preferably through a substantive, real, motivating example or a comprehensive evaluation based on an illustrative example. Alternatively, papers will report on case studies where creative use or technical generalizations of established methodology are directed towards a substantive application. Reviews of, and tutorials on, general topics relevant to the application of statistics to medicine will also be published. The main criteria for publication are appropriateness of the statistical methods to a particular medical problem and clarity of exposition. Papers with primarily mathematical content will be excluded. The journal aims to enhance communication between statisticians, clinicians and medical researchers.