A Maximum Likelihood Method for High-Dimensional Structural Equation Modeling
Alexander Quinter, Xianming Tan, Donglin Zeng, Joseph G Ibrahim
Statistics in Medicine, volume 44, issues 15-17, e70171. Published 2025-07-01. DOI: 10.1002/sim.70171 (https://doi.org/10.1002/sim.70171)
Abstract
Factor analysis provides an intuitive approach for dimension reduction when working with big data, allowing researchers to represent an extensive number of correlated variables via a subset of underlying latent factors. Traditional methods of factor analysis, such as Structural Equation Modeling (SEM) and factor regression, lack properties desirable for analyzing big data, such as the ability to handle high dimensionality or to enforce sparsity on the estimates of the factor loading matrices. These methods also assume that the number of latent constructs is known beforehand, a problem unique to factor analysis that often goes unaddressed or overlooked, with ad hoc methods being the most common ways to deal with such a fundamental question. Although recent developments in the literature have attempted to remedy these issues, particularly by extending SEM to high-dimensional and sparse applications, few of these methods do so using likelihood theory. To rectify this shortcoming, we propose a new SEM-based estimation method that uses maximum likelihood theory while simultaneously addressing some of the most common problems associated with big data. We substantiate our method through simulation studies, which indicate that the proposed method can correctly identify the latent factors underlying the independent and dependent sets of variables, while also accurately estimating the entries of the factor loading matrices and enforcing sparsity on those estimates. We apply this method to the COVIDiSTRESS Global Survey dataset, a global survey conducted to further our understanding of how the COVID-19 pandemic affected the human experience. Doing so demonstrates the performance of the model while simultaneously identifying the latent constructs intrinsic to the data.
About the journal:
The journal aims to influence practice in medicine and its associated sciences through the publication of papers on statistical and other quantitative methods. Papers will explain new methods and demonstrate their application, preferably through a substantive, real, motivating example or a comprehensive evaluation based on an illustrative example. Alternatively, papers will report on case studies where creative use or technical generalizations of established methodology are directed towards a substantive application. Reviews of, and tutorials on, general topics relevant to the application of statistics to medicine will also be published. The main criteria for publication are appropriateness of the statistical methods to a particular medical problem and clarity of exposition. Papers with primarily mathematical content will be excluded. The journal aims to enhance communication between statisticians, clinicians and medical researchers.