{"title":"Learning High-dimensional Latent Variable Models via Doubly Stochastic Optimisation by Unadjusted Langevin","authors":"Motonori Oka, Yunxiao Chen, Irini Mounstaki","doi":"arxiv-2406.09311","DOIUrl":null,"url":null,"abstract":"Latent variable models are widely used in social and behavioural sciences,\nsuch as education, psychology, and political science. In recent years,\nhigh-dimensional latent variable models have become increasingly common for\nanalysing large and complex data. Estimating high-dimensional latent variable\nmodels using marginal maximum likelihood is computationally demanding due to\nthe complexity of integrals involved. To address this challenge, stochastic\noptimisation, which combines stochastic approximation and sampling techniques,\nhas been shown to be effective. This method iterates between two steps -- (1)\nsampling the latent variables from their posterior distribution based on the\ncurrent parameter estimate, and (2) updating the fixed parameters using an\napproximate stochastic gradient constructed from the latent variable samples.\nIn this paper, we propose a computationally more efficient stochastic\noptimisation algorithm. This improvement is achieved through the use of a\nminibatch of observations when sampling latent variables and constructing\nstochastic gradients, and an unadjusted Langevin sampler that utilises the\ngradient of the negative complete-data log-likelihood to sample latent\nvariables. Theoretical results are established for the proposed algorithm,\nshowing that the iterative parameter update converges to the marginal maximum\nlikelihood estimate as the number of iterations goes to infinity. Furthermore,\nthe proposed algorithm is shown to scale well to high-dimensional settings\nthrough simulation studies and a personality test application with 30,000\nrespondents, 300 items, and 30 latent dimensions.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - STAT - Computation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2406.09311","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Latent variable models are widely used in the social and behavioural sciences,
such as education, psychology, and political science. In recent years,
high-dimensional latent variable models have become increasingly common for
analysing large and complex data. Estimating high-dimensional latent variable
models using marginal maximum likelihood is computationally demanding due to
the complexity of the integrals involved. To address this challenge, stochastic
optimisation, which combines stochastic approximation and sampling techniques,
has been shown to be effective. This method alternates between two steps: (1)
sampling the latent variables from their posterior distribution based on the
current parameter estimate, and (2) updating the fixed parameters using an
approximate stochastic gradient constructed from the latent variable samples.
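To make the loop concrete, the following is a minimal sketch of this generic two-step scheme on a toy model in which the posterior is tractable; the model (z ~ N(0, 1), x | z ~ N(theta*z, 1)), the gain sequence, and all names are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: z_i ~ N(0, 1), x_i | z_i ~ N(theta * z_i, 1)
n, theta_true = 2000, 1.5
z_true = rng.normal(size=n)
x = theta_true * z_true + rng.normal(size=n)

theta = 0.1                              # initial parameter value
for t in range(1, 2001):
    # Step (1): sample each z_i from its posterior given the current theta.
    # For this toy model the posterior is N(theta*x/(theta^2+1), 1/(theta^2+1)).
    prec = theta**2 + 1.0
    z = rng.normal(theta * x / prec, np.sqrt(1.0 / prec))
    # Step (2): Robbins-Monro update using the stochastic gradient of the
    # complete-data log-likelihood, averaged over all observations.
    grad = np.mean((x - theta * z) * z)
    theta += (1.0 / t) * grad

# theta should now be close to the marginal MLE, sqrt(mean(x**2) - 1)
print(theta, np.sqrt(np.mean(x**2) - 1.0))
```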
In this paper, we propose a computationally more efficient stochastic
optimisation algorithm. The improvement comes from two ingredients: a
minibatch of observations, used both when sampling latent variables and when
constructing stochastic gradients, and an unadjusted Langevin sampler that
utilises the gradient of the negative complete-data log-likelihood to sample
latent variables (a sketch of this scheme follows the abstract). Theoretical
results are established for the proposed algorithm,
showing that the iterative parameter update converges to the marginal maximum
likelihood estimate as the number of iterations goes to infinity. Furthermore,
the proposed algorithm is shown to scale well to high-dimensional settings
through simulation studies and a personality test application with 30,000
respondents, 300 items, and 30 latent dimensions.
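As a companion to the generic loop above, here is a hedged sketch of the doubly stochastic variant on the same toy model: each iteration draws a minibatch of observations, refreshes that minibatch's latent variables with one unadjusted Langevin step driven by the gradient of the complete-data log-likelihood, and updates the parameter with a stochastic gradient computed from the same minibatch. The minibatch size, Langevin step size eps, and gain sequence are illustrative choices, not the paper's tuning.

```python
import numpy as np

rng = np.random.default_rng(1)

# Same toy model as above: z_i ~ N(0, 1), x_i | z_i ~ N(theta * z_i, 1)
n, batch, theta_true = 2000, 100, 1.5
x = theta_true * rng.normal(size=n) + rng.normal(size=n)

theta = 0.1
z = np.zeros(n)          # persistent latent-variable state, one per observation
eps = 0.1                # Langevin step size
for t in range(1, 5001):
    idx = rng.choice(n, size=batch, replace=False)       # minibatch
    xb, zb = x[idx], z[idx]
    # Unadjusted Langevin step on the minibatch's latent variables, driven by
    # the gradient of log p(z, x | theta) = -z^2/2 - (x - theta*z)^2/2 + const:
    grad_z = -zb + theta * (xb - theta * zb)
    zb = zb + 0.5 * eps * grad_z + np.sqrt(eps) * rng.normal(size=batch)
    z[idx] = zb
    # Parameter update from the same minibatch (Robbins-Monro gain 1/t):
    grad_theta = np.mean((xb - theta * zb) * zb)
    theta += (1.0 / t) * grad_theta

print(theta)             # again approaches the marginal MLE (up to ULA bias)
```

Keeping z as persistent state means each observation's chain resumes where it left off the last time it was drawn, so a single Langevin step per visit can suffice; the discretisation bias of the unadjusted sampler is controlled by the step size eps.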