Sphiwe B. Skhosana, Salomon M. Millard, Frans H. J. Kanfer
A modified EM-type algorithm to estimate semi-parametric mixtures of non-parametric regressions
Statistics and Computing, published 2024-05-29. DOI: 10.1007/s11222-024-10435-3 (https://doi.org/10.1007/s11222-024-10435-3)
Journal impact factor: 1.6 (JCR Q2, Computer Science, Theory & Methods)
Abstract
Semi-parametric Gaussian mixtures of non-parametric regressions (SPGMNRs) are a flexible extension of Gaussian mixtures of linear regressions (GMLRs). The model assumes that the component regression functions (CRFs) are non-parametric functions of the covariate(s) whereas the component mixing proportions and variances are constants. Unfortunately, the model cannot be reliably estimated using traditional methods. A local-likelihood approach for estimating the CRFs requires that we maximize a set of local-likelihood functions. Using the Expectation-Maximization (EM) algorithm to separately maximize each local-likelihood function may lead to label-switching. This is because the posterior probabilities calculated at the local E-step are not guaranteed to be aligned. The consequence of this label-switching is wiggly and non-smooth estimates of the CRFs. In this paper, we propose a unified approach to address label-switching and obtain sensible estimates. The proposed approach has two stages. In the first stage, we propose a model-based approach to address the label-switching problem. We first note that each local-likelihood function is a likelihood function of a Gaussian mixture model (GMM). Next, we reformulate the SPGMNRs model as a mixture of these GMMs. Lastly, using a modified version of the Expectation Conditional Maximization (ECM) algorithm, we estimate the mixture of GMMs. In addition, using the mixing weights of the local GMMs, we can automatically choose the local points where local-likelihood estimation takes place. In the second stage, we propose one-step backfitting estimates of the parametric and non-parametric terms. The effectiveness of the proposed approach is demonstrated on simulated data and real data analysis.
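To make the label-switching issue concrete, the following is a minimal sketch of the naive approach the abstract describes: a separate EM run that maximizes one kernel-weighted (local) Gaussian mixture likelihood at a single local point u. It is not the authors' modified ECM algorithm. The function names, the Gaussian kernel, the local-constant fit, the bandwidth h, the simulated data, and the choice to estimate proportions and variances locally (the SPGMNRs model treats them as global constants) are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(t):
    """Standard Gaussian kernel (an assumed choice of kernel)."""
    return np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y (broadcasts over arrays)."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def local_em(x, y, u, h, init_means, init_var=1.0, n_iter=200):
    """Naive EM for the local-likelihood at the point u.

    Maximizes sum_i K((x_i - u) / h) * log sum_k pi_k N(y_i | m_k, s_k^2),
    i.e. a kernel-weighted Gaussian mixture with local-constant means.
    """
    w = gaussian_kernel((x - u) / h)              # kernel weights, shape (n,)
    means = np.asarray(init_means, dtype=float).copy()
    n_comp = means.size
    props = np.full(n_comp, 1.0 / n_comp)
    variances = np.full(n_comp, float(init_var))
    for _ in range(n_iter):
        # Local E-step: posterior probability of each component, shape (n, K).
        dens = props * normal_pdf(y[:, None], means, variances)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # Local M-step: kernel-weighted updates of the mixture parameters.
        wr = w[:, None] * resp
        nk = wr.sum(axis=0)
        means = (wr * y[:, None]).sum(axis=0) / nk
        variances = (wr * (y[:, None] - means) ** 2).sum(axis=0) / nk
        variances = np.maximum(variances, 1e-6)   # guard against collapse
        props = nk / w.sum()
    return props, means, variances

# Hypothetical simulated data: two components whose regression functions
# are sin(2*pi*x) shifted by +3 and -3, with noise standard deviation 0.3.
rng = np.random.default_rng(0)
n = 400
x = rng.uniform(0.0, 1.0, n)
z = rng.integers(0, 2, n)
y = np.where(z == 1, 3.0, -3.0) + np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.3, n)

# Same local point, two different initializations of the component means.
_, means_a, _ = local_em(x, y, u=0.5, h=0.05, init_means=[-2.0, 2.0])
_, means_b, _ = local_em(x, y, u=0.5, h=0.05, init_means=[2.0, -2.0])
print("means with init (-2, 2):", means_a)
print("means with init (2, -2):", means_b)
```

Both runs reach the same local fit, but the component labels come out in opposite order. When each local point is fitted by an independent EM run, nothing ties "component 1" at one point to "component 1" at the next, so joining the local estimates across points can produce the wiggly, non-smooth CRF estimates that the abstract attributes to label-switching.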
About the journal:
Statistics and Computing is a bi-monthly refereed journal which publishes papers covering the range of the interface between the statistical and computing sciences.
In particular, it addresses the use of statistical concepts in computing science, for example in machine learning, computer vision and data analytics, as well as the use of computers in data modelling, prediction and analysis. Specific topics which are covered include: techniques for evaluating analytically intractable problems such as bootstrap resampling, Markov chain Monte Carlo, sequential Monte Carlo, approximate Bayesian computation, search and optimization methods, stochastic simulation and Monte Carlo, graphics, computer environments, statistical approaches to software errors, information retrieval, machine learning, statistics of databases and database technology, huge data sets and big data analytics, computer algebra, graphical models, image processing, tomography, inverse problems and uncertainty quantification.
In addition, the journal contains original research reports, authoritative review papers, discussed papers, and occasional special issues on particular topics or carrying proceedings of relevant conferences. Statistics and Computing also publishes book review and software review sections.