Multi‐node Expectation–Maximization algorithm for finite mixture models
Authors: Sharon X. Lee, G. McLachlan, Kaleb L. Leemaqz
Journal: Statistical Analysis and Data Mining: The ASA Data Science Journal
Published: 2021-06-05
DOI: 10.1002/sam.11529 (https://doi.org/10.1002/sam.11529)
Abstract
Finite mixture models are powerful tools for modeling and analyzing heterogeneous data. Parameter estimation is typically carried out using maximum likelihood estimation via the Expectation–Maximization (EM) algorithm. Recently, the adoption of flexible distributions as component densities has become increasingly popular. Often, the EM algorithm for these models involves complicated expressions that are time‐consuming to evaluate numerically. In this paper, we describe a parallel implementation of the EM algorithm suitable for both single‐threaded and multi‐threaded processors and for both single machine and multiple‐node systems. Numerical experiments are performed to demonstrate the potential performance gain in different settings. Comparison is also made across two commonly used platforms—R and MATLAB. For illustration, a fairly general mixture model is used in the comparison.
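As a rough illustration of the idea of parallelizing EM, the sketch below splits the data into chunks, runs the E-step on each chunk in a separate worker, and combines the partial sufficient statistics in a single M-step. It is not the authors' implementation. It assumes univariate Gaussian components (far simpler than the flexible distributions the abstract refers to), and the names `chunk_stats` and `parallel_em` are hypothetical. Threads are used only so that the example runs anywhere; in CPython a process pool or a multi-node backend would typically be needed for a real speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def chunk_stats(chunk, weights, means, variances):
    """E-step on one chunk: return partial sufficient statistics and log-likelihood."""
    g = len(weights)
    s0 = [0.0] * g  # sum of responsibilities
    s1 = [0.0] * g  # sum of responsibility * x
    s2 = [0.0] * g  # sum of responsibility * x^2
    loglik = 0.0
    for x in chunk:
        dens = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(g)]
        total = sum(dens)
        loglik += math.log(total)
        for k in range(g):
            r = dens[k] / total  # posterior probability of component k
            s0[k] += r
            s1[k] += r * x
            s2[k] += r * x * x
    return s0, s1, s2, loglik


def parallel_em(data, weights, means, variances, n_workers=4, n_iter=50):
    """Data-parallel EM for a univariate Gaussian mixture (illustrative only)."""
    n = len(data)
    g = len(weights)
    # Interleaved split of the data; drop empty chunks if n < n_workers.
    chunks = [c for c in (data[i::n_workers] for i in range(n_workers)) if c]
    loglik = float("-inf")
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(n_iter):
            # Parallel E-step: each worker handles one chunk.
            parts = list(
                pool.map(lambda c: chunk_stats(c, weights, means, variances), chunks)
            )
            # Reduce: add up the partial statistics from all workers.
            s0 = [sum(p[0][k] for p in parts) for k in range(g)]
            s1 = [sum(p[1][k] for p in parts) for k in range(g)]
            s2 = [sum(p[2][k] for p in parts) for k in range(g)]
            # Log-likelihood at the parameters used in this E-step.
            loglik = sum(p[3] for p in parts)
            # M-step: closed-form updates from the pooled statistics.
            weights = [s0[k] / n for k in range(g)]
            means = [s1[k] / s0[k] for k in range(g)]
            variances = [max(s2[k] / s0[k] - means[k] ** 2, 1e-6) for k in range(g)]
    return weights, means, variances, loglik


if __name__ == "__main__":
    base = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
    sample = base * 10 + [5.0 + b for b in base] * 10  # two well-separated groups
    print(parallel_em(sample, [0.5, 0.5], [-1.0, 6.0], [1.0, 1.0]))
```

Because the M-step depends only on sums over observations, the per-chunk work is independent and only a handful of numbers per component need to be communicated back at each iteration. This is what makes the pattern a natural fit for both multi-core and multi-node settings.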