Big topic modeling based on a two-level hierarchical latent Beta-Liouville allocation for large-scale data and parameter streaming
Authors: Koffi Eddy Ihou, Nizar Bouguila
Journal: Pattern Analysis and Applications (impact factor 3.7; JCR Q2, Computer Science, Artificial Intelligence)
Publication date: 2024-02-28
Publication type: Journal Article
DOI: 10.1007/s10044-024-01213-y (https://doi.org/10.1007/s10044-024-01213-y)
Citation count: 0
Abstract
As an extension of the standard symmetric latent Dirichlet allocation topic model, we implement the asymmetric Beta-Liouville distribution as a conjugate prior to the multinomial and propose maximum a posteriori estimation for latent Beta-Liouville allocation as an alternative to the maximum likelihood estimator used for models such as probabilistic latent semantic indexing, unigrams, and mixture of unigrams. Since Bayesian posteriors for complex models are generally intractable, we propose a point estimate (the mode) that offers a much more tractable solution: maximum a posteriori hypotheses based on point estimates are much easier to compute than a full Bayesian analysis that integrates over the entire parameter space. We show that the proposed maximum a posteriori estimation reduces the three-level hierarchical latent Beta-Liouville allocation to a two-level topic mixture once the latent variables are marginalized out. In each document, the maximum a posteriori estimation provides a soft assignment and constructs dense expectation–maximization probabilities (responsibilities) over each word for accurate estimates. For simplicity, we present a stochastic, word-level online expectation–maximization algorithm as an optimization method for maximum a posteriori latent Beta-Liouville allocation estimation, whose unnormalized reparameterization is equivalent to a stochastic collapsed variational Bayes. This implicit connection between the collapsed space and the expectation–maximization-based maximum a posteriori latent Beta-Liouville allocation shows its flexibility and helps in providing an alternative to model selection. We characterize the efficiency of the proposed approach by its ability to stream both large-scale data and parameters simultaneously and seamlessly. Evaluated with predictive perplexities, the model's performance shows the robustness of the proposed technique on text document datasets.
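The ideas summarized in the abstract (soft word-level responsibilities, a maximum a posteriori point estimate, and perplexity as the evaluation measure) can be illustrated with a small sketch. This is not the authors' algorithm. It is a batch expectation–maximization loop for a two-level topic mixture that uses symmetric Dirichlet priors as a stand-in for the Beta-Liouville prior, and it computes perplexity on the training counts rather than on held-out text. The function names, the hyperparameters `alpha` and `beta`, and the toy count matrix are all hypothetical.

```python
import numpy as np

def map_em_topic_mixture(counts, n_topics, alpha=1.1, beta=1.01,
                         n_iter=50, seed=0):
    """Batch MAP-EM for a two-level topic mixture (illustrative only).

    counts: (D, V) document-word count matrix.
    alpha, beta: symmetric Dirichlet hyperparameters (assumed > 1 so that
    the posterior mode is well defined); a stand-in for Beta-Liouville.
    """
    rng = np.random.default_rng(seed)
    n_docs, vocab = counts.shape
    theta = rng.dirichlet(np.ones(n_topics), size=n_docs)  # (D, K) topic mix
    phi = rng.dirichlet(np.ones(vocab), size=n_topics)     # (K, V) topic-word

    for _ in range(n_iter):
        # E-step: dense responsibilities r[d, k, w] proportional to
        # theta[d, k] * phi[k, w], normalized over topics k.
        r = theta[:, :, None] * phi[None, :, :]
        r /= r.sum(axis=1, keepdims=True)

        # Expected topic-word counts per document.
        n = r * counts[:, None, :]                         # (D, K, V)

        # M-step: posterior mode under Dirichlet priors (count + prior - 1).
        theta = n.sum(axis=2) + (alpha - 1.0)
        theta /= theta.sum(axis=1, keepdims=True)
        phi = n.sum(axis=0) + (beta - 1.0)
        phi /= phi.sum(axis=1, keepdims=True)
    return theta, phi

def perplexity(counts, theta, phi):
    """exp of the negative average per-word log-likelihood."""
    word_probs = theta @ phi                               # (D, V)
    log_lik = (counts * np.log(word_probs)).sum()
    return float(np.exp(-log_lik / counts.sum()))

# Toy corpus: two documents use words 0-1, two use words 2-3.
toy_counts = np.array([[5, 5, 0, 0],
                       [4, 6, 0, 0],
                       [0, 0, 5, 5],
                       [0, 0, 6, 4]], dtype=float)
toy_theta, toy_phi = map_em_topic_mixture(toy_counts, n_topics=2)
print(perplexity(toy_counts, toy_theta, toy_phi))
```

A stochastic, word-level online variant of the kind the abstract describes would replace the full-batch E-step and M-step with incremental updates on sampled words; that part is deliberately left out of this sketch.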
About the journal:
The journal publishes high quality articles in areas of fundamental research in intelligent pattern analysis and applications in computer science and engineering. It aims to provide a forum for original research which describes novel pattern analysis techniques and industrial applications of the current technology. In addition, the journal will also publish articles on pattern analysis applications in medical imaging. The journal solicits articles that detail new technology and methods for pattern recognition and analysis in applied domains including, but not limited to, computer vision and image processing, speech analysis, robotics, multimedia, document analysis, character recognition, knowledge engineering for pattern recognition, fractal analysis, and intelligent control. The journal publishes articles on the use of advanced pattern recognition and analysis methods including statistical techniques, neural networks, genetic algorithms, fuzzy pattern recognition, machine learning, and hardware implementations which are either relevant to the development of pattern analysis as a research area or detail novel pattern analysis applications. Papers proposing new classifier systems or their development, pattern analysis systems for real-time applications, fuzzy and temporal pattern recognition and uncertainty management in applied pattern recognition are particularly solicited.