{"title":"Document clustering using mixture model of von Mises-Fisher distributions on document manifold","authors":"N. K. Anh, Tam The Nguyen, Ngo Van Linh","doi":"10.1109/SOCPAR.2013.7054116","DOIUrl":null,"url":null,"abstract":"Document clustering has become an increasingly important technique for unsupervised document organization, automatic topic extraction, and fast information retrieval or filtering. The generative model for document clustering based on the von Mises-Fisher (vMF) distribution generally produces better clustering results than other generative models. However, in fact, it is more natural and reasonable to assume that the document space is a manifold and the probability distribution that generates the data is supported on a document manifold. In this paper, we propose a regularized probabilistic model based on manifold structure for data clustering, called Laplacian regularized vMF Mixture Model (LapvMFs), which explicitly considers the manifold structure. We have developed a generalized mean-field variational inference algorithm for the LapvMFs. Extensive experimental results on a large number of high dimensional text datasets demonstrate that our approach outperforms the three state-of-the-art clustering algorithms.","PeriodicalId":315126,"journal":{"name":"2013 International Conference on Soft Computing and Pattern Recognition (SoCPaR)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 International Conference on Soft Computing and Pattern Recognition (SoCPaR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SOCPAR.2013.7054116","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
Document clustering has become an increasingly important technique for unsupervised document organization, automatic topic extraction, and fast information retrieval or filtering. The generative model for document clustering based on the von Mises-Fisher (vMF) distribution generally produces better clustering results than other generative models. However, in fact, it is more natural and reasonable to assume that the document space is a manifold and the probability distribution that generates the data is supported on a document manifold. In this paper, we propose a regularized probabilistic model based on manifold structure for data clustering, called Laplacian regularized vMF Mixture Model (LapvMFs), which explicitly considers the manifold structure. We have developed a generalized mean-field variational inference algorithm for the LapvMFs. Extensive experimental results on a large number of high dimensional text datasets demonstrate that our approach outperforms the three state-of-the-art clustering algorithms.