Document clustering using mixture model of von Mises-Fisher distributions on document manifold

N. K. Anh, Tam The Nguyen, Ngo Van Linh
2013 International Conference on Soft Computing and Pattern Recognition (SoCPaR), published 2013-12-01
DOI: 10.1109/SOCPAR.2013.7054116 (https://doi.org/10.1109/SOCPAR.2013.7054116)
Citations: 2

Abstract

Document clustering using mixture model of von Mises-Fisher distributions on document manifold
Document clustering has become an increasingly important technique for unsupervised document organization, automatic topic extraction, and fast information retrieval or filtering. Generative models for document clustering based on the von Mises-Fisher (vMF) distribution generally produce better clustering results than other generative models. However, it is more natural and reasonable to assume that the document space is a manifold and that the probability distribution generating the data is supported on that manifold. In this paper, we propose a regularized probabilistic model for data clustering, called the Laplacian regularized vMF Mixture Model (LapvMFs), which explicitly accounts for the manifold structure. We develop a generalized mean-field variational inference algorithm for LapvMFs. Extensive experiments on a large number of high-dimensional text datasets demonstrate that our approach outperforms three state-of-the-art clustering algorithms.
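
The abstract does not spell out the regularizer, so the following is only a minimal sketch of the general idea behind graph Laplacian regularization, not the LapvMFs objective itself. It assumes documents are L2-normalized (as vMF models require), builds a symmetric nearest-neighbour graph under cosine similarity, and measures how much soft cluster memberships vary across graph edges. The function names, the 0/1 edge weights, and the toy vectors are all hypothetical choices made for illustration.

```python
import math

def unit_normalize(vec):
    """Scale a vector to unit L2 norm, so it lies on the unit hypersphere."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def knn_graph(docs, k):
    """Symmetric 0/1 k-nearest-neighbour graph under cosine similarity.

    Assumes every vector in `docs` is already unit-normalized, so the
    dot product equals the cosine similarity.
    """
    n = len(docs)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = [(sum(a * b for a, b in zip(docs[i], docs[j])), j)
                for j in range(n) if j != i]
        sims.sort(reverse=True)
        for _, j in sims[:k]:
            weights[i][j] = weights[j][i] = 1.0
    return weights

def laplacian_penalty(weights, memberships):
    """Return 0.5 * sum_ij w_ij * ||q_i - q_j||^2.

    This equals tr(Q^T L Q) for the unnormalized Laplacian L = D - W,
    where row i of Q is the membership vector q_i of document i.
    """
    n = len(weights)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if weights[i][j]:
                diff = sum((a - b) ** 2
                           for a, b in zip(memberships[i], memberships[j]))
                total += 0.5 * weights[i][j] * diff
    return total

# Toy term-count vectors, two documents per topic (hypothetical data).
docs = [unit_normalize(v) for v in ([3, 1, 0], [4, 1, 0], [0, 1, 3], [0, 2, 5])]
graph = knn_graph(docs, k=1)

smooth = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # neighbours agree
rough = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]   # neighbours disagree
print(laplacian_penalty(graph, smooth), laplacian_penalty(graph, rough))
```

On this toy data the penalty is zero when neighbouring documents share a cluster and positive when they are split, which is the behaviour a manifold regularizer rewards. In a regularized mixture model, a term of this kind would typically be weighted and combined with the likelihood; how LapvMFs does so is described in the paper, not here.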