Mongolian information retrieval method based on LDA model

2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS)
Pub Date: 2015-11-30
DOI: 10.1109/ICSESS.2015.7339073

Min Lin Siriguleng, Changbo Tian


Citations: 2

Abstract



A new method based on Latent Dirichlet Allocation (LDA) is proposed for information retrieval in Mongolian. The method takes the semantic information of Mongolian documents into account when relating query keywords to the documents to be retrieved. It models Mongolian documents with LDA, estimates the parameters with Gibbs sampling, and represents the probability of each word. This lets it mine the hidden relationships between topics and the words in the documents, obtain the topic distributions, and compute the topic similarity of the keywords. Finally, it returns the documents whose topics are most relevant. Experimental results show that the method performs better on topic semantics than the vector space model and the language model.
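The pipeline the abstract describes (LDA fitted by Gibbs sampling, then topic-based matching of keywords to documents) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract gives no hyperparameters, preprocessing, or similarity measure, so the values of `alpha` and `beta`, the way a query is mapped to a topic mixture, the use of cosine similarity, and the toy corpus of integer word ids are all assumptions made here. Real Mongolian text would first need tokenization and mapping to word ids, which is not shown.

```python
import math
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA. docs: lists of integer word ids.

    alpha and beta are illustrative defaults, not values from the paper.
    Returns (theta, phi): document-topic and topic-word distributions.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens of word w in topic k
    topic_total = [0] * num_topics                              # tokens in topic k
    assign = []
    for d, doc in enumerate(docs):                              # random initial assignment
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, k in zip(doc, zs):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]                                # remove the token's current topic
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [                                     # unnormalized p(z = t | rest)
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k                                # record the resampled topic
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    theta = [[(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
             for counts, doc in zip(doc_topic, docs)]
    phi = [[(c + beta) / (topic_total[k] + vocab_size * beta) for c in row]
           for k, row in enumerate(topic_word)]
    return theta, phi


def query_topics(word_ids, phi):
    """Assumed query model: average p(topic | word) over the query keywords."""
    mix = [0.0] * len(phi)
    for w in word_ids:
        column = [row[w] for row in phi]
        total = sum(column)
        for k, p in enumerate(column):
            mix[k] += p / total / len(word_ids)
    return mix


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank(word_ids, theta, phi):
    """Return document indices ordered by topic similarity to the query."""
    q = query_topics(word_ids, phi)
    scores = [cosine(q, t) for t in theta]
    return sorted(range(len(theta)), key=lambda d: scores[d], reverse=True)


# Toy corpus of placeholder word ids: documents 0-1 use words 0-2, documents 2-3 use words 3-5.
docs = [[0, 1, 2, 0, 1, 2, 0, 1], [0, 0, 1, 2, 2, 1, 0, 2],
        [3, 4, 5, 3, 4, 5, 3, 4], [5, 5, 4, 3, 3, 4, 5, 3]]
theta, phi = lda_gibbs(docs, num_topics=2, vocab_size=6)
print(rank([0, 1], theta, phi))
```

Because matching happens in topic space, a document can rank highly for a query even when it shares no exact keyword with it, which is the kind of semantic relationship that the abstract contrasts with the vector space model. Gibbs sampling is stochastic, so the ranking on a corpus this small can vary with the seed and the number of iterations.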

