Mongolian Information Retrieval Method Based on Word2vec and Topic Model
Author: Siriguleng
Published in: 2019 IEEE 4th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC)
Publication date: 2019-12-01
DOI: https://doi.org/10.1109/IAEAC47372.2019.8997588
Citation count: 1
Abstract
This paper proposes a Mongolian information retrieval method based on Word2vec and the LDA topic model. Its aim is to capture users’ real intent more accurately within the growing volume of Mongolian-language information and to return the results that best meet their needs. Taking Mongolian grammatical features into account, the method builds a model on the three-layer Bayesian structure of LDA to mine the latent topic distribution and feature-word distribution of documents. It expands each user query with a Word2vec model to obtain words that are semantically similar to the query keywords, and then applies the topic model to the expanded vocabulary. Finally, the similarity between the query topic and each document topic is calculated, and the documents most relevant to the query topic are returned. The experimental results show that combining Word2vec with the LDA model represents latent semantics better than the traditional model that uses the initial, unexpanded query.
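
The pipeline described in the abstract (expand the query, map it to topics, rank documents by topic similarity) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the paper's implementation: the tiny hand-written vectors stand in for trained Word2vec embeddings, `word_topics` and `doc_topics` stand in for distributions an LDA model would have inferred, the English words are placeholders for Mongolian terms, and cosine similarity is an assumed choice because the abstract does not name the similarity measure. The function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_query(terms, embeddings, top_k=1):
    """Append the top_k nearest embedding neighbours of each query term."""
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue  # out-of-vocabulary terms are kept but not expanded
        neighbours = sorted(
            (w for w in embeddings if w not in expanded),
            key=lambda w: cosine(embeddings[term], embeddings[w]),
            reverse=True,
        )
        expanded.extend(neighbours[:top_k])
    return expanded

def query_topics(terms, word_topics, num_topics):
    """Sum per-word topic weights over the terms and normalise to sum to 1."""
    totals = [0.0] * num_topics
    for term in terms:
        for k, weight in enumerate(word_topics.get(term, [0.0] * num_topics)):
            totals[k] += weight
    total = sum(totals)
    return [x / total for x in totals] if total else totals

def rank_documents(query_dist, doc_topics):
    """Return document ids ordered by topic similarity to the query."""
    return sorted(
        doc_topics,
        key=lambda d: cosine(query_dist, doc_topics[d]),
        reverse=True,
    )

# Toy stand-ins (assumptions): 2-d "embeddings" and 2-topic distributions.
embeddings = {
    "horse": [1.0, 0.0],
    "steed": [0.9, 0.1],
    "bank": [0.0, 1.0],
    "loan": [0.1, 0.9],
}
word_topics = {
    "horse": [0.9, 0.1],
    "steed": [0.8, 0.2],
    "bank": [0.1, 0.9],
    "loan": [0.2, 0.8],
}
doc_topics = {
    "doc_livestock": [0.85, 0.15],
    "doc_finance": [0.10, 0.90],
}

expanded = expand_query(["horse"], embeddings, top_k=1)
query_dist = query_topics(expanded, word_topics, num_topics=2)
ranking = rank_documents(query_dist, doc_topics)
```

With this toy data the query `["horse"]` gains the neighbour `"steed"`, and `doc_livestock` ranks ahead of `doc_finance`. In a real system the dictionaries would presumably be replaced by a Word2vec model and an LDA model trained on a Mongolian corpus.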