Vector Model Based Information Retrieval System With Word Embedding Transformation

2022 10th International Conference on Emerging Trends in Engineering and Technology - Signal and Information Processing (ICETET-SIP-22) Pub Date : 2022-04-29 DOI:10.1109/ICETET-SIP-2254415.2022.9791503

J. Brundha, K. Meera


Citations: 2

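Abstract

Vector-based information retrieval has become one of the trending methods in natural language processing. The embedding vector generated from a document helps identify the documents most relevant to a query. Embedding vectors can be generated in several ways; the approaches implemented here are Word2vec, Glove2vec and Sentence BERT. The system also applies word embedding transformations such as PCA and Factor Analysis to improve the model's performance. Most information retrieval systems take a query from the user, preprocess it, and return the information most relevant to it. Post-processing with PCA and Factor Analysis gives comparatively better results, increasing mean average precision by 2–3%.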
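The retrieval and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 3-dimensional vectors, the document ids, and all function names are hypothetical, cosine similarity is an assumed ranking function, and the PCA / Factor Analysis transformation step is left out. The sketch only shows how documents might be ranked against a query embedding and how mean average precision is computed from such rankings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_documents(query_vec, doc_vecs):
    """Return document ids sorted by decreasing similarity to the query."""
    scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)

def average_precision(ranking, relevant):
    """Average precision of one ranked list against a set of relevant ids."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant hit
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, relevance_sets):
    """Mean of the per-query average precision values."""
    pairs = list(zip(rankings, relevance_sets))
    return sum(average_precision(r, rel) for r, rel in pairs) / len(pairs)

# Hypothetical toy embeddings; real ones would come from an embedding model.
docs = {"d1": [1.0, 0.0, 0.0], "d2": [0.0, 1.0, 0.0], "d3": [1.0, 1.0, 0.0]}
query = [1.0, 0.2, 0.0]

ranking = rank_documents(query, docs)
ap = average_precision(ranking, {"d1", "d2"})
map_score = mean_average_precision([ranking], [{"d1", "d2"}])
```

In a fuller pipeline, a transformation such as PCA would be fitted on the document embeddings and applied to both documents and queries before ranking; comparing the MAP value with and without that step is the kind of comparison the abstract reports.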
