On the Synonym Search Model

О.М. Атаева, Владимир Алексеевич Серебряков, Н.П. Тучкова
Russian Digital Libraries Journal. Published 2022-01-26.
DOI: https://doi.org/10.26907/1562-5419-2021-24-6-1006-1022

Abstract

This paper considers the problem of finding the most relevant documents by means of an extended and refined query. To this end, it proposes a search model and a text preprocessing mechanism, together with the joint use of a search engine and a neural network model. The neural model is built from the search index using word2vec algorithms; it generates an extended query containing synonyms and refines the search results by selecting similar documents in a digital semantic library. The paper investigates the construction of paragraph-based vector representations of documents for the data of the digital semantic library LibMeta. Each piece of text is labeled, and labels can be attached both to a whole document and to its separate parts. To enrich user queries with synonyms, the search model was built together with word2vec algorithms using an "indexing first, then training" approach, which covers more information and gives more accurate search results. The model was trained on the library's mathematical content. Examples of training, of extended queries, and of search quality assessment using training and synonyms are given.
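The query-expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that word vectors are already available (in the paper they would come from a word2vec model trained on the indexed library text) and that near neighbours by cosine similarity serve as candidate synonyms. The toy vectors, the function names `nearest_terms` and `expand_query`, and the `min_sim` threshold are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy embeddings, invented for this sketch. In practice they
# would be read from a word2vec model trained on the indexed documents.
TOY_VECTORS = {
    "equation": [0.9, 0.1, 0.0],
    "formula": [0.8, 0.2, 0.1],
    "identity": [0.7, 0.3, 0.0],
    "library": [0.0, 0.1, 0.9],
    "archive": [0.1, 0.0, 0.8],
    "theorem": [0.2, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_terms(term, vectors, top_n=2, min_sim=0.9):
    """Return up to top_n vocabulary terms closest to `term`, most similar first.

    Terms below the (assumed) similarity threshold min_sim are dropped, and a
    term missing from the vocabulary yields no candidates.
    """
    if term not in vectors:
        return []
    scored = [
        (cosine(vectors[term], vec), other)
        for other, vec in vectors.items()
        if other != term
    ]
    scored.sort(reverse=True)
    return [other for sim, other in scored[:top_n] if sim >= min_sim]


def expand_query(query, vectors, top_n=2, min_sim=0.9):
    """Append candidate synonyms after each query term, without duplicates."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + nearest_terms(term, vectors, top_n, min_sim):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query("equation library", TOY_VECTORS))
```

The expanded term list would then be passed to the search engine in place of the original query. Nearest neighbours in an embedding space are related terms rather than guaranteed synonyms, so a similarity threshold or a manual check is a reasonable safeguard in a sketch like this one.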