Semantic search using a similarity graph

L. Stanchev
Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing (IEEE ICSC 2015)
Published: 2015-02-01
DOI: 10.1109/ICOSC.2015.7050785 (https://doi.org/10.1109/ICOSC.2015.7050785)
Citations: 7

Abstract

Given a set of documents and an input query expressed in a natural language, the document search problem is to retrieve the most relevant documents. Unlike most existing systems, which perform document search based on keyword matching, we propose a search method that considers the meaning of the words in the query and the documents. As a result, our algorithm can return documents that have no words in common with the input query, as long as the documents are relevant. For example, a document that contains the words “Ford”, “Chrysler” and “General Motors” multiple times is surely relevant to the query “car” even if the word “car” does not appear in the document. Our semantic search algorithm is based on a similarity graph that stores the degree of semantic similarity between terms, where a term can be a word or a phrase. We experimentally validate our algorithm on the Cranfield benchmark, which contains 1400 documents and 225 natural language queries. The benchmark also lists the relevant documents for every query, as determined by human judgment. We show that our semantic search algorithm produces a higher mean average precision (MAP) score than a keyword matching algorithm. This indicates that our approach can improve the quality of the results because it takes into account the meaning of the words and phrases in the documents and the queries.
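The two ideas in the abstract, scoring documents through a term similarity graph and evaluating a ranking with MAP, can be illustrated with a minimal sketch. This is not the paper's algorithm: the graph contents, the edge weights, the scoring rule (average of each document term's best similarity to a query term), and all function names are assumptions made for illustration only. Only the average precision and MAP computations follow the standard definitions.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical similarity graph: term -> {neighbor: weight in [0, 1]}.
# The weights are invented for illustration; they are not from the paper.
SIMILARITY_GRAPH: Dict[str, Dict[str, float]] = {
    "car": {"ford": 0.5, "chrysler": 0.5, "general motors": 0.5},
}


def term_similarity(a: str, b: str) -> float:
    """Return 1.0 for identical terms, else the edge weight in either direction (0.0 if no edge)."""
    if a == b:
        return 1.0
    forward = SIMILARITY_GRAPH.get(a, {}).get(b, 0.0)
    backward = SIMILARITY_GRAPH.get(b, {}).get(a, 0.0)
    return max(forward, backward)


def score_document(query_terms: Sequence[str], doc_terms: Sequence[str]) -> float:
    """Assumed scoring rule: average over document terms of the best similarity to any query term."""
    if not query_terms or not doc_terms:
        return 0.0
    best = [max(term_similarity(q, d) for q in query_terms) for d in doc_terms]
    return sum(best) / len(best)


def rank_documents(query_terms: Sequence[str], docs: Dict[str, List[str]]) -> List[str]:
    """Return document ids sorted by descending score."""
    return sorted(docs, key=lambda doc_id: score_document(query_terms, docs[doc_id]), reverse=True)


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Standard average precision: mean of precision@k over the ranks k where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of average precision over (ranking, relevant set) pairs, one pair per query."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, relevant) for ranked, relevant in runs) / len(runs)


if __name__ == "__main__":
    docs = {
        "fruit": ["apple", "banana"],
        "autos": ["ford", "chrysler", "general motors", "ford"],
    }
    ranking = rank_documents(["car"], docs)
    print(ranking)
    print(mean_average_precision([(ranking, {"autos"})]))
```

In this toy setup the "autos" document ranks first for the query "car" even though it never contains that word, which mirrors the example in the abstract. A plain keyword matcher would score both documents at zero.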