Semantic search using a similarity graph

Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing (IEEE ICSC 2015). Pub Date: 2015-02-01. DOI: 10.1109/ICOSC.2015.7050785

L. Stanchev

Citations: 7

Abstract

Given a set of documents and an input query expressed in natural language, the document search problem is to retrieve the most relevant documents. Unlike most existing systems, which perform document search by keyword matching, we propose a search method that considers the meaning of the words in the query and in the documents. As a result, our algorithm can return documents that have no words in common with the input query, as long as the documents are relevant. For example, a document that contains the words “Ford”, “Chrysler” and “General Motors” multiple times is clearly relevant to the query “car”, even if the word “car” does not appear in it. Our semantic search algorithm is based on a similarity graph that stores the degree of semantic similarity between terms, where a term can be a word or a phrase. We experimentally validate our algorithm on the Cranfield benchmark, which contains 1,400 documents and 225 natural-language queries, together with the relevant documents for every query as determined by human judgment. We show that our semantic search algorithm achieves a higher mean average precision (MAP) score than a keyword-matching algorithm. This shows that our approach can improve the quality of the results because it takes into account the meaning of the words and phrases in the documents and the queries.
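The abstract does not spell out how the similarity graph is built or how documents are scored against it, so the following is only a minimal illustrative sketch, not the author's algorithm. It assumes a precomputed, symmetric term-to-term similarity table (the hypothetical `SIMILARITY` dict, with made-up weights) and an assumed scoring rule that sums term frequency times graph similarity. It contrasts that with an exact keyword-matching baseline and evaluates both with the standard mean average precision (MAP) measure on a tiny invented corpus; the names `semantic_score`, `keyword_score`, `DOCS` and `QRELS` are hypothetical.

```python
from collections import Counter

# Hypothetical toy similarity graph: each edge weight in [0, 1] is an assumed
# degree of semantic similarity between two terms (a term may be a phrase).
# The weights are invented for illustration, not taken from the paper.
SIMILARITY = {
    ("car", "ford"): 0.8,
    ("car", "chrysler"): 0.8,
    ("car", "general motors"): 0.8,
    ("car", "engine"): 0.5,
}


def similarity(a, b):
    """Symmetric edge lookup; a term is fully similar to itself."""
    if a == b:
        return 1.0
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))


def keyword_score(query_terms, doc_terms):
    """Baseline: count exact occurrences of query terms in the document."""
    counts = Counter(doc_terms)
    return sum(counts[q] for q in query_terms)


def semantic_score(query_terms, doc_terms):
    """Assumed scoring rule: sum of term frequency times graph similarity."""
    counts = Counter(doc_terms)
    return sum(similarity(q, t) * n for q in query_terms for t, n in counts.items())


def rank(query_terms, docs, score):
    """Return ids of documents with a positive score, best first (ties by id)."""
    scores = {d: score(query_terms, terms) for d, terms in docs.items()}
    return sorted((d for d in scores if scores[d] > 0), key=lambda d: (-scores[d], d))


def average_precision(ranking, relevant):
    """Sum of precision@k at each rank k holding a relevant document, divided
    by the total number of relevant documents (missed ones contribute 0)."""
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP: average precision averaged over (ranking, relevant-set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy corpus and relevance judgments (made up, in the style of Cranfield).
DOCS = {
    "d1": ["ford", "chrysler", "general motors", "ford"],
    "d2": ["boat", "sail", "harbor"],
    "d3": ["engine", "car", "repair"],
}
QRELS = {("car",): {"d1", "d3"}, ("boat",): {"d2"}}

keyword_runs = [(rank(list(q), DOCS, keyword_score), rel) for q, rel in QRELS.items()]
semantic_runs = [(rank(list(q), DOCS, semantic_score), rel) for q, rel in QRELS.items()]
print("keyword MAP:", mean_average_precision(keyword_runs))    # 0.75
print("semantic MAP:", mean_average_precision(semantic_runs))  # 1.0
```

On this toy data the semantic scorer retrieves `d1`, which shares no word with the query "car", echoing the Ford/Chrysler/General Motors example above, while the keyword baseline misses it. These numbers come from invented data and say nothing about the paper's actual Cranfield results.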
