Machine learning methods for results merging in patent retrieval

IF 1.7 | Zone 4 (Computer Science) | JCR Q3: COMPUTER SCIENCE, INFORMATION SYSTEMS

Data Technologies and Applications | Pub Date: 2023-02-27 | DOI: 10.1108/dta-06-2021-0156

Vasileios Stamatis, M. Salampasis, K. Diamantaras


Citations: 0

Abstract

Purpose: In federated search, a query is sent simultaneously to multiple resources, and each of them returns a list of results. These lists are then combined into a single list by the results merging process. In this work, the authors apply machine learning methods to results merging in federated patent search. Although several results merging methods have been developed, none has been tested on patent data or has compared several machine learning models. The authors therefore experiment with state-of-the-art methods on patent data and propose two new results merging methods that use machine learning models.

Design/methodology/approach: The methods are based on a centralized index containing samples of documents from all the remote resources, and they use machine learning models to estimate comparable scores for the documents retrieved by different resources. The authors examine the new methods in cooperative settings, where document scores from the remote search engines are available, and in uncooperative settings, where they are not. For uncooperative environments, they propose two methods for assigning document scores.

Findings: The effectiveness of the new results merging methods was measured against state-of-the-art models and found to be superior in many cases, with significant improvements. The random forest model achieves the best results of all the models compared and offers new insights into the results merging problem.

Originality/value: In this article, the authors demonstrate that machine learning models can replace the standard methods and models that have been used for results merging for many years. The proposed methods outperformed state-of-the-art estimation methods for results merging and proved more effective for federated patent search.
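The general idea of estimating comparable scores from a centralized sample index can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps in a plain least-squares line per resource as a dependency-free stand-in for the machine learning models the abstract mentions (such as random forest), and every name in it (`fit_line`, `rank_to_score`, `merge_results`, the sample data) is hypothetical. The rank-based pseudo-score is likewise only one assumed way to obtain scores in an uncooperative setting; the abstract does not say which two methods the authors propose.

```python
from typing import Dict, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a * x + b (stand-in for an ML regressor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # All source scores identical: predict the mean centralized score.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov / var_x
    return a, mean_y - a * mean_x


def rank_to_score(rank: int, list_length: int) -> float:
    """Assumed pseudo-score for resources that return no scores:
    1.0 at rank 1, decreasing linearly with rank."""
    return 1.0 - (rank - 1) / list_length


def merge_results(
    result_lists: Dict[str, List[Tuple[str, float]]],
    central_scores: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Map each resource's scores onto the centralized scale and merge.

    result_lists: resource name -> [(doc_id, source_score), ...]
    central_scores: doc_id -> score in the centralized sample index
    """
    merged: List[Tuple[str, float]] = []
    for results in result_lists.values():
        # Training pairs: documents that also appear in the sample index.
        overlap = [(s, central_scores[d]) for d, s in results if d in central_scores]
        if overlap:
            a, b = fit_line([s for s, _ in overlap], [c for _, c in overlap])
        else:
            a, b = 1.0, 0.0  # assumption: fall back to the raw source score
        merged.extend((d, a * s + b) for d, s in results)
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: two resources with incomparable score scales.
result_lists = {
    "A": [("a1", 10.0), ("a2", 8.0), ("a3", 6.0)],
    "B": [("b1", 0.9), ("b2", 0.7), ("b3", 0.1)],
}
central_scores = {"a1": 0.8, "a3": 0.4, "b1": 0.9, "b3": 0.3}
merged = merge_results(result_lists, central_scores)
```

In a cooperative setting the source scores come from the remote engines directly; in an uncooperative one, something like `rank_to_score` could supply the input scores before the same mapping is applied. A stronger regressor could replace `fit_line` without changing the rest of the sketch.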


Source journal

Data Technologies and Applications | Social Sciences: Library and Information Sciences

CiteScore: 3.80

Self-citation rate: 6.20%


Journal description: Previously published as: Program. Online from: 2018. Subject Area: Information & Knowledge Management, Library Studies