A novel re-ranking architecture for patent search

Impact factor 2.2; JCR Q2 (INFORMATION SCIENCE & LIBRARY SCIENCE)

World Patent Information, Volume 78, Article 102282. Published 2024-05-28. DOI: 10.1016/j.wpi.2024.102282

Vasileios Stamatis, Michail Salampasis, Konstantinos Diamantaras

Abstract

Patent search presents unique challenges due to the intricate structure and specialized terminology embedded in patent documents. While neural models have been successfully applied in various information retrieval (IR) tasks, these inherent complexities have hindered their effectiveness in patent search. To address these challenges, we propose a novel re-ranking architecture that effectively handles long, structured patent documents and leverages AI models to interpolate lexical and semantic signals of relevance. Additionally, the architecture incorporates query-specific weights for the final re-ranking process. To address partial relevance between patent sections, our method models the relevance relationships between different sections of patent documents. We calculate lexical and semantic signals of relevance from each document section and feed them as input features to AI models that estimate a combined relevance score. Finally, we compute query-specific weights to determine the relative contributions of lexical and semantic relevance for the final re-ranking. Extensive experiments on the CLEF-IP dataset demonstrate that our method outperforms several baselines, achieving substantial and statistically significant improvements in retrieval performance. We further assess the adaptability of our method using the MSMARCO dataset, where it exhibits limited performance, indicating its suitability for domain-specific patent research.
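The abstract gives no formulas, so what follows is only a minimal Python sketch of the general idea of section-level lexical/semantic interpolation. It is not the authors' implementation. It rests on several assumptions:

- Okapi BM25 supplies the lexical signal.
- Cosine similarity over term counts stands in for a neural embedding similarity.
- Fixed, hypothetical per-section weights (`SECTION_WEIGHTS`) replace the trained "AI model".
- The query-specific weight `alpha` is supplied by the caller, because the abstract does not say how it is computed.

The section names, weights, helper functions and candidate IDs are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical section names; the abstract only says patents are long and
# structured, so these four fields are an illustrative assumption.
SECTIONS = ("title", "abstract", "claims", "description")

# Hypothetical fixed weights standing in for the trained model that combines
# section-level signals into one relevance score (they sum to 1.0).
SECTION_WEIGHTS = {"title": 0.2, "abstract": 0.3, "claims": 0.3, "description": 0.2}


def tokenize(text):
    """Naive lowercase whitespace tokenizer."""
    return text.lower().split()


def bm25(query_terms, doc_terms, df, n_docs, avg_len, k1=1.2, b=0.75):
    """Okapi BM25 score of one section against the query (lexical signal)."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):
        if term not in tf:
            continue
        idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
        norm = tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


def cosine(query_terms, doc_terms):
    """Cosine of term-count vectors, a stand-in for dense-embedding similarity."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)
    nq = math.sqrt(sum(v * v for v in q.values()))
    nd = math.sqrt(sum(v * v for v in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0


def section_features(query_terms, candidates):
    """Per candidate, a (lexical, semantic) feature pair for every section."""
    feats = [{} for _ in candidates]
    for sec in SECTIONS:
        docs = [tokenize(c.get(sec, "")) for c in candidates]
        avg_len = sum(len(d) for d in docs) / len(docs) or 1.0
        df = Counter(t for d in docs for t in set(d))  # per-section document frequency
        for i, d in enumerate(docs):
            feats[i][sec] = (bm25(query_terms, d, df, len(docs), avg_len),
                             cosine(query_terms, d))
    return feats


def rerank(query, candidates, alpha):
    """Re-rank candidates; alpha in [0, 1] is the query-specific lexical weight."""
    if not candidates:
        return []
    feats = section_features(tokenize(query), candidates)
    lex = [sum(SECTION_WEIGHTS[s] * f[s][0] for s in SECTIONS) for f in feats]
    sem = [sum(SECTION_WEIGHTS[s] * f[s][1] for s in SECTIONS) for f in feats]
    top_lex = max(lex) or 1.0  # scale BM25 into [0, 1] so it is comparable to cosine
    scores = [alpha * lx / top_lex + (1 - alpha) * sm for lx, sm in zip(lex, sem)]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [(candidates[i]["id"], round(scores[i], 4)) for i in order]


# Usage with two hypothetical candidate patents.
candidates = [
    {"id": "EP-A", "title": "rotor blade cooling",
     "claims": "a turbine rotor blade with cooling channels"},
    {"id": "EP-B", "title": "battery housing",
     "claims": "a housing for battery cells"},
]
print(rerank("turbine blade cooling", candidates, alpha=0.6))
```

Scaling BM25 by the best candidate's score keeps both signals on a comparable [0, 1] range before interpolation. With `alpha` near 1 the ranking follows exact term matches, and with `alpha` near 0 it follows the semantic stand-in. In the paper, the combination model is learned and the per-query weights are computed rather than passed in.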

Source journal

World Patent Information (INFORMATION SCIENCE & LIBRARY SCIENCE)

CiteScore: 3.50

Self-citation rate: 18.50%

Journal description: The aim of World Patent Information is to provide a worldwide forum for the exchange of information between people working professionally in the field of Industrial Property information and documentation and to promote the widest possible use of the associated literature. Regular features include: papers concerned with all aspects of Industrial Property information and documentation; new regulations pertinent to Industrial Property information and documentation; short reports on relevant meetings and conferences; bibliographies, together with book and literature reviews.