A novel re-ranking architecture for patent search

Impact factor: 2.2 · JCR Q2 · Information Science & Library Science
Vasileios Stamatis, Michail Salampasis, Konstantinos Diamantaras
World Patent Information, Vol. 78, Article 102282. Published 2024-05-28.
DOI: 10.1016/j.wpi.2024.102282
https://www.sciencedirect.com/science/article/pii/S017221902400022X
Citations: 0

Abstract

Patent search presents unique challenges due to the intricate structure and specialized terminology embedded in patent documents. While neural models have been successfully applied in various information retrieval (IR) tasks, these inherent complexities have hindered their effectiveness in patent search. To address these challenges, we propose a novel re-ranking architecture that effectively handles long, structured patent documents and leverages AI models to interpolate lexical and semantic signals of relevance. Additionally, the architecture incorporates query-specific weights for the final re-ranking process. To address partial relevance between patent sections, our method models the relevance relationships between different sections of patent documents. We calculate lexical and semantic signals of relevance from each document section and feed them as input features to AI models that estimate a combined relevance score. Finally, we compute query-specific weights to determine the relative contributions of lexical and semantic relevance for the final re-ranking. Extensive experiments on the CLEF-IP dataset demonstrate that our method outperforms several baselines, achieving substantial and statistically significant improvements in retrieval performance. We further assess the adaptability of our method using the MSMARCO dataset, where it exhibits limited performance, indicating its suitability for domain-specific patent research.
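The abstract describes the pipeline only at a high level. The Python sketch below illustrates the general idea under loudly stated assumptions; it is not the authors' implementation. It assumes BM25 as the per-section lexical signal and cosine similarity of precomputed embeddings as the per-section semantic signal. Fixed per-section weights (`lex_w`, `sem_w`, hypothetical) stand in for the trained AI models, and the query-specific weight `alpha_q` is simply passed in, because the abstract does not say how the authors compute it. The section names in `SECTIONS` are likewise hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical section set; the abstract does not list the sections used.
SECTIONS = ("title", "abstract", "claims", "description")

# A candidate document maps each section to (tokens, embedding vector).
Doc = Dict[str, Tuple[List[str], List[float]]]


def bm25(query: Sequence[str], field: List[str], df: Counter, n: int,
         avgdl: float, k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 of one section against the query (assumed lexical signal)."""
    tf = Counter(field)
    score = 0.0
    for t in set(query):
        if tf[t] == 0:
            continue
        idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
        norm = k1 * (1 - b + b * len(field) / avgdl)
        score += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
    return score


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two embeddings (assumed semantic signal)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.hypot(*u), math.hypot(*v)
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(xs: List[float]) -> List[float]:
    """Rescale scores to [0, 1] so BM25 and cosine are on comparable scales."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def rerank(query: List[str], query_vec: List[float], docs: List[Doc],
           lex_w: Dict[str, float], sem_w: Dict[str, float],
           alpha_q: float) -> List[int]:
    """Return candidate indices sorted by interpolated score, best first."""
    # Per-section statistics, computed over the candidate set only.
    stats = {}
    for s in SECTIONS:
        fields = [d[s][0] for d in docs]
        df = Counter(t for f in fields for t in set(f))
        avgdl = sum(map(len, fields)) / len(fields) or 1.0
        stats[s] = (df, len(fields), avgdl)
    lex, sem = [], []
    for d in docs:
        # Weighted sums stand in for the paper's learned combination models.
        lex.append(sum(lex_w[s] * bm25(query, d[s][0], *stats[s])
                       for s in SECTIONS))
        sem.append(sum(sem_w[s] * cosine(query_vec, d[s][1])
                       for s in SECTIONS))
    lex, sem = min_max(lex), min_max(sem)
    # alpha_q sets the query-specific balance of lexical vs. semantic relevance.
    scores = [alpha_q * l + (1 - alpha_q) * m for l, m in zip(lex, sem)]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Setting `alpha_q` near 1 favours exact term matches (useful when a query relies on precise claim vocabulary); setting it near 0 favours semantic similarity. The min-max normalization is a design choice of this sketch, needed because BM25 is unbounded while cosine similarity is not.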

Source journal: World Patent Information
Category: Information Science & Library Science
CiteScore: 3.50
Self-citation rate: 18.50%
Articles published: 40
About the journal: The aim of World Patent Information is to provide a worldwide forum for the exchange of information between people working professionally in the field of Industrial Property information and documentation and to promote the widest possible use of the associated literature. Regular features include: papers concerned with all aspects of Industrial Property information and documentation; new regulations pertinent to Industrial Property information and documentation; short reports on relevant meetings and conferences; bibliographies, together with book and literature reviews.