SEARCHFORMER: Semantic patent embeddings by siamese transformers for prior art search

Impact Factor: 2.2 (JCR Q2, Information Science & Library Science)
Konrad Vowinckel, Volker D. Hähnke
{"title":"SEARCHFORMER: Semantic patent embeddings by siamese transformers for prior art search","authors":"Konrad Vowinckel,&nbsp;Volker D. Hähnke","doi":"10.1016/j.wpi.2023.102192","DOIUrl":null,"url":null,"abstract":"<div><p><span><span><span>The identification of relevant prior art for patent applications is of key importance for the work of patent examiners. The recent advancements in the field of </span>natural language processing in the form of </span>language models<span><span> such as BERT enable the creation of the next generation of </span>prior art search tools. These models can generate vectorial representations of input text, enabling the use of vector similarity as proxy for semantic text similarity. We fine-tuned a patent-specific BERT model for prior art search on a large set of real-world examples of patent claims, corresponding passages prejudicing novelty or inventive step, and random text fragments, creating the SEARCHFORMER. We show in retrospective ranking experiments that our model is a real improvement. For this purpose, we compiled an evaluation collection comprising 2014 pairs of patent application and related potential prior art documents. We employed two representative baselines for comparison: (i) an optimized combination of automatically built queries and the BM25 ranking function, and (ii) several state-of-the-art language models, including SentenceTransformers optimized for semantic retrieval. Ranking performance was measured as rank of the first relevant result. Using t-tests, we show that the achieved ranking improvements of the SEARCHFORMER over the baselines are statistically significant (</span></span><span><math><mrow><mi>α</mi><mo>=</mo><mn>0</mn><mo>.</mo><mn>01</mn></mrow></math></span>).</p></div>","PeriodicalId":51794,"journal":{"name":"World Patent Information","volume":"73 ","pages":"Article 102192"},"PeriodicalIF":2.2000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"World Patent Information","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0172219023000224","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"INFORMATION SCIENCE & LIBRARY SCIENCE","Score":null,"Total":0}
引用次数: 3

Abstract

The identification of relevant prior art for patent applications is of key importance for the work of patent examiners. Recent advancements in natural language processing in the form of language models such as BERT enable the creation of the next generation of prior art search tools. These models can generate vectorial representations of input text, enabling the use of vector similarity as a proxy for semantic text similarity. We fine-tuned a patent-specific BERT model for prior art search on a large set of real-world examples of patent claims, corresponding passages prejudicing novelty or inventive step, and random text fragments, creating the SEARCHFORMER. We show in retrospective ranking experiments that our model is a real improvement. For this purpose, we compiled an evaluation collection comprising 2014 pairs of patent applications and related potential prior art documents. We employed two representative baselines for comparison: (i) an optimized combination of automatically built queries and the BM25 ranking function, and (ii) several state-of-the-art language models, including SentenceTransformers optimized for semantic retrieval. Ranking performance was measured as the rank of the first relevant result. Using t-tests, we show that the ranking improvements of the SEARCHFORMER over the baselines are statistically significant (α = 0.01).
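The abstract does not include a runnable implementation, so the following is a minimal sketch of the embed-and-rank approach it describes, assuming the open-source sentence-transformers library. The model name "all-MiniLM-L6-v2" is a generic public checkpoint standing in for the fine-tuned SEARCHFORMER, and the claim and passage texts are invented examples.

```python
# Minimal sketch of embedding-based prior art ranking using the
# sentence-transformers library; "all-MiniLM-L6-v2" is a generic
# public model, NOT the authors' fine-tuned SEARCHFORMER checkpoint.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Invented example texts: a query claim and candidate prior art passages.
claim = "A rechargeable battery cell comprising a solid-state electrolyte."
candidate_passages = [
    "The electrolyte layer is formed from a lithium-conducting ceramic.",
    "A method for roasting coffee beans at a controlled temperature.",
]

# Encode the claim and the candidate passages into dense vectors.
claim_vec = model.encode(claim, convert_to_tensor=True)
passage_vecs = model.encode(candidate_passages, convert_to_tensor=True)

# Cosine similarity between vectors serves as a proxy for semantic
# text similarity; rank candidates by descending similarity.
scores = util.cos_sim(claim_vec, passage_vecs)[0]
for rank, idx in enumerate(scores.argsort(descending=True), start=1):
    print(rank, round(float(scores[idx]), 3), candidate_passages[idx])
```

In a siamese setup such as the one the paper describes, the same encoder processes both texts independently, so document embeddings can be precomputed and indexed, and retrieval reduces to a nearest-neighbor search over vectors.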

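The evaluation measure (rank of the first relevant result) and the significance test are simple to state. Below is a hedged sketch assuming SciPy's paired t-test (ttest_rel); the abstract says only "t-tests", so the paired variant and all rank values here are illustrative assumptions, not the paper's data.

```python
# Sketch of the evaluation measure (rank of the first relevant result)
# and a paired t-test over per-query ranks; all data is illustrative.
from scipy import stats

def first_relevant_rank(ranked_ids, relevant_ids):
    """Return the 1-based rank of the first relevant document, or None."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return rank
    return None  # no relevant document was retrieved

# Hypothetical per-query ranks for two systems over the same query set.
ranks_searchformer = [1, 3, 2, 1, 5]
ranks_bm25 = [4, 7, 2, 3, 9]

# Paired t-test: are the rank differences significant at alpha = 0.01?
t_stat, p_value = stats.ttest_rel(ranks_searchformer, ranks_bm25)
print(t_stat, p_value, p_value < 0.01)
```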
Source Journal

World Patent Information
CiteScore: 3.50
Self-citation rate: 18.50%
Articles per year: 40
Journal description: The aim of World Patent Information is to provide a worldwide forum for the exchange of information between people working professionally in the field of Industrial Property information and documentation, and to promote the widest possible use of the associated literature. Regular features include: papers concerned with all aspects of Industrial Property information and documentation; new regulations pertinent to Industrial Property information and documentation; short reports on relevant meetings and conferences; and bibliographies, together with book and literature reviews.