Do large language models understand patents? Enhancing patent classification through AI-generated summaries

IF 2.2 Q2 INFORMATION SCIENCE & LIBRARY SCIENCE
Naoya Yoshikawa , Ralf Krestel
{"title":"Do large language models understand patents? Enhancing patent classification through AI-generated summaries","authors":"Naoya Yoshikawa ,&nbsp;Ralf Krestel","doi":"10.1016/j.wpi.2025.102353","DOIUrl":null,"url":null,"abstract":"<div><div>Patent classification plays a crucial role in intellectual property management, but remains a challenging task due to the complexity of patent documents. This study explores a novel approach to enhance automatic patent classification by leveraging summaries generated by large language models (LLMs). Our approach involves using the GPT-3.5-turbo model to create concise summaries from different sections of patent texts, which are then used to fine-tune the RoBERTa and XLM-RoBERTa models for classification tasks. We conducted experiments on English and Japanese patent documents using two datasets: the well-established USPTO-70k and the newly developed JPO-70k, that we specifically created for this study.</div><div>Our findings show that models trained on AI-generated summaries – particularly those derived from patent claims or detailed descriptions – outperform models trained on original abstracts in both subclass-level multi-label classification and subgroup-level single-label classification. In particular, using detailed description summaries improved the micro-average F1 score for subclass-level classification by 2.9 points on the USPTO-70k and 3.0 points on the JPO-70k, compared to using original abstracts.</div><div>These results indicate that LLM-generated summaries effectively capture information relevant to patent classification from various sections of patent texts, offering a promising approach to enhance the accuracy and efficiency of patent classification across different languages.</div></div>","PeriodicalId":51794,"journal":{"name":"World Patent Information","volume":"81 ","pages":"Article 102353"},"PeriodicalIF":2.2000,"publicationDate":"2025-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"World Patent Information","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0172219025000201","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"INFORMATION SCIENCE & LIBRARY SCIENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Patent classification plays a crucial role in intellectual property management, but remains a challenging task due to the complexity of patent documents. This study explores a novel approach to enhance automatic patent classification by leveraging summaries generated by large language models (LLMs). Our approach involves using the GPT-3.5-turbo model to create concise summaries from different sections of patent texts, which are then used to fine-tune the RoBERTa and XLM-RoBERTa models for classification tasks. We conducted experiments on English and Japanese patent documents using two datasets: the well-established USPTO-70k and the newly developed JPO-70k, that we specifically created for this study.
Our findings show that models trained on AI-generated summaries – particularly those derived from patent claims or detailed descriptions – outperform models trained on original abstracts in both subclass-level multi-label classification and subgroup-level single-label classification. In particular, using detailed description summaries improved the micro-average F1 score for subclass-level classification by 2.9 points on the USPTO-70k and 3.0 points on the JPO-70k, compared to using original abstracts.
These results indicate that LLM-generated summaries effectively capture information relevant to patent classification from various sections of patent texts, offering a promising approach to enhance the accuracy and efficiency of patent classification across different languages.
求助全文
约1分钟内获得全文 求助全文
来源期刊
World Patent Information
World Patent Information INFORMATION SCIENCE & LIBRARY SCIENCE-
CiteScore
3.50
自引率
18.50%
发文量
40
期刊介绍: The aim of World Patent Information is to provide a worldwide forum for the exchange of information between people working professionally in the field of Industrial Property information and documentation and to promote the widest possible use of the associated literature. Regular features include: papers concerned with all aspects of Industrial Property information and documentation; new regulations pertinent to Industrial Property information and documentation; short reports on relevant meetings and conferences; bibliographies, together with book and literature reviews.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信