Do large language models understand patents? Enhancing patent classification through AI-generated summaries
Authors: Naoya Yoshikawa, Ralf Krestel
Journal: World Patent Information, vol. 81, Article 102353 (JCR Q2, Information Science & Library Science; impact factor 2.2)
Published: 2025-04-05
DOI: 10.1016/j.wpi.2025.102353
URL: https://www.sciencedirect.com/science/article/pii/S0172219025000201
Citations: 0
Abstract
Patent classification plays a crucial role in intellectual property management, but remains a challenging task due to the complexity of patent documents. This study explores a novel approach to enhance automatic patent classification by leveraging summaries generated by large language models (LLMs). We use the GPT-3.5-turbo model to create concise summaries from different sections of patent texts, and then use these summaries to fine-tune the RoBERTa and XLM-RoBERTa models for classification tasks. We conducted experiments on English and Japanese patent documents using two datasets: the well-established USPTO-70k and the newly developed JPO-70k, which we created specifically for this study.
Our findings show that models trained on AI-generated summaries – particularly those derived from patent claims or detailed descriptions – outperform models trained on original abstracts in both subclass-level multi-label classification and subgroup-level single-label classification. In particular, using detailed description summaries improved the micro-average F1 score for subclass-level classification by 2.9 points on the USPTO-70k and 3.0 points on the JPO-70k, compared to using original abstracts.
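For readers unfamiliar with the metric, the following is a minimal sketch of how a micro-average F1 score can be computed for multi-label classification. It is not the authors' evaluation code: the function name `micro_f1` and the sample label sets (written to look like subclass-level codes) are illustrative assumptions. Micro-averaging pools true positives, false positives, and false negatives over all documents and labels before computing a single F1 value.

```python
from typing import Iterable, Set


def micro_f1(gold: Iterable[Set[str]], predicted: Iterable[Set[str]]) -> float:
    """Micro-averaged F1 over multi-label predictions (hypothetical helper)."""
    tp = fp = fn = 0
    for gold_labels, pred_labels in zip(gold, predicted):
        tp += len(gold_labels & pred_labels)   # labels correctly predicted
        fp += len(pred_labels - gold_labels)   # predicted but not in gold
        fn += len(gold_labels - pred_labels)   # in gold but missed
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there is nothing to score
    return 2 * tp / denom if denom else 0.0


# Illustrative data only: three documents with made-up subclass label sets.
gold = [{"G06F", "G06N"}, {"H04L"}, {"A61K", "C07D"}]
pred = [{"G06F"}, {"H04L", "H04W"}, {"A61K", "C07D"}]

score = micro_f1(gold, pred)
print(f"micro-F1 = {100 * score:.1f}")  # prints "micro-F1 = 80.0"
```

On this scale, an improvement of 2.9 points corresponds to a rise of 0.029 in the underlying 0-to-1 score.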
These results indicate that LLM-generated summaries effectively capture information relevant to patent classification from various sections of patent texts, offering a promising approach to enhance the accuracy and efficiency of patent classification across different languages.
About the journal:
The aim of World Patent Information is to provide a worldwide forum for the exchange of information between people working professionally in the field of Industrial Property information and documentation and to promote the widest possible use of the associated literature. Regular features include: papers concerned with all aspects of Industrial Property information and documentation; new regulations pertinent to Industrial Property information and documentation; short reports on relevant meetings and conferences; bibliographies, together with book and literature reviews.