Scalable multi-label patent classification via iterative large language model-assisted active learning
Authors: Songquan Xiong, Shikun Chen, Jianwei He, Yangguang Liu, Junjie Mao, Chao Liu
Journal: World Patent Information, volume 82, Article 102380
Publication date: 2025-08-06
DOI: 10.1016/j.wpi.2025.102380
URL: https://www.sciencedirect.com/science/article/pii/S017221902500047X
Impact factor: 1.9 (JCR Q2, Information Science & Library Science)
Citations: 0
Abstract
Patent classification faces increasingly complex challenges due to the exponential growth in volume and technical sophistication of global patent databases. A substantial proportion of patents inherently belong to multiple technological categories simultaneously, rendering classification particularly challenging for both manual and automated systems. Current approaches struggle with computational scalability, prohibitive annotation costs, and the accurate identification of overlapping technical concepts across interdisciplinary innovations. This study presents a novel iterative framework that combines the advanced text comprehension capabilities of Large Language Models (LLMs) with the sample-efficient principles of active learning (AL) for scalable multi-label patent classification. We evaluated our approach using drone-related technologies extracted from a comprehensive dataset of 100,000 patents, focusing on ten key technological component categories. Our LLM-assisted active learning methodology achieved Macro-F1 and Micro-F1 scores of 0.85 and 0.88, respectively, demonstrating a 15% improvement in Macro-F1 compared to established baseline methods. Our approach reduced the required manual annotation effort by approximately 60% while maintaining comparable classification performance. These empirical findings demonstrate the potential for transforming large-scale patent analysis workflows and improving the efficiency of intellectual property management systems.
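The abstract reports both Macro-F1 and Micro-F1 without spelling out how the two differ in a multi-label setting. The following is a minimal sketch, assuming the standard definitions: Macro-F1 averages the per-label F1 scores, so rare labels count as much as common ones, while Micro-F1 pools true positives, false positives, and false negatives across all labels before computing a single F1. The function names, label names, and toy data are hypothetical illustrations and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when the label never occurs or is never predicted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(
    y_true: List[Set[str]], y_pred: List[Set[str]], labels: List[str]
) -> Tuple[float, float]:
    """Return (macro_f1, micro_f1) for multi-label predictions given as label sets."""
    # Per-label counts stored as [tp, fp, fn].
    counts: Dict[str, List[int]] = {label: [0, 0, 0] for label in labels}
    for truth, pred in zip(y_true, y_pred):
        for label in labels:
            if label in pred and label in truth:
                counts[label][0] += 1  # true positive
            elif label in pred:
                counts[label][1] += 1  # false positive
            elif label in truth:
                counts[label][2] += 1  # false negative

    # Macro: unweighted mean of per-label F1 scores.
    macro = sum(f1_from_counts(*c) for c in counts.values()) / len(labels)

    # Micro: pool the counts over all labels, then compute one F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1_from_counts(tp, fp, fn)
    return macro, micro


# Hypothetical component categories and toy annotations (not from the paper).
labels = ["propulsion", "navigation", "battery"]
y_true = [{"propulsion", "navigation"}, {"navigation"}, {"propulsion"}, {"battery"}]
y_pred = [{"propulsion", "navigation"}, {"navigation"}, {"propulsion", "navigation"}, set()]

macro_f1, micro_f1 = macro_micro_f1(y_true, y_pred, labels)
print(f"Macro-F1 = {macro_f1:.2f}, Micro-F1 = {micro_f1:.2f}")
```

In this toy data the rare "battery" label is missed entirely, which pulls Macro-F1 well below Micro-F1. A gap in that direction, such as the reported 0.85 versus 0.88, usually means some less frequent categories are classified less accurately than the common ones.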
About the journal:
The aim of World Patent Information is to provide a worldwide forum for the exchange of information between people working professionally in the field of Industrial Property information and documentation and to promote the widest possible use of the associated literature. Regular features include: papers concerned with all aspects of Industrial Property information and documentation; new regulations pertinent to Industrial Property information and documentation; short reports on relevant meetings and conferences; bibliographies, together with book and literature reviews.