Leveraging patent classification based on deep learning: The case study on smart cities and industrial Internet of Things

IF 3.5 2区管理学 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Informetrics Pub Date : 2024-11-28 DOI:10.1016/j.joi.2024.101616

Munan Li , Liang Wang

{"title":"Leveraging patent classification based on deep learning: The case study on smart cities and industrial Internet of Things","authors":"Munan Li , Liang Wang","doi":"10.1016/j.joi.2024.101616","DOIUrl":null,"url":null,"abstract":"<div><div>With the trends of technology convergence and technology interdisciplinarity, technology-field (TF) resolution and classification of patents have gradually been challenged. Whether for patent applicants or for patent examiners, more precisely labeling the TF for a certain patent is important for technological searches. However, determining the TF of a patent may be difficult and may even involve the strategic behavior of patenting, which can cause noise in patent classification systems (PCSs). In addition, some specific patents could contain more TFs than claimed or be assigned questionable IPC codes; subsequently, in a regular search for technology/patents, information could be missed. Considering the advantages of deep learning compared with traditional machine learning algorithms in areas such as natural language processing (NLP), text classification and text sentiment analysis, this paper investigates several popular deep learning models and proposes a large-scale multilabel regression (MLR) model to handle specific patent analyses under situations of small sample learning. To verify the proposed MLR model for patent classification, the case study on smart cities and industrial Internet of Things (IIoT) is conducted. The MLR experiments on the TF resolution of smart cities and IIoT have yielded moderate results compared with those of the latest patent classification studies, which also rely on deep learning and the large language models (LLMs), which include RCNN, Bi-LSTM, BERT and GPT-4 etc. Therefore, the proposed MLR model with a customized loss function could be moderately effective for patent classification within a specific technology theme, could have implications for patent classification and the TF resolution of patents, and could further enrich methodologies for patent mining and informetrics based on artificial intelligence (AI).</div></div>","PeriodicalId":48662,"journal":{"name":"Journal of Informetrics","volume":"19 1","pages":"Article 101616"},"PeriodicalIF":3.5000,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Informetrics","FirstCategoryId":"91","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1751157724001287","RegionNum":2,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}

引用次数: 0

Abstract

With the trends of technology convergence and technology interdisciplinarity, technology-field (TF) resolution and classification of patents have gradually been challenged. Whether for patent applicants or for patent examiners, more precisely labeling the TF for a certain patent is important for technological searches. However, determining the TF of a patent may be difficult and may even involve the strategic behavior of patenting, which can cause noise in patent classification systems (PCSs). In addition, some specific patents could contain more TFs than claimed or be assigned questionable IPC codes; subsequently, in a regular search for technology/patents, information could be missed. Considering the advantages of deep learning compared with traditional machine learning algorithms in areas such as natural language processing (NLP), text classification and text sentiment analysis, this paper investigates several popular deep learning models and proposes a large-scale multilabel regression (MLR) model to handle specific patent analyses under situations of small sample learning. To verify the proposed MLR model for patent classification, the case study on smart cities and industrial Internet of Things (IIoT) is conducted. The MLR experiments on the TF resolution of smart cities and IIoT have yielded moderate results compared with those of the latest patent classification studies, which also rely on deep learning and the large language models (LLMs), which include RCNN, Bi-LSTM, BERT and GPT-4 etc. Therefore, the proposed MLR model with a customized loss function could be moderately effective for patent classification within a specific technology theme, could have implications for patent classification and the TF resolution of patents, and could further enrich methodologies for patent mining and informetrics based on artificial intelligence (AI).

查看原文本刊更多论文

利用基于深度学习的专利分类：智慧城市与工业物联网案例研究

随着技术融合和技术交叉的趋势，专利的技术领域（TF）划分和分类逐渐受到挑战。无论是对专利申请人还是专利审查员来说，更精确地标记特定专利的TF对于技术检索都很重要。然而，确定专利的TF可能很困难，甚至可能涉及专利的战略行为，这可能会在专利分类系统（PCSs）中造成噪音。此外，一些特定专利可能包含比所要求的更多的tf，或分配有问题的IPC代码；随后，在定期搜索技术/专利时，可能会遗漏信息。考虑到深度学习在自然语言处理（NLP）、文本分类和文本情感分析等领域相对于传统机器学习算法的优势，本文研究了几种流行的深度学习模型，并提出了一种大规模多标签回归（MLR）模型来处理小样本学习情况下的具体专利分析。为了验证所提出的专利分类MLR模型，本文以智慧城市和工业物联网为例进行了研究。与最新的专利分类研究相比，关于智慧城市和工业物联网的TF分辨率的MLR实验取得了中等结果，这些研究同样依赖于深度学习和大型语言模型（llm），包括RCNN、Bi-LSTM、BERT和GPT-4等。因此，所提出的具有定制损失函数的MLR模型对于特定技术主题内的专利分类可能是中等有效的，可能对专利分类和专利的TF分辨率有影响，并可能进一步丰富基于人工智能（AI）的专利挖掘和信息计量方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Journal of Informetrics Social Sciences-Library and Information Sciences

CiteScore

6.40

自引率

16.20%

发文量

期刊介绍： Journal of Informetrics (JOI) publishes rigorous high-quality research on quantitative aspects of information science. The main focus of the journal is on topics in bibliometrics, scientometrics, webometrics, patentometrics, altmetrics and research evaluation. Contributions studying informetric problems using methods from other quantitative fields, such as mathematics, statistics, computer science, economics and econometrics, and network science, are especially encouraged. JOI publishes both theoretical and empirical work. In general, case studies, for instance a bibliometric analysis focusing on a specific research field or a specific country, are not considered suitable for publication in JOI, unless they contain innovative methodological elements.