Automated patent classification for crop protection via domain adaptation

Applied AI Letters. Pub Date: 2023-02-15. DOI: 10.1002/ail2.80

Dimitrios Christofidellis, Marzena Maria Lehmann, Torsten Luksch, Marco Stenta, Matteo Manica


Citations: 0

Abstract

Patents document how technology evolves over time in most scientific fields. Making good use of this knowledge base requires efficient and effective information retrieval, in particular searches for related prior art. Patent classification, that is, assigning a patent to one or more predefined categories, is a fundamental step towards synthesizing the information content of an invention. To this end, Transformer-based architectures, especially those derived from the BERT family, have already been proposed in the literature and have set a new state of the art for the classification task. Here, we study how domain adaptation can push the performance boundaries in patent classification by implementing and rigorously evaluating a collection of recent transfer learning techniques, for example, domain-adaptive pretraining and adapters. Our analysis shows that leveraging these advancements enables the development of state-of-the-art models with increased precision, recall, and F1-score. We base our evaluation both on standard patent classification datasets derived from code hierarchies defined by patent offices and on more practical real-world use-case scenarios containing labels from the agrochemical industrial domain. We also examine and evaluate the application of these domain-adapted techniques to patent classification in a multilingual setting.
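
The abstract reports gains in precision, recall, and F1-score for a task in which each patent may receive one or more categories. As a minimal sketch of how such metrics can be computed in a multi-label setting, the Python snippet below uses micro-averaging. The averaging scheme is an assumption, since the abstract does not state which one the authors use, and the function name `micro_prf` and the example labels are hypothetical illustrations rather than data from the paper.

```python
from typing import List, Set, Tuple


def micro_prf(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 for multi-label predictions.

    gold[i] and pred[i] are the true and predicted label sets of document i.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of documents")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and correct
        fp += len(p - g)  # labels predicted but not in the gold set
        fn += len(g - p)  # gold labels that were missed

    # Guard against empty denominators (no predictions or no gold labels).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical label sets for three patents (illustrative only, not from the paper).
gold = [{"A01N", "C07D"}, {"A01N"}, {"C12N"}]
pred = [{"A01N"}, {"A01N", "C07D"}, {"C12N"}]

precision, recall, f1 = micro_prf(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Micro-averaging pools true positives, false positives, and false negatives across all patents before computing the ratios, so frequent categories weigh more heavily. A macro-averaged variant would instead compute the metrics per category and average them.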


