Zijian Gao, Kele Xu, Huiping Zhuang, Li Liu, Xinjun Mao, Bo Ding, Dawei Feng, Huaimin Wang
{"title":"更少的自信,更少的遗忘:在无范例班级强化学习中与谦虚的教师一起学习","authors":"Zijian Gao , Kele Xu , Huiping Zhuang , Li Liu , Xinjun Mao , Bo Ding , Dawei Feng , Huaimin Wang","doi":"10.1016/j.neunet.2024.106513","DOIUrl":null,"url":null,"abstract":"<div><p>Class-Incremental learning (CIL) is challenging due to catastrophic forgetting (CF), which escalates in exemplar-free scenarios. To mitigate CF, Knowledge Distillation (KD), which leverages old models as teacher models, has been widely employed in CIL. However, based on a case study, our investigation reveals that the teacher model exhibits over-confidence in unseen new samples. In this article, we conduct empirical experiments and provide theoretical analysis to investigate the over-confident phenomenon and the impact of KD in exemplar-free CIL, where access to old samples is unavailable. Building on our analysis, we propose a novel approach, Learning with Humbler Teacher, by systematically selecting an appropriate checkpoint model as a humbler teacher to mitigate CF. Furthermore, we explore utilizing the nuclear norm to obtain an appropriate temporal ensemble to enhance model stability. Notably, LwHT outperforms the state-of-the-art approach by a significant margin of 10.41%, 6.56%, and 4.31% in various settings while demonstrating superior model plasticity.</p></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"179 ","pages":"Article 106513"},"PeriodicalIF":6.3000,"publicationDate":"2024-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Less confidence, less forgetting: Learning with a humbler teacher in exemplar-free Class-Incremental learning\",\"authors\":\"Zijian Gao , Kele Xu , Huiping Zhuang , Li Liu , Xinjun Mao , Bo Ding , Dawei Feng , Huaimin Wang\",\"doi\":\"10.1016/j.neunet.2024.106513\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Class-Incremental learning (CIL) is challenging due to catastrophic forgetting (CF), which escalates in exemplar-free scenarios. To mitigate CF, Knowledge Distillation (KD), which leverages old models as teacher models, has been widely employed in CIL. However, based on a case study, our investigation reveals that the teacher model exhibits over-confidence in unseen new samples. In this article, we conduct empirical experiments and provide theoretical analysis to investigate the over-confident phenomenon and the impact of KD in exemplar-free CIL, where access to old samples is unavailable. Building on our analysis, we propose a novel approach, Learning with Humbler Teacher, by systematically selecting an appropriate checkpoint model as a humbler teacher to mitigate CF. Furthermore, we explore utilizing the nuclear norm to obtain an appropriate temporal ensemble to enhance model stability. 
Notably, LwHT outperforms the state-of-the-art approach by a significant margin of 10.41%, 6.56%, and 4.31% in various settings while demonstrating superior model plasticity.</p></div>\",\"PeriodicalId\":49763,\"journal\":{\"name\":\"Neural Networks\",\"volume\":\"179 \",\"pages\":\"Article 106513\"},\"PeriodicalIF\":6.3000,\"publicationDate\":\"2024-07-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Networks\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0893608024004374\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608024004374","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Less confidence, less forgetting: Learning with a humbler teacher in exemplar-free Class-Incremental learning
Class-Incremental Learning (CIL) is challenging due to catastrophic forgetting (CF), which escalates in exemplar-free scenarios. To mitigate CF, Knowledge Distillation (KD), which leverages the old model as a teacher, has been widely employed in CIL. However, our case-study investigation reveals that the teacher model exhibits over-confidence on unseen new samples. In this article, we conduct empirical experiments and provide theoretical analysis to investigate the over-confidence phenomenon and the impact of KD in exemplar-free CIL, where access to old samples is unavailable. Building on our analysis, we propose a novel approach, Learning with a Humbler Teacher (LwHT), which systematically selects an appropriate checkpoint model as a humbler teacher to mitigate CF. Furthermore, we explore utilizing the nuclear norm to obtain an appropriate temporal ensemble and enhance model stability. Notably, LwHT outperforms the state-of-the-art approach by significant margins of 10.41%, 6.56%, and 4.31% in various settings while demonstrating superior model plasticity.
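To make the two mechanisms in the abstract concrete, below is a minimal, hypothetical PyTorch sketch: (1) distilling from a less over-confident ("humbler") checkpoint teacher rather than the final old model, and (2) using the nuclear norm of the prediction matrix as a scoring criterion. The function names, the temperature, the KD weight, and the mean-max-probability selection rule are illustrative assumptions, not the authors' exact formulation.

```python
# Illustrative sketch only; not the paper's official implementation.
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Standard soft-target distillation loss (KL divergence on softened outputs)."""
    t = temperature
    p_teacher = F.softmax(teacher_logits / t, dim=1)
    log_p_student = F.log_softmax(student_logits / t, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)


def nuclear_norm_score(logits):
    """Nuclear norm (sum of singular values) of the batch prediction matrix.

    Used here only as an illustrative criterion for comparing candidate
    (temporally ensembled) teacher outputs, as hinted at in the abstract.
    """
    probs = F.softmax(logits, dim=1)
    return torch.linalg.matrix_norm(probs, ord="nuc")


@torch.no_grad()
def pick_humbler_teacher(checkpoints, new_task_batch):
    """Pick the checkpoint that is least over-confident on unseen new-task data.

    A stand-in for the paper's systematic checkpoint selection: lower mean
    max-probability is taken to indicate a 'humbler' teacher.
    """
    best, best_conf = None, float("inf")
    for model in checkpoints:
        model.eval()
        probs = F.softmax(model(new_task_batch), dim=1)
        mean_conf = probs.max(dim=1).values.mean().item()
        if mean_conf < best_conf:
            best, best_conf = model, mean_conf
    return best


def total_loss(student_logits, teacher_logits, targets, kd_weight=1.0):
    """Cross-entropy on new classes plus distillation from the humbler teacher."""
    ce = F.cross_entropy(student_logits, targets)
    return ce + kd_weight * kd_loss(student_logits, teacher_logits)
```

In a training loop, one would call pick_humbler_teacher over the saved checkpoints of the previous task, then optimize total_loss on new-task batches; nuclear_norm_score could likewise be used to weight or select among candidate temporal ensembles.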
About the journal:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.