{"title":"Reducing Catastrophic Forgetting in Online Class Incremental Learning Using Self-Distillation","authors":"Kotaro Nagata, Hiromu Ono, Kazuhiro Hotta","doi":"arxiv-2409.11329","DOIUrl":null,"url":null,"abstract":"In continual learning, there is a serious problem of catastrophic forgetting,\nin which previous knowledge is forgotten when a model learns new tasks. Various\nmethods have been proposed to solve this problem. Replay methods which replay\ndata from previous tasks in later training, have shown good accuracy. However,\nreplay methods have a generalizability problem from a limited memory buffer. In\nthis paper, we tried to solve this problem by acquiring transferable knowledge\nthrough self-distillation using highly generalizable output in shallow layer as\na teacher. Furthermore, when we deal with a large number of classes or\nchallenging data, there is a risk of learning not converging and not\nexperiencing overfitting. Therefore, we attempted to achieve more efficient and\nthorough learning by prioritizing the storage of easily misclassified samples\nthrough a new method of memory update. We confirmed that our proposed method\noutperformed conventional methods by experiments on CIFAR10, CIFAR100, and\nMiniimageNet datasets.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11329","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
In continual learning, catastrophic forgetting is a serious problem: previously acquired knowledge is lost when a model learns new tasks. Various methods have been proposed to address it. Replay methods, which replay data from previous tasks during later training, have shown good accuracy, but their limited memory buffer causes a generalization problem. In this paper, we address this problem by acquiring transferable knowledge through self-distillation, using the highly generalizable output of a shallow layer as a teacher. Furthermore, with a large number of classes or challenging data, there is a risk that training does not converge and the model does not even reach the point of overfitting. We therefore aim for more efficient and thorough learning with a new memory-update method that prioritizes storing easily misclassified samples. Experiments on the CIFAR-10, CIFAR-100, and MiniImageNet datasets confirm that our proposed method outperforms conventional methods.
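
The self-distillation idea described in the abstract (treating the output of a shallow layer as a teacher for the final, deeper output) could look roughly like the PyTorch sketch below. The network layout, the auxiliary head `shallow_head`, the temperature, and the loss weight `alpha` are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfDistilledNet(nn.Module):
    """Backbone with an auxiliary classifier on a shallow block (hypothetical layout)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(8))
        self.block2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1))
        self.shallow_head = nn.Linear(64 * 8 * 8, num_classes)  # teacher (shallow layer)
        self.deep_head = nn.Linear(128, num_classes)            # student (final layer)

    def forward(self, x):
        h1 = self.block1(x)
        h2 = self.block2(h1)
        return self.shallow_head(h1.flatten(1)), self.deep_head(h2.flatten(1))


def self_distillation_loss(shallow_logits, deep_logits, targets,
                           temperature: float = 2.0, alpha: float = 0.5):
    """Cross-entropy on both heads plus a KL term from the shallow teacher to the deep student."""
    ce = F.cross_entropy(deep_logits, targets) + F.cross_entropy(shallow_logits, targets)
    teacher = F.softmax(shallow_logits.detach() / temperature, dim=1)
    student = F.log_softmax(deep_logits / temperature, dim=1)
    kd = F.kl_div(student, teacher, reduction="batchmean") * temperature ** 2
    return ce + alpha * kd
```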
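
The memory update that prioritizes easily misclassified samples could be sketched as follows. The eviction rule (replace a correctly classified stored sample first) and the buffer structure are assumptions for illustration, not the paper's exact procedure.

```python
import random
import torch


class MisclassificationAwareBuffer:
    """Fixed-size replay buffer that tries to keep easily misclassified samples (sketch)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.images, self.labels, self.correct = [], [], []

    def update(self, x: torch.Tensor, y: torch.Tensor, logits: torch.Tensor) -> None:
        """Insert a mini-batch, preferring to evict samples the model already gets right."""
        preds = logits.argmax(dim=1)
        for img, label, ok in zip(x, y, preds == y):
            if len(self.images) < self.capacity:
                self._store(None, img, label, ok)
                continue
            easy = [i for i, c in enumerate(self.correct) if c]
            if easy:
                # Evict a correctly classified sample so hard samples stay in memory longer.
                self._store(random.choice(easy), img, label, ok)
            elif not ok:
                # Buffer holds only hard samples; replace one at random with the new hard sample.
                self._store(random.randrange(self.capacity), img, label, ok)

    def _store(self, i, img, label, ok):
        if i is None:
            self.images.append(img.detach().cpu())
            self.labels.append(int(label))
            self.correct.append(bool(ok))
        else:
            self.images[i] = img.detach().cpu()
            self.labels[i] = int(label)
            self.correct[i] = bool(ok)

    def sample(self, batch_size: int):
        """Draw a random replay batch for the current training step."""
        idx = random.sample(range(len(self.images)), min(batch_size, len(self.images)))
        return (torch.stack([self.images[i] for i in idx]),
                torch.tensor([self.labels[i] for i in idx]))
```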