{"title":"Residual Learning for Self-Knowledge Distillation: Enhancing Neural Networks Through Consistency Across Layers","authors":"Hanpeng Liu;Shuoxi Zhang;Kun He","doi":"10.1109/TBDATA.2025.3552326","DOIUrl":null,"url":null,"abstract":"Knowledge distillation is widely used technique to transfer knowledge from a large pretrained teacher network to a small student network. However, training complex teacher models requires significant computational resources and storage. To address this, a growing area of research, known as self-knowledge distillation (Self-KD), aims to enhance the performance of a neural network by leveraging its own latent knowledge. Despite its potential, existing Self-KD methods often struggle to effectively extract and utilize the model's dark knowledge. In this work, we identify a consistency problem between feature layer and output layer, and propose a novel Self-KD approach called <bold>R</b>esidual Learning for <bold>S</b>elf-<bold>K</b>nowledge <bold>D</b>istillation (<bold>RSKD</b>). Our method addresses this issue by enabling the last feature layer of the student model learn the residual gap between the outputs of the pseudo-teacher and the student. Additionally, we extend RSKD by allowing each intermediate feature layer of the student model to learn the residual gap between the corresponding deeper features of the pseudo-teacher and the student. Extensive experiments on various visual datasets demonstrate the effectiveness of the proposed method, which outperforms the state-of-the-art baselines.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2615-2627"},"PeriodicalIF":5.7000,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Big Data","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10930600/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Knowledge distillation is a widely used technique for transferring knowledge from a large pretrained teacher network to a small student network. However, training complex teacher models requires significant computational resources and storage. To address this, a growing area of research, known as self-knowledge distillation (Self-KD), aims to enhance the performance of a neural network by leveraging its own latent knowledge. Despite its potential, existing Self-KD methods often struggle to effectively extract and utilize the model's dark knowledge. In this work, we identify a consistency problem between the feature layer and the output layer, and propose a novel Self-KD approach called Residual Learning for Self-Knowledge Distillation (RSKD). Our method addresses this issue by enabling the last feature layer of the student model to learn the residual gap between the outputs of the pseudo-teacher and the student. Additionally, we extend RSKD by allowing each intermediate feature layer of the student model to learn the residual gap between the corresponding deeper features of the pseudo-teacher and the student. Extensive experiments on various visual datasets demonstrate the effectiveness of the proposed method, which outperforms state-of-the-art baselines.
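The following is a minimal PyTorch sketch of the residual-learning idea described in the abstract: an auxiliary head attached to the student's last feature layer is trained to predict the gap between the pseudo-teacher's output and the student's own output. The abstract does not specify the implementation, so the auxiliary head `aux_head`, the loss choice, and the shapes below are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualSelfKDSketch(nn.Module):
    """Illustrative sketch only: the last feature layer learns the residual
    gap between pseudo-teacher and student outputs. Names and shapes are
    assumptions, not the paper's definitions."""

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        # Hypothetical auxiliary head mapping the last feature vector
        # to a residual over the class logits.
        self.aux_head = nn.Linear(feat_dim, num_classes)

    def forward(self, last_feat, student_logits, teacher_logits):
        # Residual predicted from the student's last feature layer.
        pred_residual = self.aux_head(last_feat)
        # Target residual: pseudo-teacher output minus student output,
        # detached so gradients flow only through the auxiliary branch.
        target_residual = (teacher_logits - student_logits).detach()
        # Consistency loss between predicted and target residuals
        # (MSE chosen here purely for illustration).
        return F.mse_loss(pred_residual, target_residual)


if __name__ == "__main__":
    batch, feat_dim, num_classes = 8, 512, 100
    last_feat = torch.randn(batch, feat_dim)
    student_logits = torch.randn(batch, num_classes)
    # Placeholder pseudo-teacher logits; in practice these would come from
    # the model's own latent knowledge (e.g., a deeper or ensembled branch).
    teacher_logits = torch.randn(batch, num_classes)

    loss = ResidualSelfKDSketch(feat_dim, num_classes)(
        last_feat, student_logits, teacher_logits
    )
    print(f"residual self-KD loss: {loss.item():.4f}")
```

The extension mentioned in the abstract would repeat the same residual target at each intermediate feature layer, pairing it with the corresponding deeper feature of the pseudo-teacher; that variant is not shown here.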
Journal Introduction:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.