Data-free knowledge distillation via text-noise fusion and dynamic adversarial temperature

IF 6.3 | CAS Tier 1 (Computer Science) | JCR Q1, Computer Science, Artificial Intelligence
Deheng Zeng, Zhengyang Wu, Yunwen Chen, Zhenhua Huang
{"title":"Data-free knowledge distillation via text-noise fusion and dynamic adversarial temperature","authors":"Deheng Zeng ,&nbsp;Zhengyang Wu ,&nbsp;Yunwen Chen ,&nbsp;Zhenhua Huang","doi":"10.1016/j.neunet.2025.108061","DOIUrl":null,"url":null,"abstract":"<div><div>Data-Free Knowledge Distillation (DFKD) have achieved significant breakthroughs, enabling the effective transfer of knowledge from teacher neural networks to student neural networks without reliance on original data. However, a significant challenge faced by existing methods that attempt to generate samples from random noise is that the noise lacks meaningful information, such as class-specific semantic information. Consequently, the absence of meaningful information makes it difficult for the generator to map this noise to the ground-truth data distribution, resulting in the generation of low-quality training samples. In addition, existing methods typically employ a fixed temperature for adversarial training of the generator, which limits the diversity in the difficulty of the synthesized data. In this paper, we propose Text-Noise Fusion and Dynamic Adversarial Temperature method (TNFDAT), a novel method that combines random noise with meaningful class-specific text embeddings (CSTE) as input and implements dynamic adjustment of the adversarial training temperature for the generator. In addition, we introduce an adaptive sample weighting strategy to enhance the effectiveness of knowledge distillation. CSTE is developed based on a pre-trained language model, and its significance lies in its ability to capture meaningful inter-class information, thereby enabling the generation of high-quality samples. Simultaneously, the dynamic adversarial temperature module effectively alleviates the issue of insufficient diversity in synthesized samples by precisely modulating the generator’s temperature during adversarial training, playing a key role in enhancing sample diversity. Through continuous and dynamic temperature adjustment of the generator in the adversarial training, thereby significantly improving the overall diversity of the synthesized samples. At the knowledge distillation stage, We determine the distillation weights of the synthesized samples based on the information entropy of the output from both teacher and student networks. By differentiating the contributions of different synthesized samples during the distillation process, we effectively enhance the generalization ability of the knowledge distillation framework and improve the robustness of the student network. Experiments demonstrate that our method outperforms the state-of-the-art methods across various benchmarks and pairs of teachers and students.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"193 ","pages":"Article 108061"},"PeriodicalIF":6.3000,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608025009414","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Data-Free Knowledge Distillation (DFKD) has achieved significant breakthroughs, enabling the effective transfer of knowledge from teacher neural networks to student neural networks without reliance on original data. However, a significant challenge faced by existing methods that generate samples from random noise is that the noise lacks meaningful information, such as class-specific semantic information. This absence of meaningful information makes it difficult for the generator to map the noise to the ground-truth data distribution, resulting in low-quality training samples. In addition, existing methods typically employ a fixed temperature for adversarial training of the generator, which limits the diversity in the difficulty of the synthesized data. In this paper, we propose the Text-Noise Fusion and Dynamic Adversarial Temperature method (TNFDAT), a novel method that combines random noise with meaningful class-specific text embeddings (CSTE) as the generator's input and dynamically adjusts the adversarial training temperature of the generator. We also introduce an adaptive sample weighting strategy to enhance the effectiveness of knowledge distillation. CSTE is built on a pre-trained language model; its significance lies in its ability to capture meaningful inter-class information, thereby enabling the generation of high-quality samples. Meanwhile, the dynamic adversarial temperature module alleviates the insufficient diversity of synthesized samples: continuously and dynamically adjusting the generator's temperature during adversarial training significantly improves the overall diversity of the synthesized samples. At the knowledge distillation stage, we determine the distillation weights of the synthesized samples from the information entropy of the outputs of both the teacher and student networks. By differentiating the contributions of different synthesized samples during distillation, we enhance the generalization ability of the knowledge distillation framework and improve the robustness of the student network. Experiments demonstrate that our method outperforms state-of-the-art methods across various benchmarks and teacher-student pairs.
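The abstract describes three mechanisms: fusing random noise with class-specific text embeddings as the generator's input, varying the adversarial temperature while training the generator, and weighting synthesized samples by the information entropy of the teacher and student outputs. The paper's exact architecture, temperature schedule, and weighting function are not given here, so the PyTorch sketch below only illustrates the general shape of these ideas; the module names, dimensions, annealing schedule, and exponential-of-entropy weighting are all assumptions, not the authors' implementation.

```python
# Hypothetical sketch of text-noise fusion, a dynamic adversarial temperature,
# and entropy-based sample weighting for data-free distillation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionGenerator(nn.Module):
    """Maps [noise ; class text embedding] to a synthetic image (assumed shapes)."""
    def __init__(self, noise_dim=100, text_dim=384, img_shape=(3, 32, 32)):
        super().__init__()
        self.img_shape = img_shape
        out_dim = img_shape[0] * img_shape[1] * img_shape[2]
        self.net = nn.Sequential(
            nn.Linear(noise_dim + text_dim, 512), nn.ReLU(inplace=True),
            nn.Linear(512, out_dim), nn.Tanh(),
        )

    def forward(self, z, text_emb):
        x = torch.cat([z, text_emb], dim=1)   # text-noise fusion
        return self.net(x).view(-1, *self.img_shape)

def adversarial_temperature(step, total_steps, t_min=1.0, t_max=4.0):
    """Illustrative dynamic schedule: linearly anneal the temperature over training."""
    ratio = step / max(total_steps - 1, 1)
    return t_max - (t_max - t_min) * ratio

def generator_adversarial_loss(t_logits, s_logits, tau):
    """Generator maximizes teacher-student disagreement at temperature tau."""
    p_t = F.softmax(t_logits / tau, dim=1)
    log_p_s = F.log_softmax(s_logits / tau, dim=1)
    return -F.kl_div(log_p_s, p_t, reduction="batchmean")  # negate to maximize KL

def entropy_weights(t_logits, s_logits):
    """Per-sample weights from the output entropy of both networks."""
    def entropy(logits):
        p = F.softmax(logits, dim=1)
        return -(p * p.clamp_min(1e-8).log()).sum(dim=1)
    h = entropy(t_logits) + entropy(s_logits)
    w = torch.exp(-h)                 # low-entropy (confident) samples weigh more (assumption)
    return w / w.sum() * w.numel()    # normalize to mean 1

def weighted_kd_loss(t_logits, s_logits, tau=4.0):
    """Distillation loss with adaptive per-sample weights."""
    w = entropy_weights(t_logits.detach(), s_logits.detach())
    p_t = F.softmax(t_logits / tau, dim=1)
    log_p_s = F.log_softmax(s_logits / tau, dim=1)
    per_sample = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1)
    return (w * per_sample).mean() * tau * tau
```

A full training loop would alternate generator updates (maximizing teacher-student disagreement at the scheduled temperature) with student updates (minimizing the weighted distillation loss on the synthesized batch); the paper's actual losses and schedule may differ from this sketch.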
Source journal
Neural Networks (Engineering & Technology: Computer Science, Artificial Intelligence)
CiteScore: 13.90
Self-citation rate: 7.70%
Annual articles: 425
Review time: 67 days
期刊介绍: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.