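VFT: A versatile fine-tuning scheme based on feature distribution-aware knowledge distillation for lightweight convolutional neural networks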

Impact factor: 8.0 | Zone 2 (Computer Science) | JCR Q1 (Automation & Control Systems)

Engineering Applications of Artificial Intelligence, Volume 159, Article 111597 | Pub Date: 2025-07-04 | DOI: 10.1016/j.engappai.2025.111597

Hyeonseok Hong, Hyun Kim


Citations: 0

Abstract



Network compression techniques such as pruning and quantization are being actively researched to lighten convolutional neural networks (CNNs), whose structures have grown increasingly deep and complex in pursuit of higher accuracy. Because most compression techniques reduce accuracy, fine-tuning is essential for recovering the performance of lightweight models; however, fine-tuning has received far less research attention than compression itself, so performance recovery through fine-tuning has significant room for improvement. In this paper, we analyze the shortcomings of existing fine-tuning methods in terms of the loss landscape and introduce a knowledge distillation (KD)-based fine-tuning approach that addresses them. In particular, because KD can be adversely affected by the capacity gap between the teacher and student models or by how the knowledge to be transferred is defined, we propose a feature distribution-aware knowledge distillation (FDKD) method, which defines appropriate supervision in the form of feature distributions to transfer semantic information from the teacher model. We also propose a layer-wise FDKD method that exploits a property unique to lightweight models: the baseline (i.e., teacher) and compressed (i.e., student) models share the same architecture. Experiments on classification tasks demonstrate the superiority of the proposed method over existing fine-tuning methods, with accuracy improvements of up to 1.99% and 3.83% for pruned and quantized models, respectively. The source code is available at https://github.com/IDSL-SeoulTech/VFT.
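
The abstract does not give the exact FDKD formulation, so the following is only a minimal sketch of the general idea of layer-wise, distribution-based distillation between a teacher and a student with identical architectures. Everything here is an assumption made for illustration: the function names, the choice of a temperature-scaled softmax to turn each layer's activations into a distribution, and the use of KL divergence as the matching loss. It should not be read as the authors' method; the linked repository is the reference for that.

```python
import math


def softmax(values, temperature=1.0):
    """Turn a flat list of activations into a probability distribution."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def layerwise_distribution_kd(teacher_feats, student_feats, temperature=1.0):
    """Average per-layer KL between teacher and student feature distributions.

    Hypothetical helper: each argument is a list of layers, and each layer is
    a flat list of activations. Because the compressed student keeps the
    teacher's architecture, layers can be paired one-to-one by index.
    """
    if len(teacher_feats) != len(student_feats):
        raise ValueError("teacher and student must expose the same number of layers")
    losses = []
    for t_layer, s_layer in zip(teacher_feats, student_feats):
        if len(t_layer) != len(s_layer):
            raise ValueError("paired layers must have the same feature size")
        p = softmax(t_layer, temperature)  # teacher distribution (supervision)
        q = softmax(s_layer, temperature)  # student distribution
        losses.append(kl_divergence(p, q))
    return sum(losses) / len(losses)


# Toy activations for two paired layers (made-up numbers, not real features).
teacher = [[0.2, 1.5, -0.3, 0.9], [2.0, 0.1]]
student = [[0.1, 1.2, -0.1, 1.0], [1.5, 0.4]]
loss = layerwise_distribution_kd(teacher, student)
print(f"layer-wise distillation loss: {loss:.4f}")
```

In a fine-tuning loop, a term like this would typically be added to the ordinary task loss with a weighting factor; the weighting and the choice of layers are further assumptions that the abstract leaves open.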


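Source journal

Engineering Applications of Artificial Intelligence (category: Engineering Technology, Engineering: Electrical & Electronic)

CiteScore: 9.60
Self-citation rate: 10.00%
Articles published: 505
Review time: 68 days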

About the journal: Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.