Teacher Probability Reconstruction based knowledge distillation within intelligent network compression

Han Chen, Xuyang Teng, Jiajie Su, Chunhao Li, Chang Hu, Meng Han

International Journal of Intelligent Networks, Volume 6 (2025), Pages 47–56. DOI: 10.1016/j.ijin.2025.02.001
Abstract
In the optimization of intelligent network architectures, the limited resources available at each node, including edge computing devices, make it difficult to deploy large models in performance-demanding scenarios. Knowledge distillation is a model compression method that extracts knowledge from a large-scale teacher model and transfers it to a more lightweight student model. Previous knowledge distillation methods mainly focus on the intermediate layers of the network. However, because privacy protection regulations restrict data sharing and access, and because practical scenarios impose computational efficiency requirements, feature-based distillation faces challenges in real applications. To address these issues, we start from logit-based distillation, enabling the student to learn more representative knowledge from the teacher’s output probability distribution. Owing to structural limitations of the teacher network, such as insufficient depth or width, and potential issues in the training data, such as noise and imbalance, this output probability distribution contains many errors. We therefore propose a knowledge distillation method that improves the student by correcting errors in the teacher model. Nevertheless, the teacher’s errors not only pass mistakes to the student but also grant it greater subjectivity, enabling it to break free from the teacher’s limitations. While correcting the teacher’s errors for the student, we also retain the teacher’s thinking to prevent the student from becoming biased toward the remaining (non-target) categories. Extensive experiments demonstrate that our method achieves competitive performance on multiple benchmarks without introducing extra parameters.
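To make the logit-based setup concrete, below is a minimal sketch of one plausible way to correct the teacher's output distribution before distilling it into the student. This is an assumption for illustration only, not the paper's published Teacher Probability Reconstruction method: the function names (`corrected_teacher_probs`, `kd_loss`), the swap heuristic, and the hyperparameters are all hypothetical. The sketch swaps the probability mass of the teacher's wrong arg-max class with that of the ground-truth class on misclassified samples, leaving the non-target probabilities untouched so the teacher's relative confidence over the remaining categories is preserved.

```python
import torch
import torch.nn.functional as F

def corrected_teacher_probs(teacher_logits, labels, temperature=4.0):
    """Hypothetical correction step: when the teacher misclassifies a sample,
    swap the probability of its (wrong) arg-max class with that of the
    ground-truth class. Non-target probabilities are left as-is, so the
    teacher's relative ranking over the remaining categories is retained."""
    probs = F.softmax(teacher_logits / temperature, dim=1)
    pred = probs.argmax(dim=1)
    wrong = pred != labels
    idx = torch.arange(probs.size(0), device=probs.device)
    corrected = probs.clone()
    # Swap p(target) and p(teacher's wrong prediction) for misclassified samples.
    corrected[idx[wrong], labels[wrong]] = probs[idx[wrong], pred[wrong]]
    corrected[idx[wrong], pred[wrong]] = probs[idx[wrong], labels[wrong]]
    return corrected

def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """Standard logit-based KD objective, but distilling from the corrected
    teacher distribution: KL(student || corrected teacher) + cross-entropy."""
    target_probs = corrected_teacher_probs(teacher_logits, labels, temperature)
    log_student = F.log_softmax(student_logits / temperature, dim=1)
    soft = F.kl_div(log_student, target_probs, reduction="batchmean") * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

Under these assumptions, the student still receives the teacher's "dark knowledge" about non-target classes while no longer being supervised by the teacher's misclassifications; the temperature and weighting follow the standard Hinton-style distillation formulation rather than any setting reported in the paper.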