Teacher Probability Reconstruction based knowledge distillation within intelligent network compression

Han Chen, Xuyang Teng, Jiajie Su, Chunhao Li, Chang Hu, Meng Han

International Journal of Intelligent Networks, Volume 6 (2025), Pages 47-56. DOI: 10.1016/j.ijin.2025.02.001
In the optimization of intelligent network architectures, the limited resources at each node, including edge computing devices, pose challenges for deploying large models in performance-demanding scenarios. Knowledge distillation is a model compression method that extracts knowledge from a large-scale teacher model and transfers it to a more lightweight student model. Previous knowledge distillation methods mainly focus on the intermediate layers of the network. However, privacy protection regulations that limit data sharing and access, together with the computational efficiency requirements of practical scenarios, make feature-based distillation difficult to apply in practice. To address these issues, we start from logit-based distillation, enabling the student to learn more representative knowledge from the teacher's output probability distribution. Owing to structural limitations of the teacher network, such as insufficient depth or width, and to potential issues in the training data, such as noise and class imbalance, this output probability distribution contains many errors. We therefore propose a knowledge distillation method that improves the student by correcting errors in the teacher model. Nevertheless, the teacher's errors not only pass mistakes on to the student but also grant the student greater subjectivity, enabling it to break free from the teacher's limitations. While correcting the teacher's errors for the student, we also retain the teacher's reasoning over the remaining (non-target) categories to prevent the student from becoming biased on them. Extensive experiments demonstrate that our method achieves competitive performance on multiple benchmarks without introducing extra parameters.
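The abstract describes logit-based distillation in which the teacher's output probabilities are corrected before being transferred to the student, while the teacher's relative view of the non-target categories is retained. The following is a minimal, hypothetical sketch of how such a correction could be combined with a standard logit-based KD loss in PyTorch; the function names (reconstruct_teacher_probs, distillation_loss), the swap-based correction rule, and the temperature/alpha hyperparameters are illustrative assumptions and do not reproduce the paper's actual formulation.

```python
# Hypothetical sketch: logit-based distillation with a simple teacher
# probability "reconstruction" step. The correction rule (swapping the
# teacher's probability mass between its predicted class and the true
# class when they disagree) is an illustrative assumption only.
import torch
import torch.nn.functional as F


def reconstruct_teacher_probs(teacher_logits, targets, temperature=4.0):
    """Soften the teacher logits and correct misclassified samples while
    keeping the teacher's relative ordering over non-target classes."""
    p = F.softmax(teacher_logits / temperature, dim=1)  # soft teacher distribution
    pred = p.argmax(dim=1)                              # teacher's predicted class
    wrong = pred != targets                             # samples the teacher gets wrong
    if wrong.any():
        idx = wrong.nonzero(as_tuple=True)[0]
        p = p.clone()
        # Swap the mass on the (wrong) predicted class and the true class,
        # so the target becomes the mode but non-target ratios are untouched.
        target_mass = p[idx, targets[idx]]
        p[idx, targets[idx]] = p[idx, pred[idx]]
        p[idx, pred[idx]] = target_mass
    return p


def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=4.0, alpha=0.5):
    """Cross-entropy on hard labels plus KL divergence to the reconstructed
    teacher distribution (standard logit-based KD objective)."""
    ce = F.cross_entropy(student_logits, targets)
    teacher_p = reconstruct_teacher_probs(teacher_logits, targets, temperature)
    log_q = F.log_softmax(student_logits / temperature, dim=1)
    kd = F.kl_div(log_q, teacher_p, reduction="batchmean") * temperature ** 2
    return alpha * ce + (1 - alpha) * kd
```

The swap in this sketch leaves the teacher's probability mass over the non-target classes unchanged, which loosely mirrors the stated goal of correcting the teacher's erroneous prediction without biasing the student on the remaining categories.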