Self-distillation salient object detection via generalized diversity loss
Yunfei Zheng, Jibin Yang, Haijun Tao, Yong Wang, Lei Chen, Yang Wang, Tieyong Cao
Pattern Recognition, Volume 168, Article 111804. DOI: 10.1016/j.patcog.2025.111804. Published 2025-05-23.
Classic knowledge distillation (KD) via the Kullback–Leibler loss can effectively improve the performance of small deep classification models, but it is difficult to apply to salient object detection (SOD) models because the logit layer lacks the necessary multi-dimensional knowledge representations. In this paper, a generalized diversity (GD) loss, inspired by ensemble learning, is proposed to constrain the student and teacher models to maintain low diversity. This drives the student to mimic the teacher's salient knowledge representations while enhancing the student's generalization ability. Second, a salient self-distillation (SD) framework based on a shared backbone and a salient SD loss is proposed. On the shared backbone, a lightweight student sub-network and a large-parameter teacher sub-network are constructed to perform coarse but rapid feature extraction and refined but slower feature extraction, respectively. The SD loss transfers refined salient knowledge from the teacher sub-network to the student sub-network, improving the student sub-network's performance. Extensive experimental results on five benchmark datasets demonstrate that the proposed GD loss achieves effective salient knowledge transfer and outperforms six recent KD methods, and that the proposed student network outperforms eleven recent SOD networks in both performance and efficiency.
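The abstract does not give the exact form of the GD loss or the shared-backbone architecture, but the described layout — one backbone feeding a lightweight student head and a heavier teacher head, with a loss that pushes the two saliency predictions toward low diversity — can be illustrated with a minimal PyTorch sketch. Everything below (the names SharedBackboneSOD and gd_loss, the use of a per-pixel disagreement term as the "low diversity" constraint, and the toy head designs) is a hypothetical reading of the abstract, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gd_loss(student_map, teacher_map, eps=1e-6):
    """Hypothetical diversity-style loss: penalize per-pixel disagreement between
    the student and teacher saliency maps so the pair holds low diversity.
    The authors' exact GD formulation may differ."""
    s = student_map.clamp(eps, 1 - eps)
    t = teacher_map.clamp(eps, 1 - eps).detach()  # teacher output used as the target
    # probability that exactly one of the two predictions fires on a pixel,
    # a simple disagreement (diversity) measure borrowed from ensemble learning
    disagreement = s * (1 - t) + (1 - s) * t
    return disagreement.mean()

class SharedBackboneSOD(nn.Module):
    """Sketch of the shared-backbone self-distillation layout: one backbone feeds a
    lightweight student head (coarse but fast) and a larger teacher head (refined
    but slower). The real backbone and heads would be far deeper."""
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.backbone = nn.Sequential(  # stand-in for a real SOD backbone
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.student_head = nn.Conv2d(feat_ch, 1, 1)  # lightweight head
        self.teacher_head = nn.Sequential(            # larger, refined head
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, 1, 1),
        )

    def forward(self, x):
        f = self.backbone(x)
        return torch.sigmoid(self.student_head(f)), torch.sigmoid(self.teacher_head(f))

# Training-step sketch: supervise both heads with the ground-truth mask and
# distill the teacher's prediction into the student via the diversity-style loss.
model = SharedBackboneSOD()
image = torch.rand(2, 3, 128, 128)
gt = (torch.rand(2, 1, 128, 128) > 0.5).float()
student_pred, teacher_pred = model(image)
loss = (F.binary_cross_entropy(student_pred, gt)
        + F.binary_cross_entropy(teacher_pred, gt)
        + gd_loss(student_pred, teacher_pred))
loss.backward()
```

In this sketch only the student head is kept at inference time, which is what would make the deployed model coarse but fast; the teacher head and the distillation term exist only during training. Again, this is an interpretation of the abstract rather than the paper's actual loss or architecture.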
Journal introduction:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.