Layer-wise correlation and attention discrepancy distillation for semantic segmentation

Jianping Gou, Kaijie Chen, Cheng Chen, Weihua Ou, Xin Luo, Zhang Yi

Pattern Recognition, Volume 172, Article 112438 (2025). DOI: 10.1016/j.patcog.2025.112438. Published 13 September 2025.
Knowledge distillation (KD) has recently garnered increased attention in segmentation tasks due to its effective balance between accuracy and computational efficiency. Nonetheless, existing methods rely mainly on structured knowledge from a single layer, overlooking the valuable discrepant knowledge that captures the diversity and distinctiveness of features across layers, which is essential for the KD process. To tackle this issue, we present Layer-wise Correlation and Attention Discrepancy Distillation (LCADD), which trains compact yet accurate semantic segmentation networks by exploiting layer-wise discrepancy knowledge. Specifically, we employ two distillation schemes: (i) correlation discrepancy distillation, which constructs a pixel-wise correlation discrepancy matrix across layers to capture finer-grained spatial dependencies, and (ii) attention discrepancy self-distillation, which guides the shallower layers of the student network to emulate the attention discrepancy maps of its deeper layers, enabling self-learning of attention discrepancy knowledge within the student. The two schemes work collaboratively in learning discrepancy knowledge, allowing the student network to better imitate the teacher from the perspective of layer-wise discrepancy. Compared with recent knowledge distillation techniques, our method demonstrates superior performance on several semantic segmentation datasets, including Cityscapes, Pascal VOC 2012, and CamVid, validating its effectiveness.
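The abstract does not give the exact loss formulations, but a minimal PyTorch-style sketch of the general idea can make the two schemes concrete: per-layer pixel-wise correlation maps whose layer-to-layer discrepancy is matched between student and teacher, and student attention maps whose shallow-layer discrepancies are pulled toward the deepest-layer discrepancy. Everything below (function names, cosine-similarity correlations, channel-averaged squared-activation attention maps, MSE penalties, bilinear resizing to a common spatial size) is an assumption for illustration, not the paper's LCADD formulation.

```python
# Illustrative sketch only; NOT the paper's exact LCADD losses.
import torch
import torch.nn.functional as F


def pixel_correlation(feat: torch.Tensor, size=(32, 32)) -> torch.Tensor:
    """Pixel-wise correlation of a feature map (B, C, H, W) -> (B, HW, HW)."""
    feat = F.interpolate(feat, size=size, mode="bilinear", align_corners=False)
    f = F.normalize(feat.flatten(2), dim=1)      # unit-norm channel vector per pixel
    return torch.bmm(f.transpose(1, 2), f)       # cosine similarity between all pixel pairs


def correlation_discrepancy_loss(student_feats, teacher_feats):
    """Match the student's layer-to-layer correlation discrepancies to the teacher's."""
    loss = 0.0
    for i in range(len(student_feats) - 1):
        d_s = pixel_correlation(student_feats[i + 1]) - pixel_correlation(student_feats[i])
        d_t = pixel_correlation(teacher_feats[i + 1]) - pixel_correlation(teacher_feats[i])
        loss = loss + F.mse_loss(d_s, d_t.detach())
    return loss


def attention_map(feat: torch.Tensor, size=(64, 64)) -> torch.Tensor:
    """Spatial attention map: mean of squared activations over channels, L2-normalized."""
    a = feat.pow(2).mean(dim=1, keepdim=True)    # (B, 1, H, W)
    a = F.interpolate(a, size=size, mode="bilinear", align_corners=False)
    return F.normalize(a.flatten(1), dim=1)      # (B, HW)


def attention_discrepancy_self_loss(student_feats):
    """Shallower student layers mimic the attention discrepancy of the deepest layer pair."""
    atts = [attention_map(f) for f in student_feats]
    discrepancies = [atts[i + 1] - atts[i] for i in range(len(atts) - 1)]
    target = discrepancies[-1].detach()          # deepest discrepancy acts as the teacher signal
    return sum(F.mse_loss(d, target) for d in discrepancies[:-1])
```

In a full training loop these terms would presumably be added to the standard cross-entropy segmentation loss with weighting coefficients, e.g. `loss = ce + alpha * correlation_discrepancy_loss(s_feats, t_feats) + beta * attention_discrepancy_self_loss(s_feats)`; the weights and the specific layers paired are likewise assumptions.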
Journal introduction:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.