Title: Knowledge tailoring: Bridging the teacher-student gap in semantic segmentation
Authors: Seokhwa Cheung, Seungbeom Woo, Taehoon Kim, Wonjun Hwang
Journal: Pattern Recognition, Volume 172, Article 112399 (Journal Article)
Published: 2025-09-05
DOI: 10.1016/j.patcog.2025.112399
URL: https://www.sciencedirect.com/science/article/pii/S003132032501060X
Journal metrics: Impact Factor 7.6; JCR Q1 (Computer Science, Artificial Intelligence)
Citation count: 0
Abstract
Knowledge distillation transfers knowledge from a high-capacity teacher network to a compact student network, but a large capacity gap often limits the student’s ability to fully benefit from the teacher’s guidance. In semantic segmentation, another major challenge is the difficulty of predicting accurate object boundaries, as even strong teacher models can produce ambiguous or imprecise outputs. To address both challenges, we present Knowledge Tailoring (KT), a novel distillation framework that adapts the teacher’s knowledge to better match the student’s representational capacity and learning dynamics. Much as a tailor adjusts an oversized suit to fit the wearer’s shape, our method reshapes the teacher’s abundant but misaligned knowledge into a form more suitable for the student. KT introduces feature tailoring, which restructures intermediate features based on channel-wise correlation to narrow the representation gap, and logit tailoring, which improves boundary prediction by refining class-specific logits. The tailoring strategy evolves throughout training, offering guidance that aligns with the student’s progress. Experiments on Cityscapes, Pascal VOC, and ADE20K confirm that KT consistently enhances performance across a variety of architectures including DeepLabV3, PSPNet, and SegFormer. Our code is available at https://github.com/seok-hwa/KT.
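The abstract does not specify how channel-wise correlation is computed or used, so the following is only a generic illustration of the idea of comparing channel-correlation structure between a teacher and a student feature map. It is not the authors' feature tailoring method. The function names `channel_correlation` and `correlation_gap` are hypothetical, the choice of cosine similarity and a mean squared gap is an assumption, and the sketch assumes both feature maps have the same number of channels. Plain Python lists are used to keep it dependency-free; a real implementation would likely use a tensor library.

```python
import math


def channel_correlation(features):
    """Return the C x C cosine-similarity matrix between channels.

    `features` is a list of C channels, each a flat list of N spatial
    activations (an assumed layout, chosen for illustration only).
    """
    # L2 norm of each channel; fall back to 1.0 for an all-zero channel
    norms = [math.sqrt(sum(v * v for v in ch)) or 1.0 for ch in features]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
         for cj, nj in zip(features, norms)]
        for ci, ni in zip(features, norms)
    ]


def correlation_gap(teacher_feats, student_feats):
    """Mean squared difference between the two channel-correlation matrices.

    Assumes teacher and student have the same channel count (hypothetical
    simplification; real networks usually need a projection first).
    """
    t = channel_correlation(teacher_feats)
    s = channel_correlation(student_feats)
    c = len(t)
    return sum((t[i][j] - s[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)


# Toy example: two channels, two spatial positions each
teacher = [[1.0, 0.0], [0.0, 1.0]]   # channels are uncorrelated
student = [[1.0, 0.0], [1.0, 0.0]]   # channels are identical
gap = correlation_gap(teacher, student)
```

A gap of zero means the student's channels relate to one another the same way the teacher's do, regardless of per-channel scale, which is one way such a measure could expose a representation gap.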
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.