Dual-branch teacher-student with noise-tolerant learning for domain adaptive nighttime segmentation

Impact Factor 4.2 · CAS Tier 3 (Computer Science) · JCR Q2, Computer Science, Artificial Intelligence
Ruiying Chen, Yunan Liu, Yuming Bo, Mingyu Lu
{"title":"Dual-branch teacher-student with noise-tolerant learning for domain adaptive nighttime segmentation","authors":"Ruiying Chen ,&nbsp;Yunan Liu ,&nbsp;Yuming Bo ,&nbsp;Mingyu Lu","doi":"10.1016/j.imavis.2024.105211","DOIUrl":null,"url":null,"abstract":"<div><p>While significant progress has been achieved in the field of image semantic segmentation, the majority of research has been primarily concentrated on daytime scenes. Semantic segmentation of nighttime images is equally critical for autonomous driving; however, this task presents greater challenges due to inadequate lighting and difficulties associated with obtaining accurate manual annotations. In this paper, we introduce a novel method called the Dual-Branch Teacher-Student (DBTS) framework for unsupervised nighttime semantic segmentation. Our approach combines domain alignment and knowledge distillation in a mutually reinforcing manner. Firstly, we employ a photometric alignment module to dynamically generate target-like latent images, bridging the appearance gap between the source domain (daytime) and the target domain (nighttime). Secondly, we establish a dual-branch framework, where each branch enhances collaboration between the teacher and student networks. The student network utilizes adversarial learning to align the target domain with another domain (i.e., source or latent domain), while the teacher network generates reliable pseudo-labels by distilling knowledge from the latent domain. Furthermore, recognizing the potential noise present in pseudo-labels, we propose a noise-tolerant learning method to mitigate the risks associated with overreliance on pseudo-labels during domain adaptation. When evaluated on benchmark datasets, the proposed DBTS achieves state-of-the-art performance. Specifically, DBTS, using different backbones, outperforms established baseline models by approximately 25% in mIoU on the Zurich dataset and by over 26% in mIoU on the ACDC dataset, demonstrating the effectiveness of our method in addressing the challenges of domain-adaptive nighttime segmentation.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105211"},"PeriodicalIF":4.2000,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885624003160","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

While significant progress has been achieved in the field of image semantic segmentation, the majority of research has been primarily concentrated on daytime scenes. Semantic segmentation of nighttime images is equally critical for autonomous driving; however, this task presents greater challenges due to inadequate lighting and difficulties associated with obtaining accurate manual annotations. In this paper, we introduce a novel method called the Dual-Branch Teacher-Student (DBTS) framework for unsupervised nighttime semantic segmentation. Our approach combines domain alignment and knowledge distillation in a mutually reinforcing manner. Firstly, we employ a photometric alignment module to dynamically generate target-like latent images, bridging the appearance gap between the source domain (daytime) and the target domain (nighttime). Secondly, we establish a dual-branch framework, where each branch enhances collaboration between the teacher and student networks. The student network utilizes adversarial learning to align the target domain with another domain (i.e., source or latent domain), while the teacher network generates reliable pseudo-labels by distilling knowledge from the latent domain. Furthermore, recognizing the potential noise present in pseudo-labels, we propose a noise-tolerant learning method to mitigate the risks associated with overreliance on pseudo-labels during domain adaptation. When evaluated on benchmark datasets, the proposed DBTS achieves state-of-the-art performance. Specifically, DBTS, using different backbones, outperforms established baseline models by approximately 25% in mIoU on the Zurich dataset and by over 26% in mIoU on the ACDC dataset, demonstrating the effectiveness of our method in addressing the challenges of domain-adaptive nighttime segmentation.
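To make the training recipe described above more concrete, here is a minimal, hedged sketch of a mean-teacher-style self-training step of the kind the abstract describes: the teacher segments the nighttime image to produce pseudo-labels, low-confidence pixels are discarded as a simple stand-in for the paper's noise-tolerant learning, the student is additionally supervised on a target-like latent image with its source labels, and the teacher is updated as an exponential moving average of the student. The function names, the 0.999 momentum, and the 0.9 confidence threshold are illustrative assumptions, not details taken from the paper; the adversarial alignment branch and the photometric alignment module are omitted.

```python
# Illustrative sketch only (not the authors' implementation): one teacher-student
# self-training step with an EMA teacher and confidence-filtered pseudo-labels.
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Slowly distill the student weights into the teacher (EMA update)."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1.0 - momentum)

def noise_tolerant_ce(student_logits, teacher_logits, conf_thresh=0.9):
    """Cross-entropy against teacher pseudo-labels; pixels below the confidence
    threshold are ignored as a crude way to limit over-reliance on noisy labels."""
    probs = teacher_logits.softmax(dim=1)          # B x C x H x W
    conf, pseudo = probs.max(dim=1)                # B x H x W
    pseudo[conf < conf_thresh] = 255               # mark unreliable pixels
    return F.cross_entropy(student_logits, pseudo, ignore_index=255)

def adaptation_step(student, teacher, optimizer, latent_img, latent_gt, night_img):
    """One step: supervised loss on the target-like latent image plus a
    pseudo-label loss on the nighttime image, followed by the EMA teacher update."""
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(night_img)              # pseudo-labels for the target domain
    sup_loss = F.cross_entropy(student(latent_img), latent_gt, ignore_index=255)
    pl_loss = noise_tolerant_ce(student(night_img), t_logits)
    loss = sup_loss + pl_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(teacher, student)
    return loss.item()
```

In setups like this the teacher is typically initialized as a copy of the student and excluded from the optimizer, receiving updates only through the EMA step.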

Source journal

Image and Vision Computing (Engineering Technology: Electrical & Electronic Engineering)
CiteScore: 8.50
Self-citation rate: 8.50%
Articles published: 143
Review time: 7.8 months
Aims and scope: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.