SAFENet: Semantic-Aware Feature Enhancement Network for unsupervised cross-domain road scene segmentation

IF 4.2 · CAS Tier 3 (Computer Science) · JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Dexin Ren, Minxian Li, Shidong Wang, Mingwu Ren, Haofeng Zhang
{"title":"SAFENet: Semantic-Aware Feature Enhancement Network for unsupervised cross-domain road scene segmentation","authors":"Dexin Ren ,&nbsp;Minxian Li ,&nbsp;Shidong Wang ,&nbsp;Mingwu Ren ,&nbsp;Haofeng Zhang","doi":"10.1016/j.imavis.2024.105318","DOIUrl":null,"url":null,"abstract":"<div><div>Unsupervised cross-domain road scene segmentation has attracted substantial interest because of its capability to perform segmentation on new and unlabeled domains, thereby reducing the dependence on expensive manual annotations. This is achieved by leveraging networks trained on labeled source domains to classify images on unlabeled target domains. Conventional techniques usually use adversarial networks to align inputs from the source and the target in either of their domains. However, these approaches often fall short in effectively integrating information from both domains due to Alignment in each space usually leads to bias problems during feature learning. To overcome these limitations and enhance cross-domain interaction while mitigating overfitting to the source domain, we introduce a novel framework called Semantic-Aware Feature Enhancement Network (SAFENet) for Unsupervised Cross-domain Road Scene Segmentation. SAFENet incorporates the Semantic-Aware Enhancement (SAE) module to amplify the importance of class information in segmentation tasks and uses the semantic space as a new domain to guide the alignment of the source and target domains. Additionally, we integrate Adaptive Instance Normalization with Momentum (AdaIN-M) techniques, which convert the source domain image style to the target domain image style, thereby reducing the adverse effects of source domain overfitting on target domain segmentation performance. Moreover, SAFENet employs a Knowledge Transfer (KT) module to optimize network architecture, enhancing computational efficiency during testing while maintaining the robust inference capabilities developed during training. To further improve the segmentation performance, we further employ Curriculum Learning, a self-training mechanism that uses pseudo-labels derived from the target domain to iteratively refine the network. Comprehensive experiments on three well-known datasets, “Synthia<span><math><mo>→</mo></math></span>Cityscapes” and “GTA5<span><math><mo>→</mo></math></span>Cityscapes”, demonstrate the superior performance of our method. In-depth examinations and ablation studies verify the efficacy of each module within the proposed method.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"152 ","pages":"Article 105318"},"PeriodicalIF":4.2000,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885624004232","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Unsupervised cross-domain road scene segmentation has attracted substantial interest because it can segment new, unlabeled domains, thereby reducing the dependence on expensive manual annotations. This is achieved by leveraging networks trained on labeled source domains to classify images in unlabeled target domains. Conventional techniques usually use adversarial networks to align source and target inputs in one of the two domains. However, these approaches often fail to integrate information from both domains effectively, because alignment in a single space tends to introduce bias during feature learning. To overcome these limitations and enhance cross-domain interaction while mitigating overfitting to the source domain, we introduce a novel framework, the Semantic-Aware Feature Enhancement Network (SAFENet), for unsupervised cross-domain road scene segmentation. SAFENet incorporates a Semantic-Aware Enhancement (SAE) module to amplify the importance of class information in segmentation tasks and uses the semantic space as a new domain to guide the alignment of the source and target domains. Additionally, we integrate an Adaptive Instance Normalization with Momentum (AdaIN-M) technique, which converts the source-domain image style to the target-domain image style, thereby reducing the adverse effect of source-domain overfitting on target-domain segmentation performance. Moreover, SAFENet employs a Knowledge Transfer (KT) module to optimize the network architecture, improving computational efficiency at test time while retaining the robust inference capability developed during training. To further improve segmentation performance, we employ Curriculum Learning, a self-training mechanism that uses pseudo-labels derived from the target domain to iteratively refine the network. Comprehensive experiments on two benchmark transfer tasks built from three well-known datasets, “Synthia→Cityscapes” and “GTA5→Cityscapes”, demonstrate the superior performance of our method. In-depth examinations and ablation studies verify the efficacy of each module within the proposed method.
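The abstract describes AdaIN-M as re-stylizing source-domain inputs with target-domain statistics, but does not spell out the formulation. The sketch below is a minimal, hypothetical illustration of the general idea only: a standard AdaIN step combined with a momentum-updated buffer of target style statistics. The class name, buffer names, and momentum value are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class AdaINWithMomentum(nn.Module):
    """Illustrative sketch of Adaptive Instance Normalization with a momentum
    buffer over target-domain style statistics. Hypothetical; the exact AdaIN-M
    formulation used by SAFENet is not given in the abstract."""

    def __init__(self, num_channels: int, momentum: float = 0.9, eps: float = 1e-5):
        super().__init__()
        self.momentum = momentum
        self.eps = eps
        # Running (exponential moving average) style statistics of the target domain.
        self.register_buffer("running_mean", torch.zeros(num_channels))
        self.register_buffer("running_std", torch.ones(num_channels))

    @staticmethod
    def _channel_stats(feat: torch.Tensor):
        # Per-sample, per-channel mean and std over spatial dims of an (N, C, H, W) map.
        mean = feat.mean(dim=(2, 3), keepdim=True)
        std = feat.std(dim=(2, 3), keepdim=True)
        return mean, std

    def update_target_stats(self, target_feat: torch.Tensor) -> None:
        # Momentum update of the target-domain style statistics.
        mean, std = self._channel_stats(target_feat)
        batch_mean = mean.mean(dim=0).view(-1)
        batch_std = std.mean(dim=0).view(-1)
        self.running_mean.mul_(self.momentum).add_(batch_mean, alpha=1 - self.momentum)
        self.running_std.mul_(self.momentum).add_(batch_std, alpha=1 - self.momentum)

    def forward(self, source_feat: torch.Tensor) -> torch.Tensor:
        # Whiten source features, then re-colour them with the running target statistics.
        mean, std = self._channel_stats(source_feat)
        normalized = (source_feat - mean) / (std + self.eps)
        tgt_mean = self.running_mean.view(1, -1, 1, 1)
        tgt_std = self.running_std.view(1, -1, 1, 1)
        return normalized * tgt_std + tgt_mean
```

In a training loop of this kind, update_target_stats would be called on target-domain feature maps while forward re-stylizes source-domain features before segmentation, so the style gap is bridged by smoothed statistics rather than by a single batch.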
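The Curriculum Learning step is described only at a high level: pseudo-labels predicted on the target domain are fed back as training targets to iteratively refine the network. The following sketch shows a generic confidence-thresholded self-training loop of that kind; the threshold value, ignore index, and function names are illustrative assumptions, not SAFENet's actual curriculum schedule.

```python
import torch
import torch.nn.functional as F


def generate_pseudo_labels(logits: torch.Tensor, confidence_threshold: float = 0.9) -> torch.Tensor:
    """Generic pseudo-label generation for target-domain self-training (illustrative)."""
    probs = F.softmax(logits, dim=1)              # (N, num_classes, H, W)
    confidence, pseudo_labels = probs.max(dim=1)  # per-pixel confidence and predicted class
    # Mark low-confidence pixels with an ignore index so the loss skips them.
    pseudo_labels[confidence < confidence_threshold] = 255
    return pseudo_labels


def self_training_step(model, target_images, optimizer) -> float:
    """One self-training iteration: predict, threshold, then retrain on the pseudo-labels."""
    model.eval()
    with torch.no_grad():
        pseudo_labels = generate_pseudo_labels(model(target_images))
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(target_images), pseudo_labels, ignore_index=255)
    loss.backward()
    optimizer.step()
    return loss.item()
```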
Source journal
Image and Vision Computing
Category: Engineering Technology – Engineering: Electrical & Electronic
CiteScore: 8.50
Self-citation rate: 8.50%
Annual publications: 143
Review time: 7.8 months
Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.