Condition-Invariant Semantic Segmentation

Impact Factor: 18.6
Christos Sakaridis, David Bruggemann, Fisher Yu, Luc Van Gool
{"title":"条件不变语义分割","authors":"Christos Sakaridis;David Bruggemann;Fisher Yu;Luc Van Gool","doi":"10.1109/TPAMI.2025.3529350","DOIUrl":null,"url":null,"abstract":"Adaptation of semantic segmentation networks to different visual conditions is vital for robust perception in autonomous cars and robots. However, previous work has shown that most feature-level adaptation methods, which employ adversarial training and are validated on synthetic-to-real adaptation, provide marginal gains in condition-level adaptation, being outperformed by simple pixel-level adaptation via stylization. Motivated by these findings, we propose to leverage stylization in performing feature-level adaptation by aligning the internal network features extracted by the encoder of the network from the original and the stylized view of each input image with a novel feature invariance loss. In this way, we encourage the encoder to extract features that are already invariant to the style of the input, allowing the decoder to focus on parsing these features and not on further abstracting from the specific style of the input. We implement our method, named Condition-Invariant Semantic Segmentation (CISS), on the current state-of-the-art domain adaptation architecture and achieve outstanding results on condition-level adaptation. In particular, CISS sets the new state of the art in the popular daytime-to-nighttime Cityscapes <inline-formula><tex-math>$\\to$</tex-math></inline-formula> Dark Zurich benchmark. Furthermore, our method achieves the second-best performance on the normal-to-adverse Cityscapes <inline-formula><tex-math>$\\to$</tex-math></inline-formula> ACDC benchmark. CISS is shown to generalize well to domains unseen during training, such as BDD100K-night and ACDC-night.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"3111-3125"},"PeriodicalIF":18.6000,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Condition-Invariant Semantic Segmentation\",\"authors\":\"Christos Sakaridis;David Bruggemann;Fisher Yu;Luc Van Gool\",\"doi\":\"10.1109/TPAMI.2025.3529350\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Adaptation of semantic segmentation networks to different visual conditions is vital for robust perception in autonomous cars and robots. However, previous work has shown that most feature-level adaptation methods, which employ adversarial training and are validated on synthetic-to-real adaptation, provide marginal gains in condition-level adaptation, being outperformed by simple pixel-level adaptation via stylization. Motivated by these findings, we propose to leverage stylization in performing feature-level adaptation by aligning the internal network features extracted by the encoder of the network from the original and the stylized view of each input image with a novel feature invariance loss. In this way, we encourage the encoder to extract features that are already invariant to the style of the input, allowing the decoder to focus on parsing these features and not on further abstracting from the specific style of the input. We implement our method, named Condition-Invariant Semantic Segmentation (CISS), on the current state-of-the-art domain adaptation architecture and achieve outstanding results on condition-level adaptation. 
In particular, CISS sets the new state of the art in the popular daytime-to-nighttime Cityscapes <inline-formula><tex-math>$\\\\to$</tex-math></inline-formula> Dark Zurich benchmark. Furthermore, our method achieves the second-best performance on the normal-to-adverse Cityscapes <inline-formula><tex-math>$\\\\to$</tex-math></inline-formula> ACDC benchmark. CISS is shown to generalize well to domains unseen during training, such as BDD100K-night and ACDC-night.\",\"PeriodicalId\":94034,\"journal\":{\"name\":\"IEEE transactions on pattern analysis and machine intelligence\",\"volume\":\"47 4\",\"pages\":\"3111-3125\"},\"PeriodicalIF\":18.6000,\"publicationDate\":\"2025-01-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on pattern analysis and machine intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10840277/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10840277/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Adaptation of semantic segmentation networks to different visual conditions is vital for robust perception in autonomous cars and robots. However, previous work has shown that most feature-level adaptation methods, which employ adversarial training and are validated on synthetic-to-real adaptation, provide marginal gains in condition-level adaptation, being outperformed by simple pixel-level adaptation via stylization. Motivated by these findings, we propose to leverage stylization in performing feature-level adaptation by aligning the internal network features extracted by the encoder of the network from the original and the stylized view of each input image with a novel feature invariance loss. In this way, we encourage the encoder to extract features that are already invariant to the style of the input, allowing the decoder to focus on parsing these features and not on further abstracting from the specific style of the input. We implement our method, named Condition-Invariant Semantic Segmentation (CISS), on the current state-of-the-art domain adaptation architecture and achieve outstanding results on condition-level adaptation. In particular, CISS sets the new state of the art in the popular daytime-to-nighttime Cityscapes $\to$ Dark Zurich benchmark. Furthermore, our method achieves the second-best performance on the normal-to-adverse Cityscapes $\to$ ACDC benchmark. CISS is shown to generalize well to domains unseen during training, such as BDD100K-night and ACDC-night.
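The abstract describes aligning the features that the encoder extracts from the original and the stylized view of each input image with a feature invariance loss. The sketch below is a minimal illustration of that idea, not the authors' implementation: it assumes a Fourier-based stylization (in the spirit of pixel-level style-transfer methods such as FDA) and a plain mean-squared alignment term; the encoder, the stylization routine, and the exact loss used in CISS may differ.

```python
# Minimal sketch (assumptions, not the authors' code): a feature invariance
# loss that pulls together the encoder features of an original image and a
# stylized view of the same image. The Fourier-based stylization and the
# mean-squared alignment are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


def fourier_stylize(content: torch.Tensor, style: torch.Tensor, beta: float = 0.01) -> torch.Tensor:
    """Transfer the low-frequency amplitude spectrum of `style` onto `content`
    (FDA-style pixel-level stylization). `content` and `style` are image
    batches of identical shape (B, C, H, W)."""
    fft_c = torch.fft.fftshift(torch.fft.fft2(content, dim=(-2, -1)), dim=(-2, -1))
    fft_s = torch.fft.fftshift(torch.fft.fft2(style, dim=(-2, -1)), dim=(-2, -1))
    amp_c, pha_c = torch.abs(fft_c), torch.angle(fft_c)
    amp_s = torch.abs(fft_s)

    _, _, h, w = content.shape
    bh, bw = max(1, int(h * beta)), max(1, int(w * beta))
    ch, cw = h // 2, w // 2
    # Swap the centred low-frequency block of the amplitude spectrum.
    amp_c[..., ch - bh:ch + bh, cw - bw:cw + bw] = amp_s[..., ch - bh:ch + bh, cw - bw:cw + bw]

    mixed = torch.fft.ifftshift(amp_c * torch.exp(1j * pha_c), dim=(-2, -1))
    return torch.fft.ifft2(mixed, dim=(-2, -1)).real


def feature_invariance_loss(encoder: nn.Module, image: torch.Tensor, stylized: torch.Tensor) -> torch.Tensor:
    """Align encoder features of the original and the stylized view.
    The stop-gradient on the original-view features and the MSE form are
    design choices made here for illustration."""
    feat_orig = encoder(image)       # features of the original view
    feat_styl = encoder(stylized)    # features of the stylized view
    return F.mse_loss(feat_styl, feat_orig.detach())


if __name__ == "__main__":
    # Toy usage with a dummy encoder and random "daytime"/"nighttime" images.
    encoder = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1),
    )
    day = torch.rand(2, 3, 64, 64)
    night_reference = torch.rand(2, 3, 64, 64)
    day_stylized = fourier_stylize(day, night_reference)
    loss = feature_invariance_loss(encoder, day, day_stylized)
    print(float(loss))
```

In training, such an invariance term would typically be added to the standard supervised segmentation loss (and to whatever adaptation objective the base architecture uses), weighted by a hyperparameter.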