Class-discriminative domain generalization for semantic segmentation
Muxin Liao, Shishun Tian, Yuhang Zhang, Guoguang Hua, Rong You, Wenbin Zou, Xia Li
Image and Vision Computing, Volume 154, Article 105393, published 2025-02-01
DOI: 10.1016/j.imavis.2024.105393
Citations: 0
Abstract
Existing domain generalization methods for semantic segmentation aim to improve generalization by learning domain-invariant information so that models perform well on unseen domains. However, these methods ignore the class discriminability of models, which may lead to class confusion. In this paper, a class-discriminative domain generalization (CDDG) approach is proposed to simultaneously alleviate distribution shift and class confusion in semantic segmentation. Specifically, a dual prototypical contrastive learning module is proposed. Since the high-frequency component of an image is consistent across domains, a class-text-guided high-frequency prototypical contrastive learning scheme is introduced: text embeddings serve as prior knowledge to guide the learning of prototypical representations from high-frequency components, mining domain-invariant information and further improving generalization. However, domain-specific information may also contain label-related information that helps discriminate a specific class, so learning only domain-invariant information may limit the class discriminability of the model. To address this, a low-frequency prototypical contrastive learning scheme is proposed to learn class-discriminative representations from the low-frequency components, which are more domain-specific. Finally, the class-discriminative and high-frequency prototypical representations are fused to improve both the generalization ability and the class discriminability of the model. Extensive experiments demonstrate that the proposed approach outperforms current methods on single- and multi-source domain generalization benchmarks.
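The two building blocks the abstract names can be illustrated in a minimal sketch: a low-/high-frequency split of an image via an FFT low-pass mask, and an InfoNCE-style loss that pulls each feature toward its class prototype (which, in the paper's high-frequency branch, would be initialized from text embeddings). This is not the authors' implementation; the function names, the circular cutoff `radius`, and the temperature `tau` are illustrative assumptions, and the paper's actual decomposition and loss may differ.

```python
import numpy as np

def frequency_split(img, radius=8):
    # Hypothetical decomposition: low-pass the centered 2-D spectrum
    # with a circular mask of the given cutoff radius; the residual
    # img - low is taken as the high-frequency component.
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(F * mask)).real
    high = img - low
    return low, high

def prototype_contrastive_loss(features, labels, prototypes, tau=0.1):
    # InfoNCE-style prototypical contrastive loss: each L2-normalized
    # feature is attracted to its own class prototype and repelled
    # from the prototypes of all other classes.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = f @ p.T / tau                       # (N, C) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
```

By construction the two components sum back to the input (`low + high == img`), and the loss is smaller when features are matched to their own class prototypes than when the labels are shuffled.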
About the journal
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to foster a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, and image databases.