{"title":"Joint Style and Layout Synthesizing: Toward Generalizable Remote Sensing Semantic Segmentation","authors":"Qi Zang;Shuang Wang;Dong Zhao;Zhun Zhong;Biao Hou;Licheng Jiao","doi":"10.1109/TCSVT.2024.3522936","DOIUrl":null,"url":null,"abstract":"This paper studies the domain generalized remote sensing semantic segmentation (RSSS), aiming to generalize a model trained only on the source domain to unseen domains. Existing methods in computer vision treat style information as domain characteristics to achieve domain-agnostic learning. Nevertheless, their generalizability to RSSS remains constrained, due to the incomplete consideration of domain characteristics. We argue that remote sensing scenes have layout differences beyond just style. Considering this, we devise a joint style and layout synthesizing framework, enabling the model to jointly learn out-of-domain samples synthesized from these two perspectives. For style, we estimate the variant intensities of per-class representations affected by domain shift and randomly sample within this modeled scope to reasonably expand the boundaries of style-carrying feature statistics. For layout, we explore potential scenes with diverse layouts in the source domain and propose granularity-fixed and granularity-learnable masks to perturb layouts, forcing the model to learn characteristics of objects rather than variable positions. The mask is designed to learn more context-robust representations by discovering difficult-to-recognize perturbation directions. Subsequently, we impose gradient angle constraints between the samples synthesized using the two ways to correct conflicting optimization directions. Extensive experiments demonstrate the superior generalization ability of our method over existing methods.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 5","pages":"4055-4071"},"PeriodicalIF":8.3000,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10817546/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
This paper studies domain-generalized remote sensing semantic segmentation (RSSS), which aims to generalize a model trained only on a source domain to unseen domains. Existing methods in computer vision treat style information as the domain characteristic and learn domain-agnostic representations accordingly. Nevertheless, their generalizability to RSSS remains limited because they consider domain characteristics incompletely: we argue that remote sensing scenes differ in layout as well as in style. We therefore devise a joint style and layout synthesizing framework that lets the model jointly learn from out-of-domain samples synthesized from these two perspectives. For style, we estimate how strongly per-class representations vary under domain shift and randomly sample within this modeled scope, reasonably expanding the boundaries of the style-carrying feature statistics. For layout, we explore potential scenes with diverse layouts in the source domain and propose granularity-fixed and granularity-learnable masks that perturb layouts, forcing the model to learn the characteristics of objects rather than their variable positions. The learnable mask encourages more context-robust representations by discovering perturbation directions that are difficult to recognize. Finally, we impose gradient-angle constraints between the samples synthesized by the two strategies to correct conflicting optimization directions. Extensive experiments demonstrate the superior generalization ability of our method over existing methods.
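As a rough illustration of the style branch, the sketch below perturbs channel-wise feature statistics within their estimated variation, in the spirit of statistics-based style augmentation. The function name, the batch-level (rather than per-class) estimation, and the Gaussian sampling are assumptions made for illustration; this is a minimal sketch, not the authors' released code.

```python
import torch

def style_perturb(feat: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Resample channel-wise feature statistics (mean/std) within their
    estimated variation to synthesize a novel style.
    feat: (B, C, H, W) feature map. Illustrative sketch only."""
    mu = feat.mean(dim=(2, 3), keepdim=True)                  # (B, C, 1, 1) channel means
    sig = (feat.var(dim=(2, 3), keepdim=True) + eps).sqrt()   # (B, C, 1, 1) channel stds

    # Estimate how much the statistics themselves vary across the batch;
    # this models the "intensity" of style shift for each channel.
    sig_mu = mu.var(dim=0, keepdim=True, unbiased=False).sqrt()
    sig_sig = sig.var(dim=0, keepdim=True, unbiased=False).sqrt()

    # Sample new statistics inside the modeled scope.
    new_mu = mu + torch.randn_like(mu) * sig_mu
    new_sig = sig + torch.randn_like(sig) * sig_sig

    # Re-normalize the features with the sampled statistics to carry the new style.
    return new_sig * (feat - mu) / sig + new_mu
```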
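The gradient-angle constraint can likewise be sketched as a projection step, assuming flattened gradient vectors computed from the losses on the style- and layout-synthesized samples. The conflict test and projection below follow the well-known PCGrad recipe and stand in for whatever exact rule the paper uses.

```python
import torch

def resolve_conflict(g_style: torch.Tensor, g_layout: torch.Tensor) -> torch.Tensor:
    """Combine two 1-D (flattened) gradient vectors; if their angle is
    obtuse (negative dot product), project out the conflicting component
    of one before summing. PCGrad-style sketch, not the paper's exact rule."""
    cos = torch.dot(g_style, g_layout)
    if cos < 0:  # obtuse angle: the two objectives conflict
        # Remove from g_style the component anti-aligned with g_layout.
        g_style = g_style - (cos / torch.dot(g_layout, g_layout)) * g_layout
    return g_style + g_layout
```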
About the Journal:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.