Semantic Segmentation of Multimodal Optical and SAR Images With Multiscale Attention Network
Dongdong Xu; Jin Qian; Hao Feng; Zheng Li; Yongcheng Wang
IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5, 17 April 2025. DOI: 10.1109/LGRS.2025.3561747. https://ieeexplore.ieee.org/document/10967360/
Joint semantic segmentation of multimodal remote sensing (RS) images can compensate for the insufficient features of single-modal images and effectively improve classification accuracy. Some deep learning methods have achieved good performance, but they suffer from complex network structures, large numbers of parameters, and deployment difficulty. In this letter, more attention is paid to front-end and branch-level feature transformation to obtain multiscale semantic information. A multiscale dilated extraction module (MDEM) is constructed to mine the specific features of each modality, and a multimodal complementary attention module (MCAM) is designed to further acquire prominent complementary content. The concatenated features are transmitted and reused through dense convolution to complete the encoding. Ultimately, a general and concise end-to-end model is proposed. Comparative experiments on three heterogeneous datasets show that the proposed model performs well in qualitative analysis, quantitative comparison, and visual quality. Meanwhile, its flexibility and practicality are notable, which can provide support for lightweight design and hardware deployment.
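The abstract does not specify the internal structure of the MDEM or MCAM. The following PyTorch sketch is therefore only one plausible realization of the two ideas it names: parallel dilated convolutions for multiscale, modality-specific extraction, and cross-modal channel attention for complementary fusion. The class names, dilation rates, and reduction ratio are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class DilatedExtractionBlock(nn.Module):
    """Parallel dilated convolutions gathering multiscale context (assumed design)."""

    def __init__(self, in_ch, out_ch, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # 1x1 convolution fuses the concatenated branch outputs back to out_ch channels.
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))


class ComplementaryAttentionFusion(nn.Module):
    """Each modality re-weighted by channel attention from the other (assumed design)."""

    def __init__(self, ch, reduction=4):
        super().__init__()

        def gate():
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(ch, ch // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(ch // reduction, ch, 1),
                nn.Sigmoid(),
            )

        self.gate_opt = gate()  # attention weights derived from the optical branch
        self.gate_sar = gate()  # attention weights derived from the SAR branch

    def forward(self, feat_opt, feat_sar):
        # Emphasize complementary content: each modality is modulated by
        # attention computed from the other before concatenation.
        fused_opt = feat_opt * self.gate_sar(feat_sar)
        fused_sar = feat_sar * self.gate_opt(feat_opt)
        return torch.cat([fused_opt, fused_sar], dim=1)


if __name__ == "__main__":
    opt = torch.randn(1, 3, 128, 128)  # optical patch (RGB)
    sar = torch.randn(1, 1, 128, 128)  # SAR patch (single channel)
    mdem_opt = DilatedExtractionBlock(3, 32)
    mdem_sar = DilatedExtractionBlock(1, 32)
    mcam = ComplementaryAttentionFusion(32)
    fused = mcam(mdem_opt(opt), mdem_sar(sar))
    print(fused.shape)  # torch.Size([1, 64, 128, 128])
```

In the letter, the fused features are then passed through densely connected convolutions to complete the encoding; that stage is omitted here since its configuration is not given in the abstract.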