{"title":"MGFNet:用于高分辨率多模态遥感图像语义分割的 MLP 主导门控融合网络","authors":"Kan Wei , JinKun Dai , Danfeng Hong , Yuanxin Ye","doi":"10.1016/j.jag.2024.104241","DOIUrl":null,"url":null,"abstract":"<div><div>The heterogeneity and complexity of multimodal data in high-resolution remote sensing images significantly challenges existing cross-modal networks in fusing the complementary information of high-resolution optical and synthetic aperture radar (SAR) images for precise semantic segmentation. To address this issue, this paper proposes a multi-layer perceptron (MLP) dominated gate fusion network (MGFNet). MGFNet consists of three modules: a multi-path feature extraction network, an MLP-gate fusion (MGF) module, and a decoder. Initially, MGFNet independently extracts features from high-resolution optical and SAR images while preserving spatial information. Then, the well-designed MGF module combines the multi-modal features through channel attention and gated fusion stages, utilizing MLP as a gate to exploit complementary information and filter redundant data. Additionally, we introduce a novel high-resolution multimodal remote sensing dataset, YESeg-OPT-SAR, with a spatial resolution of 0.5 m. To evaluate MGFNet, we compare it with several state-of-the-art (SOTA) models using YESeg-OPT-SAR and Pohang datasets, both of which are high-resolution multi-modal datasets. The experimental results demonstrate that MGFNet achieves higher evaluation metrics compared to other models, indicating its effectiveness in multi-modal feature fusion for segmentation. The source code and data are available at <span><span>https://github.com/yeyuanxin110/YESeg-OPT-SAR</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":73423,"journal":{"name":"International journal of applied earth observation and geoinformation : ITC journal","volume":"135 ","pages":"Article 104241"},"PeriodicalIF":7.6000,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MGFNet: An MLP-dominated gated fusion network for semantic segmentation of high-resolution multi-modal remote sensing images\",\"authors\":\"Kan Wei , JinKun Dai , Danfeng Hong , Yuanxin Ye\",\"doi\":\"10.1016/j.jag.2024.104241\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>The heterogeneity and complexity of multimodal data in high-resolution remote sensing images significantly challenges existing cross-modal networks in fusing the complementary information of high-resolution optical and synthetic aperture radar (SAR) images for precise semantic segmentation. To address this issue, this paper proposes a multi-layer perceptron (MLP) dominated gate fusion network (MGFNet). MGFNet consists of three modules: a multi-path feature extraction network, an MLP-gate fusion (MGF) module, and a decoder. Initially, MGFNet independently extracts features from high-resolution optical and SAR images while preserving spatial information. Then, the well-designed MGF module combines the multi-modal features through channel attention and gated fusion stages, utilizing MLP as a gate to exploit complementary information and filter redundant data. Additionally, we introduce a novel high-resolution multimodal remote sensing dataset, YESeg-OPT-SAR, with a spatial resolution of 0.5 m. To evaluate MGFNet, we compare it with several state-of-the-art (SOTA) models using YESeg-OPT-SAR and Pohang datasets, both of which are high-resolution multi-modal datasets. 
The experimental results demonstrate that MGFNet achieves higher evaluation metrics compared to other models, indicating its effectiveness in multi-modal feature fusion for segmentation. The source code and data are available at <span><span>https://github.com/yeyuanxin110/YESeg-OPT-SAR</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":73423,\"journal\":{\"name\":\"International journal of applied earth observation and geoinformation : ITC journal\",\"volume\":\"135 \",\"pages\":\"Article 104241\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2024-11-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International journal of applied earth observation and geoinformation : ITC journal\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1569843224005971\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"REMOTE SENSING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International journal of applied earth observation and geoinformation : ITC journal","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1569843224005971","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"REMOTE SENSING","Score":null,"Total":0}
MGFNet: An MLP-dominated gated fusion network for semantic segmentation of high-resolution multi-modal remote sensing images
The heterogeneity and complexity of multimodal data in high-resolution remote sensing images significantly challenge existing cross-modal networks in fusing the complementary information of high-resolution optical and synthetic aperture radar (SAR) images for precise semantic segmentation. To address this issue, this paper proposes a multi-layer perceptron (MLP)-dominated gated fusion network (MGFNet). MGFNet consists of three modules: a multi-path feature extraction network, an MLP-gated fusion (MGF) module, and a decoder. First, MGFNet extracts features from the high-resolution optical and SAR images independently while preserving spatial information. Then, the MGF module combines the multi-modal features in two stages, channel attention followed by gated fusion, using an MLP as the gate to exploit complementary information and filter out redundant data. Additionally, we introduce a novel high-resolution multimodal remote sensing dataset, YESeg-OPT-SAR, with a spatial resolution of 0.5 m. To evaluate MGFNet, we compare it with several state-of-the-art (SOTA) models on the YESeg-OPT-SAR and Pohang datasets, both of which are high-resolution multi-modal datasets. The experimental results demonstrate that MGFNet achieves higher evaluation metrics than the other models, indicating its effectiveness in multi-modal feature fusion for segmentation. The source code and data are available at https://github.com/yeyuanxin110/YESeg-OPT-SAR.
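To make the fusion mechanism concrete, below is a minimal PyTorch sketch of how an MLP-gated fusion of optical and SAR feature maps might look. It is an illustration under stated assumptions, not the paper's implementation: the class names (ChannelAttention, MLPGatedFusion), the squeeze-and-excitation form of the channel attention, and the convex gate combination g * f_opt + (1 - g) * f_sar are all assumptions drawn only from the abstract's description; the actual MGF module may differ in detail.

    # Hypothetical sketch of MLP-gated fusion for two modality feature maps
    # (optical and SAR). Names and structure are illustrative assumptions,
    # not the paper's actual MGF module.
    import torch
    import torch.nn as nn


    class ChannelAttention(nn.Module):
        """Squeeze-and-excitation style channel attention (an assumption)."""
        def __init__(self, channels: int, reduction: int = 4):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Global average pool -> per-channel weights -> rescale features.
            w = self.mlp(x.mean(dim=(2, 3)))          # (B, C)
            return x * w[:, :, None, None]


    class MLPGatedFusion(nn.Module):
        """Fuse optical and SAR features with an MLP-computed gate."""
        def __init__(self, channels: int):
            super().__init__()
            self.att_opt = ChannelAttention(channels)
            self.att_sar = ChannelAttention(channels)
            # 1x1 convolutions act as a per-pixel MLP over the channels.
            self.gate = nn.Sequential(
                nn.Conv2d(2 * channels, channels, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, kernel_size=1),
                nn.Sigmoid(),
            )

        def forward(self, f_opt: torch.Tensor, f_sar: torch.Tensor) -> torch.Tensor:
            # Stage 1: channel attention on each modality separately.
            f_opt = self.att_opt(f_opt)
            f_sar = self.att_sar(f_sar)
            # Stage 2: gated fusion; the gate in [0, 1] weighs optical
            # against SAR evidence and suppresses redundant channels.
            g = self.gate(torch.cat([f_opt, f_sar], dim=1))
            return g * f_opt + (1.0 - g) * f_sar


    if __name__ == "__main__":
        fuse = MLPGatedFusion(channels=64)
        f_o = torch.randn(2, 64, 128, 128)   # optical features
        f_s = torch.randn(2, 64, 128, 128)   # SAR features
        print(fuse(f_o, f_s).shape)          # torch.Size([2, 64, 128, 128])

Implementing the gate with 1x1 convolutions makes it a per-pixel MLP over the channel dimension, so the network can decide at every spatial location how much to trust each modality, which matches the abstract's description of exploiting complementary information while filtering redundant data.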
Journal description:
The International Journal of Applied Earth Observation and Geoinformation publishes original papers that utilize earth observation data for natural resource and environmental inventory and management. These data primarily originate from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. Addressing natural resources such as forests, agricultural land, soils, and water, as well as environmental concerns like biodiversity, land degradation, and hazards, the journal explores conceptual and data-driven approaches. It covers geoinformation themes such as data capture, databasing, visualization, interpretation, data quality, and spatial uncertainty.