LMFNet: Lightweight Multimodal Fusion Network for high-resolution remote sensing image segmentation

IF 7.5 · Zone 1, Computer Science · JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition · Pub Date: 2025-03-15 · DOI: 10.1016/j.patcog.2025.111579

Tong Wang, Guanzhou Chen, Xiaodong Zhang, Chenxi Liu, Jiaqi Wang, Xiaoliang Tan, Wenlin Zhou, Chanjuan He

Volume 164, Article 111579. Full text: https://www.sciencedirect.com/science/article/pii/S0031320325002390

Citations: 0

Abstract

Despite the rapid evolution of semantic segmentation for land cover classification in high-resolution remote sensing imagery, integrating multiple data modalities such as Digital Surface Model (DSM), RGB, and near-infrared (NIR) remains a challenge. Current methods often process only two types of data, missing the rich information that additional modalities can provide. To address this gap, we propose a novel Lightweight Multimodal data Fusion Network (LMFNet) for the fusion and semantic segmentation of multimodal remote sensing images. LMFNet accommodates multiple data types simultaneously, including RGB, NirRG, and DSM, through a weight-sharing, multi-branch vision transformer that minimizes parameter count while ensuring robust feature extraction. Our proposed multimodal fusion module integrates a Multimodal Feature Fusion Reconstruction Layer and a Multimodal Feature Self-Attention Fusion Layer, which reconstruct and fuse multimodal features. Our method achieves a mean Intersection over Union (mIoU) of 85.09% on the US3D dataset, marking a significant improvement over existing methods. We also studied the scalability of our method by directly extending the input modalities to SAR and hyperspectral data. Our experimental results on the C2Seg dataset show that the method generalizes to data of various modalities.
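For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU is commonly computed from a confusion matrix: per class, IoU = TP / (TP + FP + FN), averaged over classes. The function name `mean_iou`, the toy labels, and the choice to skip classes absent from both labels and predictions are assumptions made for illustration; the abstract does not state the authors' exact evaluation protocol (for example, ignored classes or per-image versus dataset-level averaging).

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean Intersection over Union across classes (hypothetical helper)."""
    # confusion[t][p] counts pixels with ground-truth class t predicted as class p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                 # class c pixels predicted as something else
        fp = sum(row[c] for row in confusion) - tp  # other pixels predicted as class c
        union = tp + fp + fn
        if union > 0:  # assumption: skip classes absent from both labels and predictions
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six flattened pixels, three hypothetical land-cover classes
labels      = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]
score = mean_iou(labels, predictions, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. In practice the same computation runs over every pixel of the evaluation set rather than a six-element list.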


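Source journal

Pattern Recognition (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 14.40

Self-citation rate: 16.20%

Articles published: 683

Review time: 5.6 months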

Journal description: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.