{"title":"基于跨模态层次特征融合的航空图像语义分割方法","authors":"Jinglei Bai;Jinfu Yang;Tao Xiang;Shu Cai","doi":"10.1109/LGRS.2025.3602267","DOIUrl":null,"url":null,"abstract":"Multimodal aerial image semantic segmentation enables fine-grained land cover classification by integrating data from different sensors, yet it remains challenged by information redundancy, intermodal feature discrepancies, and class confusion in complex scenes. To address these issues, we propose a cross-modal hierarchical feature fusion network (CMHFNet) based on an encoder–decoder architecture. The encoder incorporates a pixelwise attention-guided fusion module (PAFM) and a multistage progressive fusion transformer (MPFT) to suppress redundancy and model long-range intermodal dependencies and scale variations. The decoder introduces a residual information-guided feature compensation mechanism to recover spatial details and mitigate class ambiguity. The experiments on DDOS, Vaihingen, and Potsdam datasets demonstrate that the CMHFNet surpasses state-of-the-art methods, validating its effectiveness and practical value.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":4.4000,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Aerial Image Semantic Segmentation Method Based on Cross-Modal Hierarchical Feature Fusion\",\"authors\":\"Jinglei Bai;Jinfu Yang;Tao Xiang;Shu Cai\",\"doi\":\"10.1109/LGRS.2025.3602267\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multimodal aerial image semantic segmentation enables fine-grained land cover classification by integrating data from different sensors, yet it remains challenged by information redundancy, intermodal feature discrepancies, and class confusion in complex scenes. To address these issues, we propose a cross-modal hierarchical feature fusion network (CMHFNet) based on an encoder–decoder architecture. The encoder incorporates a pixelwise attention-guided fusion module (PAFM) and a multistage progressive fusion transformer (MPFT) to suppress redundancy and model long-range intermodal dependencies and scale variations. The decoder introduces a residual information-guided feature compensation mechanism to recover spatial details and mitigate class ambiguity. 
The experiments on DDOS, Vaihingen, and Potsdam datasets demonstrate that the CMHFNet surpasses state-of-the-art methods, validating its effectiveness and practical value.\",\"PeriodicalId\":91017,\"journal\":{\"name\":\"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society\",\"volume\":\"22 \",\"pages\":\"1-5\"},\"PeriodicalIF\":4.4000,\"publicationDate\":\"2025-08-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11137359/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11137359/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Aerial Image Semantic Segmentation Method Based on Cross-Modal Hierarchical Feature Fusion
Multimodal aerial image semantic segmentation enables fine-grained land-cover classification by integrating data from different sensors, yet it remains challenged by information redundancy, intermodal feature discrepancies, and class confusion in complex scenes. To address these issues, we propose a cross-modal hierarchical feature fusion network (CMHFNet) built on an encoder–decoder architecture. The encoder incorporates a pixelwise attention-guided fusion module (PAFM) and a multistage progressive fusion transformer (MPFT) to suppress redundant information and to model long-range intermodal dependencies and scale variations. The decoder introduces a residual information-guided feature compensation mechanism that recovers spatial details and mitigates class ambiguity. Experiments on the DDOS, Vaihingen, and Potsdam datasets demonstrate that CMHFNet surpasses state-of-the-art methods, validating its effectiveness and practical value.
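The abstract gives no implementation details, but the pixelwise attention-guided fusion idea can be illustrated with a minimal sketch. The PyTorch module below is an assumption-laden illustration, not the authors' PAFM: it predicts a per-pixel, per-channel gate from two same-shape modality feature maps (e.g., RGB and DSM encoder features) and blends them, which is one common way to suppress cross-modal redundancy at the pixel level. All module and variable names are hypothetical.

```python
# Hypothetical sketch of pixelwise attention-guided fusion between two
# modality feature maps (e.g., optical RGB and DSM/depth). This is NOT
# the paper's PAFM; it only illustrates the general mechanism.
import torch
import torch.nn as nn


class PixelwiseAttentionFusion(nn.Module):
    """Fuse two same-shape feature maps with per-pixel attention gates."""

    def __init__(self, channels: int):
        super().__init__()
        # Predict a per-pixel, per-channel gate from the concatenated inputs.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f_rgb: torch.Tensor, f_aux: torch.Tensor) -> torch.Tensor:
        # alpha in (0, 1) decides, per pixel and channel, how much of each
        # modality to keep, suppressing redundant or noisy responses.
        alpha = self.gate(torch.cat([f_rgb, f_aux], dim=1))
        return alpha * f_rgb + (1.0 - alpha) * f_aux


if __name__ == "__main__":
    fusion = PixelwiseAttentionFusion(channels=64)
    rgb = torch.randn(2, 64, 32, 32)   # stage features from the RGB branch
    aux = torch.randn(2, 64, 32, 32)   # stage features from the auxiliary branch
    print(fusion(rgb, aux).shape)      # torch.Size([2, 64, 32, 32])
```

In such a gated design the fused map stays at the input resolution and channel count, so it can drop into each encoder stage before deeper fusion (e.g., a transformer stage modeling long-range intermodal dependencies, as the MPFT is described as doing).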