MHNet: A Masked Hybrid Network for Robust Water Body Segmentation From Aerial Images

IF 8.6; Zone 1 (Earth Sciences); Q1 (ENGINEERING, ELECTRICAL & ELECTRONIC)

IEEE Transactions on Geoscience and Remote Sensing Pub Date : 2025-06-17 DOI:10.1109/TGRS.2025.3580479

Shuo Wang;Bin Wei;Boneng Shi;Ninglian Wang;Yuzhu Zhang;Yan Zhu

Volume 63, pages 1-15. Journal Article. Not open access. https://ieeexplore.ieee.org/document/11037726/

Citations: 0

Abstract

Accurate segmentation of water bodies from aerial images is critical for advancing our understanding of climate change, improving flood prevention and mitigation efforts, and supporting ecological monitoring. Recently, deep learning-based methods have made remarkable progress in water body segmentation. However, several challenges remain in practical applications, including mis-segmentation of low-contrast regions, difficulty delineating boundaries in complex terrain, and loss of small water features. While most existing methods are designed for supervised learning with paired samples, they may also benefit from self-supervised learning techniques such as masked autoencoders (MAE). In this work, we focus on the water segmentation problem and propose a new framework named MHNet. MHNet integrates a hybrid multiscale encoder-decoder network that combines convolutional and transformer-based components to capture global context effectively while minimizing computational costs. A key innovation is the adaptation of the MAE mechanism: masks are applied to the outputs of multiscale Restormer blocks during the training phase, enabling the model to better integrate local and global information and thereby enhancing boundary segmentation accuracy. Additionally, we propose a multichannel feature fusion (MCFF) module that synthesizes masked feature maps across scales, reducing redundancy and improving generalization by capturing both fine details and contextual information. Extensive experiments on multiple public datasets demonstrate that MHNet outperforms state-of-the-art methods, highlighting its effectiveness and robustness in water body extraction tasks. MHNet is deployed on Google's Vertex AI platform, where it serves online predictions, and is integrated into a Geographic Information System (GIS) through Google Earth Engine (GEE) for accurate and efficient extraction of lakes on the Tibetan Plateau.
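The abstract describes masking multiscale feature maps during training and then fusing them across scales. The sketch below illustrates that general idea only; it is not the authors' implementation. The Restormer blocks are not modeled (plain arrays stand in for their outputs), and the function names, the patch-wise random mask, the mask ratio, and the concatenation-based fusion are all assumptions made for illustration, since the abstract does not specify how MHNet builds its masks or how MCFF combines the maps.

```python
import numpy as np


def random_patch_mask(height, width, patch, mask_ratio, rng):
    """Return a (height, width) 0/1 mask that zeroes a random subset of patches.

    Hypothetical helper: the patch-wise scheme is an assumption, not taken
    from the paper.
    """
    grid_h = -(-height // patch)  # ceiling division
    grid_w = -(-width // patch)
    n_patches = grid_h * grid_w
    n_masked = int(round(mask_ratio * n_patches))
    keep = np.ones(n_patches, dtype=np.float32)
    keep[rng.choice(n_patches, size=n_masked, replace=False)] = 0.0
    # Expand each grid cell to a patch x patch block, then crop to size.
    block = np.ones((patch, patch), dtype=np.float32)
    return np.kron(keep.reshape(grid_h, grid_w), block)[:height, :width]


def mask_features(feats, patch, mask_ratio, rng, training=True):
    """Mask each (C, H, W) feature map; leave maps untouched outside training."""
    if not training:
        return feats
    return [
        f * random_patch_mask(f.shape[1], f.shape[2], patch, mask_ratio, rng)
        for f in feats
    ]


def upsample_nearest(feat, height, width):
    """Nearest-neighbor resize of a (C, H, W) array to (C, height, width)."""
    _, h, w = feat.shape
    rows = np.arange(height) * h // height
    cols = np.arange(width) * w // width
    return feat[:, rows[:, None], cols[None, :]]


def fuse_scales(feats):
    """Stand-in fusion: resize every map to the finest scale and concatenate
    along the channel axis. The real MCFF module is likely more involved."""
    height, width = feats[0].shape[1:]
    return np.concatenate(
        [upsample_nearest(f, height, width) for f in feats], axis=0
    )


# Three placeholder feature maps at decreasing resolution.
rng = np.random.default_rng(0)
feats = [np.ones((4, 32, 32)), np.ones((8, 16, 16)), np.ones((16, 8, 8))]
masked = mask_features(feats, patch=4, mask_ratio=0.5, rng=rng)
fused = fuse_scales(masked)
print(fused.shape)  # (28, 32, 32)
```

Calling `mask_features` with `training=False` returns the maps unchanged, which mirrors the abstract's statement that masks are applied during the training phase.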




Source journal

IEEE Transactions on Geoscience and Remote Sensing (Engineering and Technology: Geochemistry and Geophysics)

CiteScore: 11.50

Self-citation rate: 28.00%

Articles published: 1912

Review time: 4.0 months

Journal description: IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.