MHNet: A Masked Hybrid Network for Robust Water Body Segmentation From Aerial Images
Authors: Shuo Wang; Bin Wei; Boneng Shi; Ninglian Wang; Yuzhu Zhang; Yan Zhu
Journal: IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-15 (JCR Q1, impact factor 8.6)
Published: 2025-06-17
DOI: 10.1109/TGRS.2025.3580479
URL: https://ieeexplore.ieee.org/document/11037726/
Citations: 0
Abstract
Accurate segmentation of water bodies from aerial images is critical for advancing our understanding of climate change, improving flood prevention and mitigation, and supporting ecological monitoring. Deep learning-based methods have recently made remarkable progress in water body segmentation. However, several challenges remain in practical applications, including mis-segmentation of low-contrast regions, difficulty delineating boundaries in complex terrain, and loss of small water features. Although most existing methods are designed for supervised learning with paired samples, they may also benefit from self-supervised learning techniques such as masked autoencoders (MAE). In this work, we propose a new water segmentation framework named MHNet. MHNet uses a hybrid multiscale encoder-decoder network that combines convolutional and transformer-based components to capture global context while minimizing computational cost. A key innovation is the adaptation of the MAE mechanism: during training, masks are applied to the outputs of multiscale Restormer blocks, which enables the model to better integrate local and global information and thereby improves boundary segmentation accuracy. Additionally, we propose a multichannel feature fusion (MCFF) module that synthesizes masked feature maps across scales, reducing redundancy and improving generalization by capturing both fine details and contextual information. Extensive experiments on multiple public datasets demonstrate that MHNet outperforms state-of-the-art methods, highlighting its effectiveness and robustness in water body extraction tasks. MHNet is deployed for online prediction on Google’s Vertex AI platform and integrated into a Geographic Information System (GIS) through Google Earth Engine (GEE) for accurate and efficient extraction of lakes on the Tibetan Plateau.
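The training-time masking of multiscale feature maps described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. The Restormer blocks are not reproduced, the fusion step is a plain upsample-and-concatenate stand-in for the MCFF module, and the function names, patch size, and mask ratio are all hypothetical choices made for illustration. The sketch also assumes each feature map's height and width are divisible by the patch size and divide the target size evenly.

```python
import numpy as np


def random_patch_mask(height, width, patch, mask_ratio, rng):
    """Return a (height, width) array of 0/1 values; 0 marks a masked patch.

    Assumes height and width are divisible by `patch`.
    """
    grid_h, grid_w = height // patch, width // patch
    n_patches = grid_h * grid_w
    n_masked = int(round(mask_ratio * n_patches))
    keep = np.ones(n_patches, dtype=np.float32)
    masked_idx = rng.choice(n_patches, size=n_masked, replace=False)
    keep[masked_idx] = 0.0
    # Expand the patch-level grid to pixel resolution.
    return np.kron(keep.reshape(grid_h, grid_w),
                   np.ones((patch, patch), dtype=np.float32))


def mask_multiscale(features, patch, mask_ratio, training, rng):
    """Mask each (C, H, W) feature map independently; identity at inference."""
    if not training:
        return features
    masked = []
    for feat in features:
        _, h, w = feat.shape
        mask = random_patch_mask(h, w, patch, mask_ratio, rng)
        masked.append(feat * mask[None, :, :])  # broadcast over channels
    return masked


def fuse_by_concat(features, target_hw):
    """Hypothetical stand-in for fusion: nearest-neighbour upsample to
    `target_hw`, then concatenate along the channel axis."""
    target_h, target_w = target_hw
    resized = []
    for feat in features:
        _, h, w = feat.shape
        up = np.repeat(feat, target_h // h, axis=1)
        up = np.repeat(up, target_w // w, axis=2)
        resized.append(up)
    return np.concatenate(resized, axis=0)


# Three scales with illustrative shapes (channels, height, width).
rng = np.random.default_rng(0)
feats = [
    np.ones((8, 32, 32), dtype=np.float32),
    np.ones((16, 16, 16), dtype=np.float32),
    np.ones((32, 8, 8), dtype=np.float32),
]
masked_feats = mask_multiscale(feats, patch=4, mask_ratio=0.5,
                               training=True, rng=rng)
fused = fuse_by_concat(masked_feats, target_hw=(32, 32))
print(fused.shape)  # (56, 32, 32)
```

Because the mask is drawn independently at each scale, a region hidden at one resolution is often visible at another, which is one plausible reading of how fusing masked maps across scales could encourage the use of both local and global cues. With `training=False` the feature maps pass through unchanged, matching the abstract's statement that masking is applied only during training.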
About the journal:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.