WaterFormer: A coupled transformer and CNN network for waterbody detection in optical remotely-sensed imagery

IF 10.6, Zone 1 (Earth Sciences), JCR Q1 (GEOGRAPHY, PHYSICAL)

ISPRS Journal of Photogrammetry and Remote Sensing, Pub Date: 2023-11-17, DOI: 10.1016/j.isprsjprs.2023.11.006

Jian Kang, Haiyan Guan, Lingfei Ma, Lanying Wang, Zhengsen Xu, Jonathan Li

Volume 206, Pages 222-241. Full text: https://www.sciencedirect.com/science/article/pii/S0924271623003118

Citations: 0

Abstract


WaterFormer: A coupled transformer and CNN network for waterbody detection in optical remotely-sensed imagery

As one of the most significant components of the ecosystem, waterbodies need to be closely monitored at different spatial and temporal scales. Nevertheless, variations in waterbody shape, size, and reflectivity, the complicated and varied types of land cover, and the diversity of environmental scenes present major challenges to accurate waterbody detection (WD). In this paper, we propose a novel network that couples a Transformer with a convolutional neural network (CNN), termed WaterFormer, to automatically, efficiently, and accurately delineate waterbodies from optical high-resolution remotely sensed (HR-RS) images. The network mainly comprises a dual-stream CNN, a cross-level Vision Transformer, a light-weight attention module, and a sub-pixel up-sampling module. First, the dual-stream network abstracts waterbody features from multiple views and at different levels. Then, to exploit the long-range dependencies between low-level spatial information and high-order semantic features, the cross-level Vision Transformer is embedded into the dual-stream network to improve WD accuracy. Afterwards, the light-weight attention module provides semantically strong feature abstractions by enhancing discriminative neurons, and the sub-pixel up-sampling module further generates high-resolution, high-quality class-specific representations. Quantitative and qualitative evaluations demonstrated that WaterFormer provides a promising means of detecting waterbody areas in satellite images under complex scene conditions. Moreover, comparative analyses with state-of-the-art (SOTA) alternatives, e.g., MSFENet, MSAFNet, and BiSeNet, verified the generalization and superiority of WaterFormer in WD tasks. WaterFormer achieved an average accuracy of 97.24%, average precision of 94.59%, average recall of 91.95%, average F₁-score of 93.24%, and average Kappa index of 0.9133. Additionally, we present an open-access HR satellite imagery waterbody dataset, a mesoscale dataset with high-quality, high-precision waterbody annotations, to facilitate future research in this field. The dataset has been released at https://github.com/NJdeuK/WD_Dataset.
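The five scores quoted in the abstract are standard binary-segmentation metrics derived from a pixel-level confusion matrix. The sketch below shows how they are usually computed. It is not the authors' evaluation code: the `Confusion` class, the `wd_metrics` function, and the pixel counts are hypothetical, and the code assumes every denominator is non-zero.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Pixel counts for a binary water / non-water mask (hypothetical container)."""
    tp: int  # water pixels predicted as water
    fp: int  # non-water pixels predicted as water
    fn: int  # water pixels predicted as non-water
    tn: int  # non-water pixels predicted as non-water


def wd_metrics(c: Confusion) -> dict:
    """Accuracy, precision, recall, F1 and Cohen's kappa from one confusion matrix."""
    n = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / n
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Chance agreement: product of predicted and actual marginals, per class.
    p_e = ((c.tp + c.fp) * (c.tp + c.fn) + (c.tn + c.fn) * (c.tn + c.fp)) / n**2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


# Made-up counts for illustration only; they are not taken from the paper.
scores = wd_metrics(Confusion(tp=90, fp=10, fn=10, tn=890))

# Harmonic mean of the abstract's *average* precision and recall.
f1_from_averages = 2 * 0.9459 * 0.9195 / (0.9459 + 0.9195)
```

With the made-up counts, accuracy is 0.98, precision, recall and F1 are each 0.90, and kappa is about 0.889. The harmonic mean of the reported average precision and recall comes to about 93.25%, slightly above the reported average F₁-score of 93.24%. One plausible reason is that F₁ was averaged over individual test results rather than computed from the averaged precision and recall, but the abstract does not say.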


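Source journal

ISPRS Journal of Photogrammetry and Remote Sensing (Engineering and Technology: Imaging Science and Photographic Technology)

CiteScore: 21.00

Self-citation rate: 6.30%

Articles published: 273

Review time: 40 days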

Journal description: The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) serves as the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It acts as a platform for scientists and professionals worldwide who are involved in various disciplines that utilize photogrammetry, remote sensing, spatial information systems, computer vision, and related fields. The journal aims to facilitate communication and dissemination of advancements in these disciplines, while also acting as a comprehensive source of reference and archive. P&RS endeavors to publish high-quality, peer-reviewed research papers that are preferably original and have not been published before. These papers can cover scientific/research, technological development, or application/practical aspects. Additionally, the journal welcomes papers that are based on presentations from ISPRS meetings, as long as they are considered significant contributions to the aforementioned fields. In particular, P&RS encourages the submission of papers that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, discuss topics that have received limited attention in P&RS or related journals, or explore new directions in scientific or professional realms. It is preferred that theoretical papers include practical applications, while papers focusing on systems and applications should include a theoretical background.