STSNet: A cross-spatial resolution multi-modal remote sensing deep fusion network for high resolution land-cover segmentation

IF 15.5, Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Information Fusion, Volume 114, Article 102689. Pub Date: 2024-09-08. DOI: 10.1016/j.inffus.2024.102689

Beibei Yu, Jiayi Li, Xin Huang


Citations: 0

Abstract


STSNet: A cross-spatial resolution multi-modal remote sensing deep fusion network for high resolution land-cover segmentation

Deep learning models are now widely used for high-resolution land-cover segmentation. However, most current research still makes insufficient use of multi-modal information, which limits further gains in segmentation accuracy. Moreover, differences in the size and spatial resolution of multi-modal datasets make multi-modal land-cover segmentation harder. We therefore propose STSNet, a high-resolution land-cover segmentation network with cross-spatial-resolution spatio-temporal-spectral deep fusion. The network uses spatio-temporal-spectral features to exploit the complementary information in multi-modal data. STSNet consists of four components: (1) a high-resolution, multi-scale spatial-spectral encoder that jointly extracts subtle spatial-spectral features from hyperspectral and high-spatial-resolution images; (2) a long-term spatio-temporal encoder, built from spectral convolution and a spatio-temporal transformer block, that simultaneously captures the spatial, temporal, and spectral information in dense Sentinel-2 time series; (3) a cross-resolution fusion module that alleviates the spatial resolution differences between modalities and leverages their complementary spatio-temporal-spectral information; and (4) a multi-scale decoder that integrates multi-scale information from the multi-modal data.

We used airborne hyperspectral imagery acquired over the Shenyang region of China in 2020 (spatial resolution 1 m, 249 spectral bands, spectral resolution ≤ 5 nm) together with dense Sentinel time-series images acquired in the same period (spatial resolution 10 m, 10 spectral bands, 31 time steps). These data were combined into a multi-modal dataset called WHU-H²SR-MT, which is the first openly accessible large-scale high spatio-temporal-spectral satellite remote sensing dataset, with more than 2500 image pairs that each cover 300 m × 300 m. We also used two open-source datasets to validate the proposed modules.

Extensive experiments show that our multi-scale spatial-spectral encoder, spatio-temporal encoder, and cross-resolution fusion module outperform existing state-of-the-art (SOTA) algorithms in overall performance on high-resolution land-cover segmentation. The new multi-modal dataset will be made available at http://irsip.whu.edu.cn/resources/resources_en_v2.php, and the code for accessing and using the dataset at https://github.com/RS-Mage/STSNet.
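
The resolution gap described in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only. It reads the hyperspectral resolution as 1 m and takes the other figures (10 m, 249 and 10 bands, 31 time steps, 300 m × 300 m tiles) from the abstract. The `Modality` class, the (time, band, row, column) layout, and `upsample_nearest` are hypothetical names introduced here. Nearest-neighbour upsampling is a simple stand-in for bringing the coarse grid onto the fine grid, not the paper's cross-resolution fusion module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Modality:
    """Shape description of one modality in an image pair (hypothetical helper)."""
    name: str
    gsd_m: int       # ground sampling distance in metres
    bands: int       # number of spectral bands
    timesteps: int   # number of acquisitions (1 for a single-date image)

    def pixels_per_side(self, extent_m: int) -> int:
        # A square tile of extent_m metres spans extent_m / gsd_m pixels per side.
        return extent_m // self.gsd_m

    def shape(self, extent_m: int) -> tuple:
        # Assumed array layout: (time, band, row, column).
        side = self.pixels_per_side(extent_m)
        return (self.timesteps, self.bands, side, side)


def upsample_nearest(grid, factor):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor."""
    return [
        [row[c // factor] for c in range(len(row) * factor)]
        for row in grid
        for _ in range(factor)  # repeat each source row `factor` times
    ]


TILE_EXTENT_M = 300  # each image pair covers 300 m x 300 m
hyperspectral = Modality("airborne hyperspectral", gsd_m=1, bands=249, timesteps=1)
sentinel2 = Modality("Sentinel-2 time series", gsd_m=10, bands=10, timesteps=31)

hs_shape = hyperspectral.shape(TILE_EXTENT_M)          # (1, 249, 300, 300)
s2_shape = sentinel2.shape(TILE_EXTENT_M)              # (31, 10, 30, 30)
scale_factor = sentinel2.gsd_m // hyperspectral.gsd_m  # 10

if __name__ == "__main__":
    print(hyperspectral.name, hs_shape)
    print(sentinel2.name, s2_shape)
    print("upsampling factor:", scale_factor)
    print(upsample_nearest([[1, 2], [3, 4]], 2))
```

Under these assumptions, one Sentinel-2 pixel covers a 10 × 10 block of hyperspectral pixels, so any fusion scheme has to reconcile a 30 × 30 grid with a 300 × 300 grid before features from the two modalities can be combined pixel by pixel.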


Source journal

Information Fusion (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 33.20

Self-citation rate: 4.30%

Articles published: 161

Review time: 7.9 months

Journal description: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.