{"title":"S3F2Net:高光谱图像与激光雷达数据分类的空间-光谱-结构特征融合网络","authors":"Xianghai Wang;Liyang Song;Yining Feng;Junheng Zhu","doi":"10.1109/TCSVT.2025.3525734","DOIUrl":null,"url":null,"abstract":"The continuous development of Earth observation (EO) technology has significantly increased the availability of multi-sensor remote sensing (RS) data. The fusion of hyperspectral image (HSI) and light detection and ranging (LiDAR) data has become a research hotspot. Current mainstream convolutional neural networks (CNNs) excel at extracting local features from images but have limitations in modeling global information, which may affect the performance of classification tasks. In contrast, modern graph convolutional networks (GCNs) excel at capturing global information, particularly demonstrating significant advantages when processing RS images with irregular topological structures. By integrating these two frameworks, features can be fused from multiple perspectives, enabling a more comprehensive capture of multimodal data attributes and improving classification performance. The paper proposes a spatial-spectral-structural feature fusion network (S3F2Net) for HSI and LiDAR data classification. S3F2Net utilizes multiple architectures to extract rich features of multimodal data from different perspectives. On one hand, local spatial and spectral features of multimodal data are extracted using CNN, enhancing interactions among heterogeneous data through shared-weight convolution to achieve detailed representations of land cover. On the other hand, the global topological structure is learned using GCN, which models the spatial relationships between land cover types through graph structure constructed from LiDAR data, thereby enhancing the model’s understanding of scene content. Furthermore, the dynamic node updating strategy within the GCN enhances the model’s ability to identify representative nodes for specific land cover types while facilitating information aggregation among remote nodes, thereby strengthening adaptability to complex topological structures. By employing a multi-level information fusion strategy to integrate data representations from both global and local perspectives, the accuracy and reliability of the results are ensured. Compared with state-of-the-art (SOTA) methods, the framework’s validity is verified on three real multimodal RS datasets. The source code will be available at <uri>https://github.com/slylnnu/S3F2Net</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 5","pages":"4801-4815"},"PeriodicalIF":8.3000,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"S3F2Net: Spatial-Spectral-Structural Feature Fusion Network for Hyperspectral Image and LiDAR Data Classification\",\"authors\":\"Xianghai Wang;Liyang Song;Yining Feng;Junheng Zhu\",\"doi\":\"10.1109/TCSVT.2025.3525734\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The continuous development of Earth observation (EO) technology has significantly increased the availability of multi-sensor remote sensing (RS) data. The fusion of hyperspectral image (HSI) and light detection and ranging (LiDAR) data has become a research hotspot. Current mainstream convolutional neural networks (CNNs) excel at extracting local features from images but have limitations in modeling global information, which may affect the performance of classification tasks. 
In contrast, modern graph convolutional networks (GCNs) excel at capturing global information, particularly demonstrating significant advantages when processing RS images with irregular topological structures. By integrating these two frameworks, features can be fused from multiple perspectives, enabling a more comprehensive capture of multimodal data attributes and improving classification performance. The paper proposes a spatial-spectral-structural feature fusion network (S3F2Net) for HSI and LiDAR data classification. S3F2Net utilizes multiple architectures to extract rich features of multimodal data from different perspectives. On one hand, local spatial and spectral features of multimodal data are extracted using CNN, enhancing interactions among heterogeneous data through shared-weight convolution to achieve detailed representations of land cover. On the other hand, the global topological structure is learned using GCN, which models the spatial relationships between land cover types through graph structure constructed from LiDAR data, thereby enhancing the model’s understanding of scene content. Furthermore, the dynamic node updating strategy within the GCN enhances the model’s ability to identify representative nodes for specific land cover types while facilitating information aggregation among remote nodes, thereby strengthening adaptability to complex topological structures. By employing a multi-level information fusion strategy to integrate data representations from both global and local perspectives, the accuracy and reliability of the results are ensured. Compared with state-of-the-art (SOTA) methods, the framework’s validity is verified on three real multimodal RS datasets. The source code will be available at <uri>https://github.com/slylnnu/S3F2Net</uri>.\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"35 5\",\"pages\":\"4801-4815\"},\"PeriodicalIF\":8.3000,\"publicationDate\":\"2025-01-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10824903/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10824903/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
S3F2Net: Spatial-Spectral-Structural Feature Fusion Network for Hyperspectral Image and LiDAR Data Classification
The continuous development of Earth observation (EO) technology has significantly increased the availability of multi-sensor remote sensing (RS) data, and the fusion of hyperspectral image (HSI) and light detection and ranging (LiDAR) data has become a research hotspot. Current mainstream convolutional neural networks (CNNs) excel at extracting local features from images but are limited in modeling global information, which can degrade classification performance. In contrast, graph convolutional networks (GCNs) excel at capturing global information and show particular advantages when processing RS images with irregular topological structures. Integrating the two frameworks allows features to be fused from multiple perspectives, capturing multimodal data attributes more comprehensively and improving classification performance. This paper proposes a spatial-spectral-structural feature fusion network (S3F2Net) for HSI and LiDAR data classification. S3F2Net uses multiple architectures to extract rich features of multimodal data from different perspectives. On one hand, a CNN extracts local spatial and spectral features from the multimodal data, with shared-weight convolution enhancing interactions among the heterogeneous data to yield detailed representations of land cover. On the other hand, a GCN learns the global topological structure, modeling the spatial relationships between land cover types through a graph constructed from LiDAR data and thereby improving the model's understanding of scene content. Furthermore, a dynamic node updating strategy within the GCN improves the identification of representative nodes for specific land cover types and facilitates information aggregation among distant nodes, strengthening adaptability to complex topological structures. A multi-level information fusion strategy integrates data representations from both global and local perspectives, ensuring accurate and reliable results. The framework is validated on three real multimodal RS datasets, where it is compared against state-of-the-art (SOTA) methods. The source code will be available at https://github.com/slylnnu/S3F2Net.
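To make the dual-branch idea concrete, the sketch below shows one plausible PyTorch realization of the pipeline the abstract describes: a shared-weight CNN that processes HSI and LiDAR patches with the same filters, a plain GCN layer that propagates features over a normalized graph (in the paper, the graph is built from LiDAR data), and a late-fusion classifier. All layer sizes, the 1x1 projection layers, the adjacency construction, and the fusion head are illustrative assumptions rather than the paper's actual S3F2Net implementation, and the dynamic node updating strategy is omitted.

```python
# Minimal sketch of a shared-weight CNN + GCN fusion pipeline.
# Hypothetical dimensions and modules; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def normalize_adjacency(a: torch.Tensor) -> torch.Tensor:
    """Symmetric normalization: A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    a = a + torch.eye(a.size(0))
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)


class SharedWeightCNN(nn.Module):
    """One convolutional stack applied to both modalities (shared weights)."""

    def __init__(self, in_channels: int, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, feat_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(feat_dim),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),  # collapse each patch to one vector
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).flatten(1)  # (batch, feat_dim)


class SimpleGCNLayer(nn.Module):
    """Plain GCN propagation: H' = ReLU(A_hat @ H @ W)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h: torch.Tensor, a_hat: torch.Tensor) -> torch.Tensor:
        # a_hat: normalized node adjacency; in the paper it would be
        # derived from LiDAR data (construction assumed here).
        return F.relu(a_hat @ self.linear(h))


class DualBranchFusion(nn.Module):
    """Fuse local CNN features with global GCN features, then classify."""

    def __init__(self, hsi_bands: int, num_classes: int, feat_dim: int = 64):
        super().__init__()
        # Project both modalities to a common channel count so one CNN
        # (shared weights) can process either patch.
        self.hsi_proj = nn.Conv2d(hsi_bands, 16, kernel_size=1)
        self.lidar_proj = nn.Conv2d(1, 16, kernel_size=1)
        self.cnn = SharedWeightCNN(16, feat_dim)
        self.gcn = SimpleGCNLayer(feat_dim, feat_dim)
        self.classifier = nn.Linear(3 * feat_dim, num_classes)

    def forward(self, hsi_patch, lidar_patch, node_feats, a_hat, node_idx):
        f_hsi = self.cnn(self.hsi_proj(hsi_patch))        # local HSI features
        f_lidar = self.cnn(self.lidar_proj(lidar_patch))  # local LiDAR features
        g = self.gcn(node_feats, a_hat)[node_idx]         # global node features
        return self.classifier(torch.cat([f_hsi, f_lidar, g], dim=1))


if __name__ == "__main__":
    model = DualBranchFusion(hsi_bands=144, num_classes=15)
    hsi = torch.randn(8, 144, 11, 11)     # 8 HSI patches
    lidar = torch.randn(8, 1, 11, 11)     # matching LiDAR (DSM) patches
    nodes = torch.randn(200, 64)          # 200 graph nodes, 64-d features
    adj = (torch.rand(200, 200) > 0.95).float()
    a_hat = normalize_adjacency(torch.max(adj, adj.t()))  # symmetric demo graph
    idx = torch.randint(0, 200, (8,))     # node each patch belongs to
    print(model(hsi, lidar, nodes, a_hat, idx).shape)     # torch.Size([8, 15])
```

Sharing one CNN across modalities forces HSI and LiDAR patches into a common feature space, which is one way to realize the "interactions among heterogeneous data" the abstract mentions, while the GCN branch supplies the long-range relational context that patch-level convolution cannot capture.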
Journal Introduction:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.