{"title":"S3F2Net:高光谱图像与激光雷达数据分类的空间-光谱-结构特征融合网络","authors":"Xianghai Wang;Liyang Song;Yining Feng;Junheng Zhu","doi":"10.1109/TCSVT.2025.3525734","DOIUrl":null,"url":null,"abstract":"The continuous development of Earth observation (EO) technology has significantly increased the availability of multi-sensor remote sensing (RS) data. The fusion of hyperspectral image (HSI) and light detection and ranging (LiDAR) data has become a research hotspot. Current mainstream convolutional neural networks (CNNs) excel at extracting local features from images but have limitations in modeling global information, which may affect the performance of classification tasks. In contrast, modern graph convolutional networks (GCNs) excel at capturing global information, particularly demonstrating significant advantages when processing RS images with irregular topological structures. By integrating these two frameworks, features can be fused from multiple perspectives, enabling a more comprehensive capture of multimodal data attributes and improving classification performance. The paper proposes a spatial-spectral-structural feature fusion network (S3F2Net) for HSI and LiDAR data classification. S3F2Net utilizes multiple architectures to extract rich features of multimodal data from different perspectives. On one hand, local spatial and spectral features of multimodal data are extracted using CNN, enhancing interactions among heterogeneous data through shared-weight convolution to achieve detailed representations of land cover. On the other hand, the global topological structure is learned using GCN, which models the spatial relationships between land cover types through graph structure constructed from LiDAR data, thereby enhancing the model’s understanding of scene content. Furthermore, the dynamic node updating strategy within the GCN enhances the model’s ability to identify representative nodes for specific land cover types while facilitating information aggregation among remote nodes, thereby strengthening adaptability to complex topological structures. By employing a multi-level information fusion strategy to integrate data representations from both global and local perspectives, the accuracy and reliability of the results are ensured. Compared with state-of-the-art (SOTA) methods, the framework’s validity is verified on three real multimodal RS datasets. The source code will be available at <uri>https://github.com/slylnnu/S3F2Net</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 5","pages":"4801-4815"},"PeriodicalIF":8.3000,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"S3F2Net: Spatial-Spectral-Structural Feature Fusion Network for Hyperspectral Image and LiDAR Data Classification\",\"authors\":\"Xianghai Wang;Liyang Song;Yining Feng;Junheng Zhu\",\"doi\":\"10.1109/TCSVT.2025.3525734\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The continuous development of Earth observation (EO) technology has significantly increased the availability of multi-sensor remote sensing (RS) data. The fusion of hyperspectral image (HSI) and light detection and ranging (LiDAR) data has become a research hotspot. Current mainstream convolutional neural networks (CNNs) excel at extracting local features from images but have limitations in modeling global information, which may affect the performance of classification tasks. 
In contrast, modern graph convolutional networks (GCNs) excel at capturing global information, particularly demonstrating significant advantages when processing RS images with irregular topological structures. By integrating these two frameworks, features can be fused from multiple perspectives, enabling a more comprehensive capture of multimodal data attributes and improving classification performance. The paper proposes a spatial-spectral-structural feature fusion network (S3F2Net) for HSI and LiDAR data classification. S3F2Net utilizes multiple architectures to extract rich features of multimodal data from different perspectives. On one hand, local spatial and spectral features of multimodal data are extracted using CNN, enhancing interactions among heterogeneous data through shared-weight convolution to achieve detailed representations of land cover. On the other hand, the global topological structure is learned using GCN, which models the spatial relationships between land cover types through graph structure constructed from LiDAR data, thereby enhancing the model’s understanding of scene content. Furthermore, the dynamic node updating strategy within the GCN enhances the model’s ability to identify representative nodes for specific land cover types while facilitating information aggregation among remote nodes, thereby strengthening adaptability to complex topological structures. By employing a multi-level information fusion strategy to integrate data representations from both global and local perspectives, the accuracy and reliability of the results are ensured. Compared with state-of-the-art (SOTA) methods, the framework’s validity is verified on three real multimodal RS datasets. The source code will be available at <uri>https://github.com/slylnnu/S3F2Net</uri>.\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"35 5\",\"pages\":\"4801-4815\"},\"PeriodicalIF\":8.3000,\"publicationDate\":\"2025-01-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10824903/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10824903/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
S3F2Net: Spatial-Spectral-Structural Feature Fusion Network for Hyperspectral Image and LiDAR Data Classification
The continuous development of Earth observation (EO) technology has significantly increased the availability of multi-sensor remote sensing (RS) data, and the fusion of hyperspectral image (HSI) and light detection and ranging (LiDAR) data has become a research hotspot. Current mainstream convolutional neural networks (CNNs) excel at extracting local features from images but are limited in modeling global information, which can degrade classification performance. In contrast, graph convolutional networks (GCNs) excel at capturing global information and show particular advantages when processing RS images with irregular topological structures. Integrating the two frameworks allows features to be fused from multiple perspectives, capturing multimodal data attributes more comprehensively and improving classification performance. This paper proposes a spatial-spectral-structural feature fusion network (S3F2Net) for HSI and LiDAR data classification. S3F2Net uses multiple architectures to extract rich features of multimodal data from different perspectives. On one hand, a CNN extracts local spatial and spectral features from the multimodal data, with shared-weight convolution enhancing interactions among the heterogeneous data to yield detailed representations of land cover. On the other hand, a GCN learns the global topological structure, modeling the spatial relationships between land cover types through a graph constructed from LiDAR data and thereby improving the model's understanding of scene content. Furthermore, a dynamic node updating strategy within the GCN improves the identification of representative nodes for specific land cover types and facilitates information aggregation among distant nodes, strengthening adaptability to complex topological structures. A multi-level information fusion strategy integrates data representations from both global and local perspectives, ensuring accurate and reliable results. The framework is validated on three real multimodal RS datasets, where it is compared against state-of-the-art (SOTA) methods. The source code will be available at https://github.com/slylnnu/S3F2Net.
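To make the dual-branch idea concrete, the sketch below shows one plausible PyTorch realization of the pipeline the abstract describes: a shared-weight CNN that processes HSI and LiDAR patches with the same filters, a plain GCN layer that propagates features over a normalized graph (in the paper, the graph is built from LiDAR data), and a late-fusion classifier. All layer sizes, the 1x1 projection layers, the adjacency construction, and the fusion head are illustrative assumptions rather than the paper's actual S3F2Net implementation, and the dynamic node updating strategy is omitted.

```python
# Minimal sketch of a shared-weight CNN + GCN fusion pipeline.
# Hypothetical dimensions and modules; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def normalize_adjacency(a: torch.Tensor) -> torch.Tensor:
    """Symmetric normalization: A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    a = a + torch.eye(a.size(0))
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)


class SharedWeightCNN(nn.Module):
    """One convolutional stack applied to both modalities (shared weights)."""

    def __init__(self, in_channels: int, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, feat_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(feat_dim),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),  # collapse each patch to one vector
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).flatten(1)  # (batch, feat_dim)


class SimpleGCNLayer(nn.Module):
    """Plain GCN propagation: H' = ReLU(A_hat @ H @ W)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h: torch.Tensor, a_hat: torch.Tensor) -> torch.Tensor:
        # a_hat: normalized node adjacency; in the paper it would be
        # derived from LiDAR data (construction assumed here).
        return F.relu(a_hat @ self.linear(h))


class DualBranchFusion(nn.Module):
    """Fuse local CNN features with global GCN features, then classify."""

    def __init__(self, hsi_bands: int, num_classes: int, feat_dim: int = 64):
        super().__init__()
        # Project both modalities to a common channel count so one CNN
        # (shared weights) can process either patch.
        self.hsi_proj = nn.Conv2d(hsi_bands, 16, kernel_size=1)
        self.lidar_proj = nn.Conv2d(1, 16, kernel_size=1)
        self.cnn = SharedWeightCNN(16, feat_dim)
        self.gcn = SimpleGCNLayer(feat_dim, feat_dim)
        self.classifier = nn.Linear(3 * feat_dim, num_classes)

    def forward(self, hsi_patch, lidar_patch, node_feats, a_hat, node_idx):
        f_hsi = self.cnn(self.hsi_proj(hsi_patch))        # local HSI features
        f_lidar = self.cnn(self.lidar_proj(lidar_patch))  # local LiDAR features
        g = self.gcn(node_feats, a_hat)[node_idx]         # global node features
        return self.classifier(torch.cat([f_hsi, f_lidar, g], dim=1))


if __name__ == "__main__":
    model = DualBranchFusion(hsi_bands=144, num_classes=15)
    hsi = torch.randn(8, 144, 11, 11)     # 8 HSI patches
    lidar = torch.randn(8, 1, 11, 11)     # matching LiDAR (DSM) patches
    nodes = torch.randn(200, 64)          # 200 graph nodes, 64-d features
    adj = (torch.rand(200, 200) > 0.95).float()
    a_hat = normalize_adjacency(torch.max(adj, adj.t()))  # symmetric demo graph
    idx = torch.randint(0, 200, (8,))     # node each patch belongs to
    print(model(hsi, lidar, nodes, a_hat, idx).shape)     # torch.Size([8, 15])
```

Sharing one CNN across modalities forces HSI and LiDAR patches into a common feature space, which is one way to realize the "interactions among heterogeneous data" the abstract mentions, while the GCN branch supplies the long-range relational context that patch-level convolution cannot capture.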
Journal Introduction:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.