{"title":"用于 HSI 和激光雷达数据融合分类的多级注意动态尺度网络","authors":"Yufei He;Bobo Xi;Guocheng Li;Tie Zheng;Yunsong Li;Changbin Xue;Jocelyn Chanussot","doi":"10.1109/TGRS.2024.3456754","DOIUrl":null,"url":null,"abstract":"Land use/land cover classification with multimodal data has attracted increasing attention. For hyperspectral images (HSIs) and light detection and ranging (LiDAR) data, the combination of them can make the classification more accurate and robust. However, how to effectively utilize their respective strengths and integrate them with the classification task is still a challenging problem. In this article, a multilevel attention dynamic-scale network (MADNet) is proposed. First, in the feature extraction stage, the two modalities are divided into two branches with different scales, which are then fed into the convolutional neural networks (CNNs) to learn shallow features. Then, considering the characteristics of the HSI, a spectral angle attention module (SAAM) with low-level attention is designed to highlight surrounding pixels that have similar spectra to the central pixel of the patch. After that, a dynamic-scale selection module (DSSM) is proposed to screen an appropriate scale for the patches by pixel similarity analysis. Next, combining the Transformer and the CNN, a global-local cross-attention module (GLCAM) is devised to investigate the fused deep-level multimodal features. Distinct from the vanilla Transformer, the GLCAM deploys a distance-weight operator to decrease the redundancies at long distances and effectively reduce misclassifications. Extensive experiments on three paired HSI and LiDAR datasets demonstrate that the proposed MADNet has certain advantages over the existing methods.","PeriodicalId":13213,"journal":{"name":"IEEE Transactions on Geoscience and Remote Sensing","volume":null,"pages":null},"PeriodicalIF":7.5000,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multilevel Attention Dynamic-Scale Network for HSI and LiDAR Data Fusion Classification\",\"authors\":\"Yufei He;Bobo Xi;Guocheng Li;Tie Zheng;Yunsong Li;Changbin Xue;Jocelyn Chanussot\",\"doi\":\"10.1109/TGRS.2024.3456754\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Land use/land cover classification with multimodal data has attracted increasing attention. For hyperspectral images (HSIs) and light detection and ranging (LiDAR) data, the combination of them can make the classification more accurate and robust. However, how to effectively utilize their respective strengths and integrate them with the classification task is still a challenging problem. In this article, a multilevel attention dynamic-scale network (MADNet) is proposed. First, in the feature extraction stage, the two modalities are divided into two branches with different scales, which are then fed into the convolutional neural networks (CNNs) to learn shallow features. Then, considering the characteristics of the HSI, a spectral angle attention module (SAAM) with low-level attention is designed to highlight surrounding pixels that have similar spectra to the central pixel of the patch. After that, a dynamic-scale selection module (DSSM) is proposed to screen an appropriate scale for the patches by pixel similarity analysis. Next, combining the Transformer and the CNN, a global-local cross-attention module (GLCAM) is devised to investigate the fused deep-level multimodal features. 
Distinct from the vanilla Transformer, the GLCAM deploys a distance-weight operator to decrease the redundancies at long distances and effectively reduce misclassifications. Extensive experiments on three paired HSI and LiDAR datasets demonstrate that the proposed MADNet has certain advantages over the existing methods.\",\"PeriodicalId\":13213,\"journal\":{\"name\":\"IEEE Transactions on Geoscience and Remote Sensing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":7.5000,\"publicationDate\":\"2024-09-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Geoscience and Remote Sensing\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10669994/\",\"RegionNum\":1,\"RegionCategory\":\"地球科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Geoscience and Remote Sensing","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10669994/","RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Multilevel Attention Dynamic-Scale Network for HSI and LiDAR Data Fusion Classification
Land use/land cover classification with multimodal data has attracted increasing attention. For hyperspectral images (HSIs) and light detection and ranging (LiDAR) data, combining the two modalities can make classification more accurate and robust. However, effectively exploiting their respective strengths and integrating them into the classification task remains challenging. In this article, a multilevel attention dynamic-scale network (MADNet) is proposed. First, in the feature extraction stage, the two modalities are divided into two branches with different scales, which are fed into convolutional neural networks (CNNs) to learn shallow features. Then, considering the characteristics of the HSI, a spectral angle attention module (SAAM) with low-level attention is designed to highlight surrounding pixels whose spectra are similar to that of the patch's central pixel. After that, a dynamic-scale selection module (DSSM) is proposed to select an appropriate scale for the patches via pixel similarity analysis. Next, combining the Transformer and the CNN, a global-local cross-attention module (GLCAM) is devised to investigate the fused deep-level multimodal features. Distinct from the vanilla Transformer, the GLCAM deploys a distance-weight operator to reduce redundancies at long distances and effectively decrease misclassifications. Extensive experiments on three paired HSI and LiDAR datasets demonstrate that the proposed MADNet has certain advantages over existing methods.
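The two attention ideas named in the abstract can be made concrete. Below is a minimal, self-contained sketch in PyTorch, not the authors' released implementation: the spectral-angle weighting follows the standard spectral angle mapper formula (the angle between each pixel's spectrum and the central pixel's spectrum), and the distance penalty is one plausible reading of the "distance-weight operator" described for the GLCAM. All function names, tensor shapes, and the subtractive penalty `alpha * dist` are illustrative assumptions.

```python
# Hedged sketch of two ideas from the abstract; names/shapes are assumptions,
# not the paper's actual code.
import torch
import torch.nn.functional as F


def spectral_angle_weights(patch: torch.Tensor) -> torch.Tensor:
    """Weight each pixel of an HSI patch by its spectral angle to the center.

    patch: (B, C, H, W) hyperspectral patch, C spectral bands.
    Returns: (B, 1, H, W) weights in [0, 1]; pixels spectrally similar to the
    central pixel receive weights close to 1.
    """
    b, c, h, w = patch.shape
    center = patch[:, :, h // 2, w // 2].view(b, c, 1, 1)  # central spectrum
    cos = F.cosine_similarity(patch, center.expand_as(patch), dim=1)  # (B, H, W)
    angle = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))  # spectral angle in [0, pi]
    return (1.0 - angle / torch.pi).unsqueeze(1)  # small angle -> large weight


def distance_weighted_attention(q, k, v, coords, alpha: float = 0.1):
    """Scaled dot-product attention with a spatial-distance penalty.

    q, k, v: (B, N, D) token embeddings; coords: (N, 2) spatial positions of
    the N tokens. alpha controls how strongly long-range scores are
    suppressed, echoing the distance-weight operator described for the GLCAM.
    """
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d**0.5  # (B, N, N) attention logits
    dist = torch.cdist(coords.float(), coords.float())  # (N, N) pairwise distances
    scores = scores - alpha * dist  # penalize distant token pairs
    return F.softmax(scores, dim=-1) @ v


if __name__ == "__main__":
    x = torch.randn(2, 144, 11, 11)  # toy HSI patch: 2 samples, 144 bands, 11x11 window
    print(spectral_angle_weights(x).shape)  # torch.Size([2, 1, 11, 11])
```

Subtracting a scaled pairwise distance from the attention logits before the softmax is one simple way to damp long-range interactions; the operator actually used in the paper may take a different form.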
Journal introduction:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.