Title: A Hierarchical Graph-Enhanced Transformer Network for Remote Sensing Scene Classification
Authors: Ziwei Li; Weiming Xu; Shiyu Yang; Juan Wang; Hua Su; Zhanchao Huang; Sheng Wu
Journal: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 17, pp. 20315-20330
Journal metrics: Impact Factor 4.7; JCR Q1 (Engineering, Electrical & Electronic)
Publication type: Journal Article
Published: 2024-11-04
DOI: 10.1109/JSTARS.2024.3491335
URL: https://ieeexplore.ieee.org/document/10742489/
PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10742489
Citation count: 0
Abstract
Remote sensing scene classification (RSSC) is essential in Earth observation, with applications in land use, environmental status, urban development, and disaster risk assessment. However, redundant background interference, varying feature scales, and high interclass similarity in remote sensing images present significant challenges for RSSC. To address these challenges, this article proposes a novel hierarchical graph-enhanced transformer network (HGTNet) for RSSC. First, we introduce a dual attention (DA) module, which extracts key feature information from both the channel and spatial domains, effectively suppressing background noise. Next, we design a three-stage hierarchical transformer extractor, incorporating a DA module at the bottleneck of each stage to facilitate information exchange between different stages, in conjunction with the Swin transformer block to capture multiscale global visual information. Moreover, we develop a fine-grained graph neural network extractor that constructs the spatial topological relationships of pixel-level scene images, thereby aiding in the discrimination of similar complex scene categories. Finally, the visual features and spatial structural features are fully integrated and input into the classifier by employing skip connections. HGTNet achieves classification accuracies of 98.47%, 95.75%, and 96.33% on the Aerial Image Dataset (AID), NWPU-RESISC45, and OPTIMAL-31 datasets, respectively, demonstrating superior performance compared to other state-of-the-art models. Extensive experimental results indicate that our proposed method effectively learns critical multiscale visual features and distinguishes between similar complex scenes, thereby significantly enhancing the accuracy of RSSC.
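The abstract does not describe the internals of the DA module, so the following is only a rough illustration of the general idea of gating a feature map first per channel and then per pixel. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the gates are parameter-free (a sigmoid of a plain average), and a trained module would normally learn these gates with convolutions or small MLPs.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Scale each channel by a gate computed from its global average.

    fmap is a nested list with shape [channels][height][width].
    Assumption: the gate is sigmoid(mean); a real module would learn it.
    """
    out = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        gate = sigmoid(sum(values) / len(values))
        out.append([[v * gate for v in row] for row in channel])
    return out


def spatial_attention(fmap):
    """Scale each pixel by a gate computed from its mean across channels."""
    n_ch = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    gates = [
        [sigmoid(sum(fmap[c][i][j] for c in range(n_ch)) / n_ch)
         for j in range(width)]
        for i in range(height)
    ]
    return [
        [[fmap[c][i][j] * gates[i][j] for j in range(width)]
         for i in range(height)]
        for c in range(n_ch)
    ]


def dual_attention(fmap):
    """Apply the channel gate, then the spatial gate (order is an assumption)."""
    return spatial_attention(channel_attention(fmap))


# Toy feature map: one strongly activated channel, one negatively activated one.
fmap = [
    [[4.0, 4.0], [4.0, 4.0]],
    [[-4.0, -4.0], [-4.0, -4.0]],
]
out = dual_attention(fmap)
```

In this toy input the second channel has a low average, so its gate is close to zero and its values are suppressed much more than those of the first channel, which is the kind of background damping the abstract attributes to the DA module.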
About the journal:
The IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing addresses the growing field of applications in Earth observations and remote sensing, and also provides a venue for the rapidly expanding special issues that are being sponsored by the IEEE Geoscience and Remote Sensing Society. The journal draws upon the experience of the highly successful “IEEE Transactions on Geoscience and Remote Sensing” and provides a complementary medium for the wide range of topics in applied Earth observations. The ‘Applications’ areas encompass the societal benefit areas of the Global Earth Observation System of Systems (GEOSS) program. Through deliberations over two years, ministers from 50 countries agreed to identify nine areas where Earth observation could positively impact the quality of life and health of their respective countries. Some of these areas, including biodiversity, health, and climate, are not traditionally addressed in the IEEE context. Yet it is the skill sets of IEEE members, in areas such as observations, communications, computers, signal processing, standards, and ocean engineering, that form the technical underpinnings of GEOSS. Thus, the Journal attracts a broad range of interests, serving present members in new ways and expanding IEEE's visibility into new areas.