PCET: Patch Confidence-Enhanced Transformer with efficient spectral–spatial features for hyperspectral image classification
Authors: Li Fang, Xuanli Lan, Tianyu Li, Huifang Shen
Journal: International Journal of Applied Earth Observation and Geoinformation (Impact Factor 7.5; JCR Q1, Earth and Planetary Sciences)
DOI: 10.1016/j.jag.2024.104308 (https://doi.org/10.1016/j.jag.2024.104308)
Publication date: 2024-12-18
Publication type: Journal Article
Citation count: 0
Abstract
Hyperspectral image (HSI) classification based on deep learning has demonstrated promising performance. In general, patch-wise samples help to extract the spatial relationships between pixels and local contextual information. However, background or other-category information in an image patch that is inconsistent with the central target category has a negative effect on classification. To address this issue, a patch confidence-enhanced transformer (PCET) approach for HSI classification is proposed. First, we design a patch quality assessment (PQA) branch model that evaluates the input patches during training and effectively filters out intrusive non-central pixels. The output confidence of the branch model serves as a quantitative indicator of how much each input patch contributes to training; it is used as a weight in the loss function, which lets the model dynamically adjust its learning focus based on the quality of the inputs. Second, a spectral–spatial multi-feature fusion (SSMF) module is devised to extract a large amount of representative information simultaneously and fully exploit the multi-scale features of HSI data. Finally, to enhance feature discrimination, global context is modeled using the efficient additive attention transformer (EA²T) module, which streamlines the attention process and allows the model to learn efficient and robust global representations for accurate classification of the central pixel. Experiments on real HSI datasets show that the proposed PCET achieves outstanding performance, even when only 10 samples per category are used for training.
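The idea of weighting the loss by a per-patch confidence can be illustrated with a minimal sketch. This is not the authors' formulation: the abstract does not give the exact loss, so the function names (`cross_entropy`, `confidence_weighted_loss`), the use of cross-entropy, and the normalization by the sum of confidences are all assumptions made for illustration. The confidences are passed in as plain numbers here, whereas in PCET they would come from the PQA branch.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one patch given its class scores and true label."""
    return -math.log(softmax(logits)[label])


def confidence_weighted_loss(batch_logits, labels, confidences):
    """Confidence-weighted mean of per-patch cross-entropy (hypothetical form).

    Patches with low confidence contribute less to the batch loss.
    Dividing by the sum of confidences is an assumption of this sketch.
    """
    if not (len(batch_logits) == len(labels) == len(confidences)):
        raise ValueError("inputs must have the same length")
    total_weight = sum(confidences)
    if total_weight <= 0:
        raise ValueError("at least one confidence must be positive")
    weighted = sum(
        c * cross_entropy(z, y)
        for z, y, c in zip(batch_logits, labels, confidences)
    )
    return weighted / total_weight


# Two patches with the same label; the second is misclassified.
logits = [[2.0, 0.0], [0.0, 2.0]]
labels = [0, 0]
uniform = confidence_weighted_loss(logits, labels, [1.0, 1.0])
downweighted = confidence_weighted_loss(logits, labels, [1.0, 0.1])
```

With equal confidences the result reduces to the ordinary mean loss; lowering the confidence of the second (poorly fitting) patch shrinks its share of the batch loss, which is the behavior the abstract describes as adjusting the learning focus.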
About the journal:
The International Journal of Applied Earth Observation and Geoinformation publishes original papers that utilize earth observation data for natural resource and environmental inventory and management. These data primarily originate from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. Addressing natural resources such as forests, agricultural land, soils, and water, as well as environmental concerns like biodiversity, land degradation, and hazards, the journal explores conceptual and data-driven approaches. It covers geoinformation themes like capturing, databasing, visualization, interpretation, data quality, and spatial uncertainty.