Hyperspectral image classification with token fusion on GPU
Authors: He Huang, Sha Tao
Journal: Computer Vision and Image Understanding (Impact Factor 4.3; JCR Q2, Computer Science, Artificial Intelligence)
Published: 2024-10-05
DOI: 10.1016/j.cviu.2024.104198
URL: https://www.sciencedirect.com/science/article/pii/S1077314224002790
Citations: 0
Abstract
Hyperspectral images capture fine material differences through their spectral data, which makes them vital for remote sensing. Transformers have become a mainstream approach for handling high-dimensional hyperspectral data with complex structure. However, a major challenge they face when processing hyperspectral images is the large number of redundant tokens, which significantly increases the computational load and slows inference. We therefore propose a token fusion algorithm tailored to the operational characteristics of hyperspectral images and the pure transformer network, aimed at improving the model’s final accuracy and throughput. The algorithm inserts a token merging step between the attention mechanism and the multi-layer perceptron module in each Transformer layer. Experiments on four hyperspectral image datasets show that the token fusion algorithm significantly improves inference speed without any training, while causing only a slight decrease in the pure transformer network’s classification accuracy.
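The abstract places a token merging step between attention and the MLP but does not specify how tokens are matched or combined. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes bipartite matching on cosine similarity and plain averaging, in the style of general token-merging methods. The names `merge_tokens`, `transformer_layer`, and the reduction count `r` are hypothetical, and `attention` and `mlp` are stand-in callables.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_tokens(tokens, r):
    """Merge up to r tokens into their most similar partners.

    Assumption: bipartite matching with averaging; the paper's actual
    matching and fusion rule is not given in the abstract.
    """
    if r <= 0 or len(tokens) < 2:
        return [list(t) for t in tokens]
    # Split the tokens into two alternating sets, A and B.
    set_a, set_b = tokens[0::2], tokens[1::2]
    r = min(r, len(set_a))
    # For each token in A, find its most similar token in B.
    matches = []
    for i, tok in enumerate(set_a):
        sims = [cosine(tok, other) for other in set_b]
        j = max(range(len(set_b)), key=sims.__getitem__)
        matches.append((sims[j], i, j))
    # Treat the r highest-similarity pairs as redundant.
    merged = sorted(matches, reverse=True)[:r]
    merged_a = {i for _, i, _ in merged}
    # Average each B token with the A tokens merged into it.
    sums = [list(tok) for tok in set_b]
    counts = [1] * len(set_b)
    for _, i, j in merged:
        sums[j] = [s + a for s, a in zip(sums[j], set_a[i])]
        counts[j] += 1
    new_b = [[s / c for s in vec] for vec, c in zip(sums, counts)]
    kept_a = [list(tok) for i, tok in enumerate(set_a) if i not in merged_a]
    return kept_a + new_b


def transformer_layer(tokens, attention, mlp, r):
    """One layer: residual attention, token merging, then residual MLP.

    `attention` maps the token list to one update per token; `mlp` maps a
    single token to its update. Both are hypothetical stand-ins.
    """
    updates = attention(tokens)
    tokens = [[x + y for x, y in zip(t, u)] for t, u in zip(tokens, updates)]
    tokens = merge_tokens(tokens, r)  # fewer tokens reach the MLP
    return [[x + y for x, y in zip(t, mlp(t))] for t in tokens]
```

Because merging only reads the token vectors, a step of this kind can be added to an already trained network without retraining, which is consistent with the training-free speedup the abstract reports. Each layer passes at most `r` fewer tokens to its MLP and to every later layer. This simplified sketch returns the kept tokens in a different order from the input.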
About the journal:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems