{"title":"A method based on hybrid cross-multiscale spectral-spatial transformer network for hyperspectral and multispectral image fusion","authors":"Yingxia Chen , Mingming Wei , Yan Chen","doi":"10.1016/j.eswa.2024.125742","DOIUrl":null,"url":null,"abstract":"<div><div>Convolutional neural networks (CNNs) have made a significant contribution to hyperspectral image (HSI) generation. However, capturing long-range dependencies can be challenging with CNNs due to the limitations of their local receptive fields, which can lead to distortions in fused images. Transformers excel at capturing long-range dependencies but have limited capacity for handling fine details. Additionally, prior<!--> <!-->work has often overlooked the extraction of global features during the image preprocessing stage, resulting in the potential loss of fine details. To address these issues, we propose a hybrid cross-multiscale spectral-spatial Transformer (HCMSST) that combines the advantages of CNNs in feature extraction and Transformers in capturing long-range dependencies. To fully extract and retain local and global information in the shallow feature extraction phase, the network incorporates<!--> <!-->CNNs with a staggered cascade-dense residual block (SCDRB). This block employs staggered residuals to establish direct connections both<!--> <!-->within and between branches and integrates attention modules to enhance the response to important features. This approach facilitates unrestricted information exchange and fosters deeper feature representations. To address the limitations<!--> <!-->of Transformer in processing fine details, we introduce multiscale spatial-spectral coding-decoding structures to obtain comprehensive spatial-spectral features, which are utilized to capture the long-range dependencies via the cross-multiscale spectral-spatial Transformer (CMSST). Further, the CMSST incorporates a cross-level dual-stream feature interaction strategy that integrates spatial and spectral features from different levels and then feeds the fused features back to their corresponding branches for information interaction. Experimental results indicate that the proposed HCMSST achieves superior performance compared to many state-of-the-art (SOTA) methods. Specifically, HCMSST reduces the ERGAS metric by 3.05% compared to the SOTA methods on the CAVE dataset, while on the Harvard dataset, it achieves a 2.69% reduction in ERGAS compared to the SOTA results.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"263 ","pages":"Article 125742"},"PeriodicalIF":7.5000,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Expert Systems with Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0957417424026095","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Convolutional neural networks (CNNs) have made a significant contribution to hyperspectral image (HSI) generation. However, capturing long-range dependencies is challenging for CNNs because of their local receptive fields, which can lead to distortions in fused images. Transformers excel at capturing long-range dependencies but have limited capacity for handling fine details. Additionally, prior work has often overlooked the extraction of global features during the image preprocessing stage, resulting in the potential loss of fine details. To address these issues, we propose a hybrid cross-multiscale spectral-spatial Transformer (HCMSST) that combines the advantages of CNNs in feature extraction with those of Transformers in capturing long-range dependencies. To fully extract and retain local and global information in the shallow feature extraction phase, the network incorporates CNNs with a staggered cascade-dense residual block (SCDRB). This block employs staggered residuals to establish direct connections both within and between branches and integrates attention modules to strengthen the response to important features. This design facilitates unrestricted information exchange and yields deeper feature representations. To address the limitations of Transformers in processing fine details, we introduce multiscale spatial-spectral coding-decoding structures that produce comprehensive spatial-spectral features, which are then used to capture long-range dependencies via the cross-multiscale spectral-spatial Transformer (CMSST). Furthermore, the CMSST incorporates a cross-level dual-stream feature interaction strategy that integrates spatial and spectral features from different levels and feeds the fused features back to their corresponding branches for further information exchange. Experimental results indicate that the proposed HCMSST outperforms many state-of-the-art (SOTA) methods: it reduces ERGAS by 3.05% compared with the SOTA results on the CAVE dataset and by 2.69% on the Harvard dataset.
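The abstract describes the cross-level dual-stream interaction only at a high level: spatial and spectral features attend to each other and the exchanged information is fed back to the originating branch. The sketch below is a generic PyTorch illustration of that idea, not the authors' CMSST module; the class name, head count, token layouts, and residual/normalization choices are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class DualStreamCrossAttention(nn.Module):
    """Generic cross-attention between a spatial and a spectral token stream.

    Illustrative only: each stream queries the other, and the exchanged
    features are added back to the corresponding branch (residual + norm).
    """
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.spa_from_spe = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.spe_from_spa = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_spa = nn.LayerNorm(dim)
        self.norm_spe = nn.LayerNorm(dim)

    def forward(self, spa_tokens, spe_tokens):
        # spa_tokens: (B, N_spatial, dim); spe_tokens: (B, N_spectral, dim)
        spa_upd, _ = self.spa_from_spe(spa_tokens, spe_tokens, spe_tokens)
        spe_upd, _ = self.spe_from_spa(spe_tokens, spa_tokens, spa_tokens)
        # feed the exchanged information back to the originating branch
        return (self.norm_spa(spa_tokens + spa_upd),
                self.norm_spe(spe_tokens + spe_upd))
```

Fusion quality is reported with ERGAS (erreur relative globale adimensionnelle de synthèse), where lower values indicate better reconstruction. For readers unfamiliar with the metric, the following minimal NumPy sketch follows its standard definition rather than the authors' evaluation code; the function name, argument names, (H, W, bands) array layout, and scale-ratio convention are assumptions.

```python
import numpy as np

def ergas(reference, fused, scale_ratio):
    """ERGAS between a ground-truth HSI and a fused HSI (lower is better).

    reference, fused : arrays of shape (H, W, bands)
    scale_ratio      : ratio of high- to low-resolution pixel sizes,
                       e.g. 1/8 for 8x spatial fusion (conventions vary
                       between papers; check the setting being reproduced).
    """
    ref = np.asarray(reference, dtype=np.float64)
    fus = np.asarray(fused, dtype=np.float64)

    band_terms = []
    for k in range(ref.shape[-1]):
        rmse_k = np.sqrt(np.mean((ref[..., k] - fus[..., k]) ** 2))
        mean_k = np.mean(ref[..., k])  # per-band mean of the reference
        band_terms.append((rmse_k / mean_k) ** 2)

    # 100 * (h/l) * sqrt of the mean relative squared error over bands
    return 100.0 * scale_ratio * np.sqrt(np.mean(band_terms))
```

Dividing each band's RMSE by that band's mean makes the error dimensionless, so bright and dark spectral bands contribute comparably; the 3.05% and 2.69% figures quoted above are relative reductions of this value against the SOTA results on CAVE and Harvard, respectively.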
About the journal
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.