A method based on hybrid cross-multiscale spectral-spatial transformer network for hyperspectral and multispectral image fusion

IF 7.5 | Zone 1: Computer Science | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Yingxia Chen , Mingming Wei , Yan Chen
Expert Systems with Applications, vol. 263, Article 125742. Published 2024-11-10. DOI: 10.1016/j.eswa.2024.125742
https://www.sciencedirect.com/science/article/pii/S0957417424026095
Citations: 0

Abstract

Convolutional neural networks (CNNs) have contributed significantly to hyperspectral image (HSI) generation. However, their local receptive fields make it difficult to capture long-range dependencies, which can distort fused images. Transformers excel at capturing long-range dependencies but have limited capacity for fine details. In addition, prior work has often overlooked global feature extraction during the image preprocessing stage, which can also cause fine details to be lost. To address these issues, we propose a hybrid cross-multiscale spectral-spatial Transformer (HCMSST) that combines the strengths of CNNs in feature extraction with those of Transformers in capturing long-range dependencies. To extract and retain both local and global information in the shallow feature extraction phase, the network pairs CNNs with a staggered cascade-dense residual block (SCDRB). This block uses staggered residuals to establish direct connections both within and between branches and integrates attention modules to strengthen the response to important features, which facilitates unrestricted information exchange and fosters deeper feature representations. To address the limitations of Transformers in processing fine details, we introduce multiscale spatial-spectral coding-decoding structures to obtain comprehensive spatial-spectral features, which the cross-multiscale spectral-spatial Transformer (CMSST) then uses to capture long-range dependencies. The CMSST further incorporates a cross-level dual-stream feature interaction strategy that integrates spatial and spectral features from different levels and feeds the fused features back to their corresponding branches for information interaction. Experimental results indicate that the proposed HCMSST outperforms many state-of-the-art (SOTA) methods. Specifically, HCMSST reduces the ERGAS metric by 3.05% relative to the SOTA methods on the CAVE dataset and by 2.69% on the Harvard dataset.
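The abstract reports results in ERGAS without defining it. As a rough illustration only, the sketch below computes ERGAS in its commonly used form, 100/scale times the root of the band-averaged squared ratio of per-band RMSE to per-band reference mean; lower values are better. The function names, the `(height, width, bands)` array layout, and the `scale` convention (the spatial downsampling ratio between the reference and the low-resolution input) are assumptions for this example, not details taken from the paper. Reading "reduces ERGAS by 3.05%" as a relative reduction against a baseline score is likewise an assumption.

```python
import numpy as np


def ergas(reference, fused, scale):
    """Commonly used ERGAS form for (height, width, bands) arrays; lower is better.

    `scale` is assumed to be the spatial downsampling ratio (hypothetical convention).
    """
    reference = np.asarray(reference, dtype=np.float64)
    fused = np.asarray(fused, dtype=np.float64)
    if reference.shape != fused.shape:
        raise ValueError("reference and fused must have the same shape")
    # Per-band RMSE between the fused image and the reference.
    rmse = np.sqrt(np.mean((reference - fused) ** 2, axis=(0, 1)))
    # Per-band mean of the reference, used to normalise each band's error.
    band_mean = np.mean(reference, axis=(0, 1))
    return 100.0 / scale * np.sqrt(np.mean((rmse / band_mean) ** 2))


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline` (assumed reading)."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    # Synthetic data only; these numbers are not results from the paper.
    rng = np.random.default_rng(0)
    ref = rng.uniform(0.2, 1.0, size=(32, 32, 8))
    out = ref + rng.normal(0.0, 0.01, size=ref.shape)
    print("ERGAS:", ergas(ref, out, scale=8))
```

Because each band's error is divided by that band's mean, ERGAS weights dim and bright bands comparably, which is one reason it is popular for comparing fusion methods across datasets.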
Source journal
Expert Systems with Applications (Engineering and Technology; Engineering: Electrical and Electronic)
CiteScore: 13.80
Self-citation rate: 10.60%
Articles published: 2045
Review time: 8.7 months
Journal description: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.