An Asymmetric Siamese Transformer With Global-Local Alignment Attention for Visible-X-Ray Cross-Modality Package Re-Identification

IF 8; Zone 1, Computer Science; JCR Q1, COMPUTER SCIENCE, THEORY & METHODS
Yonggan Wu;Hongrui Yuan;Zichao Yuan;Lingchao Meng;Yueyi Bai;Hongqiang Wang
IEEE Transactions on Information Forensics and Security, vol. 20, pp. 7881-7894, published 2025-07-24.
DOI: 10.1109/TIFS.2025.3592540
URL: https://ieeexplore.ieee.org/document/11095748/
Citations: 0

Abstract

Visible-X-ray Cross-Modality Package Re-Identification (VX-ReID) is a critical task in security inspection. Its goal is to match visible-light images of packages with their X-ray images. The large modality gap between the two image types makes it hard to extract modality-invariant features that are both robust and fine-grained. To address this, the paper introduces a novel cross-modality feature extraction framework, the Asymmetric Siamese Transformer with Global-Local Alignment Attention (AST-GLAA). The network has two key components: the Cross-modality Asymmetric Siamese Transformer Structure (CAST-S) and Global-Local Cross-Modality Alignment Attention (GL-CMA). CAST-S makes one branch of the Siamese Transformer asymmetric by adding a LayerNorm layer and incorporating modality embeddings, which makes the modality-invariant features more robust. GL-CMA lets global and local features interact, which improves the fine-grained feature representation and mitigates spatial misalignment between cross-modality images. Experiments show that the proposed method achieves state-of-the-art (SOTA) performance on the VX-ReID task.
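The abstract does not give architectural details, so the NumPy sketch below only illustrates the two ideas it names. It is not the authors' AST-GLAA. Several choices are assumptions. A single shared linear map with tanh stands in for the Siamese Transformer encoder. Both branches add a learned modality embedding. The extra LayerNorm is placed arbitrarily on the X-ray branch, since the abstract only says "one branch". The global token is the mean of the patch tokens. The global-local step is plain scaled dot-product attention in which the global token queries the local tokens. All names (`encode`, `global_local_attention`, `describe`, `modality_emb`, and the sizes `D` and `N_PATCH`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16        # hypothetical token embedding width
N_PATCH = 8   # hypothetical number of local (patch) tokens per image


def layer_norm(x, eps=1e-6):
    """Normalize each token (last axis) to zero mean and unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


# Shared (Siamese) weights: one linear map stands in for the Transformer encoder.
W_shared = rng.standard_normal((D, D)) / np.sqrt(D)
# One embedding vector per modality (hypothetical: 0 = visible, 1 = X-ray).
modality_emb = rng.standard_normal((2, D)) * 0.02


def encode(tokens, modality):
    """Siamese encoder with one asymmetric branch.

    ASSUMPTION: the extra LayerNorm sits on the X-ray branch, after the
    modality embedding is added and before the shared weights.
    """
    x = tokens + modality_emb[modality]   # add the modality embedding to every token
    if modality == 1:
        x = layer_norm(x)                 # extra LayerNorm in the X-ray branch only
    return np.tanh(x @ W_shared)          # same weights for both branches


def global_local_attention(global_tok, local_toks):
    """Global token queries the local tokens (scaled dot-product attention).

    The softmax-weighted sum does not depend on token order, so shuffling
    the patches leaves the pooled local descriptor unchanged.
    """
    scores = local_toks @ global_tok / np.sqrt(global_tok.shape[-1])
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    local_summary = weights @ local_toks      # attention-weighted local feature
    return np.concatenate([global_tok, local_summary])  # fused global + local descriptor


def describe(tokens, modality):
    """Encode one image's patch tokens and fuse global and local features."""
    feats = encode(tokens, modality)
    global_tok = feats.mean(axis=0)   # ASSUMPTION: mean-pooled global token
    return global_local_attention(global_tok, feats)


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Random stand-ins for the patch tokens of one visible and one X-ray image.
visible = rng.standard_normal((N_PATCH, D))
xray = rng.standard_normal((N_PATCH, D))
f_vis = describe(visible, modality=0)
f_xray = describe(xray, modality=1)
print(f_vis.shape, round(cosine(f_vis, f_xray), 3))
```

In this sketch, reordering the patch tokens of an image leaves its descriptor unchanged. That is one simple way an attention-pooling step can tolerate spatial misalignment between views. The abstract does not say whether GL-CMA relies on this property. In a real ReID system, the cosine similarity between descriptors would rank X-ray gallery images against a visible query.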
Source journal: IEEE Transactions on Information Forensics and Security
Subject category: Engineering & Technology - Engineering: Electrical & Electronic
CiteScore: 14.40
Self-citation rate: 7.40%
Articles published: 234
Review time: 6.5 months
About the journal: The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.