Yonggan Wu;Hongrui Yuan;Zichao Yuan;Lingchao Meng;Yueyi Bai;Hongqiang Wang
{"title":"具有全局-局部对准注意的非对称Siamese变压器用于可见x射线交叉模态封装再识别","authors":"Yonggan Wu;Hongrui Yuan;Zichao Yuan;Lingchao Meng;Yueyi Bai;Hongqiang Wang","doi":"10.1109/TIFS.2025.3592540","DOIUrl":null,"url":null,"abstract":"Visible-X-ray Cross-Modality Package Re-Identification (VX-ReID) is a critical task in security inspection, aiming to match visible-light images with X-ray images. The significant modality gap between these image types poses substantial challenges in extracting robust and fine-grained modality-invariant features. To effectively address these challenges, this paper introduces a novel cross-modality feature extraction framework, the Asymmetric Siamese Transformer with Global-Local Alignment Attention (AST-GLAA). The network comprises two key components: Cross-modality Asymmetric Siamese Transformer Structure (CAST-S) and Global-Local Cross-Modality Alignment Attention (GL-CMA). CAST-S leverages an asymmetric design in one branch of the Siamese Transformer network by introducing a LayerNorm layer and incorporating modality embeddings to enhance the robustness of modality-invariant features. Meanwhile, GL-CMA facilitates the interaction between global and local features, significantly improving the representation of fine-grained features while effectively addressing spatial misalignment issues in cross-modality images. Experimental results demonstrate that the proposed method achieves state-of-the-art (SOTA) performance on the VX-ReID task, highlighting its effectiveness and potential in addressing the challenges of cross-modality package re-identification.","PeriodicalId":13492,"journal":{"name":"IEEE Transactions on Information Forensics and Security","volume":"20 ","pages":"7881-7894"},"PeriodicalIF":8.0000,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An Asymmetric Siamese Transformer With Global-Local Alignment Attention for Visible-X-Ray Cross-Modality Package Re-Identification\",\"authors\":\"Yonggan Wu;Hongrui Yuan;Zichao Yuan;Lingchao Meng;Yueyi Bai;Hongqiang Wang\",\"doi\":\"10.1109/TIFS.2025.3592540\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Visible-X-ray Cross-Modality Package Re-Identification (VX-ReID) is a critical task in security inspection, aiming to match visible-light images with X-ray images. The significant modality gap between these image types poses substantial challenges in extracting robust and fine-grained modality-invariant features. To effectively address these challenges, this paper introduces a novel cross-modality feature extraction framework, the Asymmetric Siamese Transformer with Global-Local Alignment Attention (AST-GLAA). The network comprises two key components: Cross-modality Asymmetric Siamese Transformer Structure (CAST-S) and Global-Local Cross-Modality Alignment Attention (GL-CMA). CAST-S leverages an asymmetric design in one branch of the Siamese Transformer network by introducing a LayerNorm layer and incorporating modality embeddings to enhance the robustness of modality-invariant features. Meanwhile, GL-CMA facilitates the interaction between global and local features, significantly improving the representation of fine-grained features while effectively addressing spatial misalignment issues in cross-modality images. 
Experimental results demonstrate that the proposed method achieves state-of-the-art (SOTA) performance on the VX-ReID task, highlighting its effectiveness and potential in addressing the challenges of cross-modality package re-identification.\",\"PeriodicalId\":13492,\"journal\":{\"name\":\"IEEE Transactions on Information Forensics and Security\",\"volume\":\"20 \",\"pages\":\"7881-7894\"},\"PeriodicalIF\":8.0000,\"publicationDate\":\"2025-07-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Information Forensics and Security\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11095748/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Information Forensics and Security","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11095748/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
An Asymmetric Siamese Transformer With Global-Local Alignment Attention for Visible-X-Ray Cross-Modality Package Re-Identification
Visible-X-ray Cross-Modality Package Re-Identification (VX-ReID) is a critical task in security inspection, aiming to match visible-light images with X-ray images. The significant modality gap between these image types poses substantial challenges in extracting robust and fine-grained modality-invariant features. To effectively address these challenges, this paper introduces a novel cross-modality feature extraction framework, the Asymmetric Siamese Transformer with Global-Local Alignment Attention (AST-GLAA). The network comprises two key components: Cross-modality Asymmetric Siamese Transformer Structure (CAST-S) and Global-Local Cross-Modality Alignment Attention (GL-CMA). CAST-S leverages an asymmetric design in one branch of the Siamese Transformer network by introducing a LayerNorm layer and incorporating modality embeddings to enhance the robustness of modality-invariant features. Meanwhile, GL-CMA facilitates the interaction between global and local features, significantly improving the representation of fine-grained features while effectively addressing spatial misalignment issues in cross-modality images. Experimental results demonstrate that the proposed method achieves state-of-the-art (SOTA) performance on the VX-ReID task, highlighting its effectiveness and potential in addressing the challenges of cross-modality package re-identification.
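The abstract describes two architectural ideas: an asymmetric branch of a weight-shared Siamese Transformer (extra LayerNorm plus modality embeddings on one modality) and a global-local cross-modality alignment attention. The sketch below is a minimal, hypothetical PyTorch illustration of those ideas only; the class and parameter names (AsymmetricSiameseBranch, GlobalLocalAlignmentAttention, modality_embed, extra_norm) are illustrative assumptions and not the authors' implementation.

```python
import torch
import torch.nn as nn


class AsymmetricSiameseBranch(nn.Module):
    """One branch of a weight-shared (Siamese) transformer encoder.

    The asymmetry is applied to only one branch (e.g., the X-ray branch):
    a learnable modality embedding is added to the patch tokens and an
    extra LayerNorm is applied before the shared encoder blocks.
    """

    def __init__(self, shared_encoder: nn.Module, embed_dim: int, asymmetric: bool):
        super().__init__()
        self.shared_encoder = shared_encoder      # transformer blocks shared by both branches
        self.asymmetric = asymmetric
        if asymmetric:
            self.extra_norm = nn.LayerNorm(embed_dim)
            self.modality_embed = nn.Parameter(torch.zeros(1, 1, embed_dim))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_patches, embed_dim)
        if self.asymmetric:
            tokens = self.extra_norm(tokens + self.modality_embed)
        return self.shared_encoder(tokens)


class GlobalLocalAlignmentAttention(nn.Module):
    """Rough stand-in for GL-CMA: a global token cross-attends to local
    patch tokens so global and fine-grained local features interact."""

    def __init__(self, embed_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, global_tok: torch.Tensor, local_toks: torch.Tensor) -> torch.Tensor:
        # global_tok: (batch, 1, dim); local_toks: (batch, num_patches, dim)
        aligned, _ = self.attn(query=global_tok, key=local_toks, value=local_toks)
        return self.norm(global_tok + aligned)    # residual connection + norm


# Toy usage: both branches reuse the same encoder weights; only one is asymmetric.
shared = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True), num_layers=2)
visible_branch = AsymmetricSiameseBranch(shared, 256, asymmetric=False)
xray_branch = AsymmetricSiameseBranch(shared, 256, asymmetric=True)
```

This is only a structural sketch under the stated assumptions; the paper's actual token layout, loss functions, and alignment mechanism are described in the full text.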
Journal description:
The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, and surveillance, as well as systems applications that incorporate these features.