Yonggan Wu;Hongrui Yuan;Zichao Yuan;Lingchao Meng;Yueyi Bai;Hongqiang Wang
{"title":"具有全局-局部对准注意的非对称Siamese变压器用于可见x射线交叉模态封装再识别","authors":"Yonggan Wu;Hongrui Yuan;Zichao Yuan;Lingchao Meng;Yueyi Bai;Hongqiang Wang","doi":"10.1109/TIFS.2025.3592540","DOIUrl":null,"url":null,"abstract":"Visible-X-ray Cross-Modality Package Re-Identification (VX-ReID) is a critical task in security inspection, aiming to match visible-light images with X-ray images. The significant modality gap between these image types poses substantial challenges in extracting robust and fine-grained modality-invariant features. To effectively address these challenges, this paper introduces a novel cross-modality feature extraction framework, the Asymmetric Siamese Transformer with Global-Local Alignment Attention (AST-GLAA). The network comprises two key components: Cross-modality Asymmetric Siamese Transformer Structure (CAST-S) and Global-Local Cross-Modality Alignment Attention (GL-CMA). CAST-S leverages an asymmetric design in one branch of the Siamese Transformer network by introducing a LayerNorm layer and incorporating modality embeddings to enhance the robustness of modality-invariant features. Meanwhile, GL-CMA facilitates the interaction between global and local features, significantly improving the representation of fine-grained features while effectively addressing spatial misalignment issues in cross-modality images. Experimental results demonstrate that the proposed method achieves state-of-the-art (SOTA) performance on the VX-ReID task, highlighting its effectiveness and potential in addressing the challenges of cross-modality package re-identification.","PeriodicalId":13492,"journal":{"name":"IEEE Transactions on Information Forensics and Security","volume":"20 ","pages":"7881-7894"},"PeriodicalIF":8.0000,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An Asymmetric Siamese Transformer With Global-Local Alignment Attention for Visible-X-Ray Cross-Modality Package Re-Identification\",\"authors\":\"Yonggan Wu;Hongrui Yuan;Zichao Yuan;Lingchao Meng;Yueyi Bai;Hongqiang Wang\",\"doi\":\"10.1109/TIFS.2025.3592540\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Visible-X-ray Cross-Modality Package Re-Identification (VX-ReID) is a critical task in security inspection, aiming to match visible-light images with X-ray images. The significant modality gap between these image types poses substantial challenges in extracting robust and fine-grained modality-invariant features. To effectively address these challenges, this paper introduces a novel cross-modality feature extraction framework, the Asymmetric Siamese Transformer with Global-Local Alignment Attention (AST-GLAA). The network comprises two key components: Cross-modality Asymmetric Siamese Transformer Structure (CAST-S) and Global-Local Cross-Modality Alignment Attention (GL-CMA). CAST-S leverages an asymmetric design in one branch of the Siamese Transformer network by introducing a LayerNorm layer and incorporating modality embeddings to enhance the robustness of modality-invariant features. Meanwhile, GL-CMA facilitates the interaction between global and local features, significantly improving the representation of fine-grained features while effectively addressing spatial misalignment issues in cross-modality images. 
Experimental results demonstrate that the proposed method achieves state-of-the-art (SOTA) performance on the VX-ReID task, highlighting its effectiveness and potential in addressing the challenges of cross-modality package re-identification.\",\"PeriodicalId\":13492,\"journal\":{\"name\":\"IEEE Transactions on Information Forensics and Security\",\"volume\":\"20 \",\"pages\":\"7881-7894\"},\"PeriodicalIF\":8.0000,\"publicationDate\":\"2025-07-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Information Forensics and Security\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11095748/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Information Forensics and Security","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11095748/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
An Asymmetric Siamese Transformer With Global-Local Alignment Attention for Visible-X-Ray Cross-Modality Package Re-Identification
Visible-X-ray Cross-Modality Package Re-Identification (VX-ReID) is a critical task in security inspection, aiming to match visible-light images with X-ray images. The significant modality gap between these image types poses substantial challenges in extracting robust and fine-grained modality-invariant features. To effectively address these challenges, this paper introduces a novel cross-modality feature extraction framework, the Asymmetric Siamese Transformer with Global-Local Alignment Attention (AST-GLAA). The network comprises two key components: Cross-modality Asymmetric Siamese Transformer Structure (CAST-S) and Global-Local Cross-Modality Alignment Attention (GL-CMA). CAST-S leverages an asymmetric design in one branch of the Siamese Transformer network by introducing a LayerNorm layer and incorporating modality embeddings to enhance the robustness of modality-invariant features. Meanwhile, GL-CMA facilitates the interaction between global and local features, significantly improving the representation of fine-grained features while effectively addressing spatial misalignment issues in cross-modality images. Experimental results demonstrate that the proposed method achieves state-of-the-art (SOTA) performance on the VX-ReID task, highlighting its effectiveness and potential in addressing the challenges of cross-modality package re-identification.
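The abstract describes two architectural ideas: an asymmetric branch of a weight-shared Siamese Transformer (extra LayerNorm plus modality embeddings on one modality) and a global-local cross-modality alignment attention. The sketch below is a minimal, hypothetical PyTorch illustration of those ideas only; the class and parameter names (AsymmetricSiameseBranch, GlobalLocalAlignmentAttention, modality_embed, extra_norm) are illustrative assumptions and not the authors' implementation.

```python
import torch
import torch.nn as nn


class AsymmetricSiameseBranch(nn.Module):
    """One branch of a weight-shared (Siamese) transformer encoder.

    The asymmetry is applied to only one branch (e.g., the X-ray branch):
    a learnable modality embedding is added to the patch tokens and an
    extra LayerNorm is applied before the shared encoder blocks.
    """

    def __init__(self, shared_encoder: nn.Module, embed_dim: int, asymmetric: bool):
        super().__init__()
        self.shared_encoder = shared_encoder      # transformer blocks shared by both branches
        self.asymmetric = asymmetric
        if asymmetric:
            self.extra_norm = nn.LayerNorm(embed_dim)
            self.modality_embed = nn.Parameter(torch.zeros(1, 1, embed_dim))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_patches, embed_dim)
        if self.asymmetric:
            tokens = self.extra_norm(tokens + self.modality_embed)
        return self.shared_encoder(tokens)


class GlobalLocalAlignmentAttention(nn.Module):
    """Rough stand-in for GL-CMA: a global token cross-attends to local
    patch tokens so global and fine-grained local features interact."""

    def __init__(self, embed_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, global_tok: torch.Tensor, local_toks: torch.Tensor) -> torch.Tensor:
        # global_tok: (batch, 1, dim); local_toks: (batch, num_patches, dim)
        aligned, _ = self.attn(query=global_tok, key=local_toks, value=local_toks)
        return self.norm(global_tok + aligned)    # residual connection + norm


# Toy usage: both branches reuse the same encoder weights; only one is asymmetric.
shared = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True), num_layers=2)
visible_branch = AsymmetricSiameseBranch(shared, 256, asymmetric=False)
xray_branch = AsymmetricSiameseBranch(shared, 256, asymmetric=True)
```

This is only a structural sketch under the stated assumptions; the paper's actual token layout, loss functions, and alignment mechanism are described in the full text.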
Journal description:
The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, and surveillance, as well as systems applications that incorporate these features.