Open-Vocabulary Prohibited Item Detection for Real-World X-Ray Security Inspection

Shuyang Lin; Tong Jia; Hao Wang; Bowen Ma; Mingyuan Li

IEEE Transactions on Information Forensics and Security, vol. 20, pp. 7469-7481. DOI: 10.1109/TIFS.2025.3586492. Published 2025-07-23.
https://ieeexplore.ieee.org/document/11095302/
Citations: 0
Abstract
Computer-aided prohibited item detection is applied in X-ray security inspection to maintain public safety. However, existing prohibited item detectors are limited to the small set of categories covered by current X-ray datasets, posing potential risks to public security. Since constructing larger datasets and annotating hundreds of categories is time-consuming and labor-intensive, scaling detectors to more categories with minimal supervision is of great importance. To this end, in this paper we adopt an open-vocabulary object detection (OVOD) approach to detect arbitrary unlabeled novel categories of prohibited items. OVOD methods typically rely on datasets with caption annotations, which are lacking in the domain of prohibited item detection. To support research on OVOD in X-ray security inspection scenarios, we contribute the PIXray Caption dataset, the first X-ray dataset with image-caption pair annotations, which can benchmark and facilitate research in the community. Further, we propose a novel Open-Vocabulary Prohibited Item Detection (OVPID) network that leverages the textual information in captions. OVPID contains two core modules, i.e., the Interference Resistant Module (IRM) and the Prediction Module (PM). Specifically, IRM includes two submodules, namely Edge Perception (EP) and Foreground Activation (FA), which are designed to address the interference caused by overlapping items and complex backgrounds in X-ray images. PM consists of two branches, for classification and localization. In the classification branch, PM generates more accurate prompts for the X-ray dataset via a large multimodal model (LMM). In the localization branch, PM aligns the student embeddings with both the teacher and caption embeddings. Extensive experiments on the PIXray Caption dataset demonstrate that OVPID outperforms other OVOD methods, delivering higher accuracy on novel categories.
About the journal:
The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, and surveillance, as well as systems applications that incorporate these features.