{"title":"A Helmet Detection Algorithm Based on Transformers with Deformable Attention Module","authors":"Songle Chen;Hongbo Sun;Yuxin Wu;Lei Shang;Xiukai Ruan","doi":"10.23919/cje.2023.00.346","DOIUrl":null,"url":null,"abstract":"Wearing a helmet is one of the effective measures to protect workers' safety. To address the challenges of severe occlusion, multi-scale, and small target issues in helmet detection, this paper proposes a helmet detection algorithm based on deformable attention transformers. The main contributions of this paper are as follows. A compact end-to-end network architecture for safety helmet detection based on transformers is proposed. It cancels the computationally intensive transformer encoder module in the existing detection transformer (DETR) and uses the transformer decoder module directly on the output of feature extraction for query decoding, which effectively improves the efficiency of helmet detection. A novel feature extraction network named Swin transformer with deformable attention module (DSwin transformer) is proposed. By sparse cross-window attention, it enhances the contextual awareness of multi-scale features extracted by Swin transformer, and keeps high computational efficiency simultaneously. The proposed method generates the query reference points and query embeddings based on the joint prediction probabilities, and selects an appropriate number of decoding feature maps and sparse sampling points for query decoding, which further enhance the inference capability and processing speed. On the benchmark safety-helmet-wearing-dataset (SHWD), the proposed method achieves the average detection accuracy mAP@0.5 of 95.4% with 133.35G floating-point operations per second (FLOPs) and 20 frames per second (FPS), the state-of-the-art method for safety helmet detection.","PeriodicalId":50701,"journal":{"name":"Chinese Journal of Electronics","volume":"34 1","pages":"229-241"},"PeriodicalIF":1.6000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10891976","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Chinese Journal of Electronics","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10891976/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
Wearing a helmet is one of the most effective measures for protecting workers' safety. To address the challenges of severe occlusion, multi-scale targets, and small targets in helmet detection, this paper proposes a helmet detection algorithm based on transformers with deformable attention. The main contributions are as follows. First, a compact end-to-end network architecture for safety helmet detection based on transformers is proposed. It removes the computationally intensive transformer encoder module of the existing detection transformer (DETR) and applies the transformer decoder directly to the output of the feature extraction network for query decoding, which effectively improves the efficiency of helmet detection. Second, a novel feature extraction network named Swin transformer with deformable attention module (DSwin transformer) is proposed. Through sparse cross-window attention, it enhances the contextual awareness of the multi-scale features extracted by the Swin transformer while maintaining high computational efficiency. Third, the proposed method generates the query reference points and query embeddings from the joint prediction probabilities, and selects an appropriate number of decoding feature maps and sparse sampling points for query decoding, which further improves inference capability and processing speed. On the benchmark Safety Helmet Wearing Dataset (SHWD), the proposed method achieves a mean average precision (mAP@0.5) of 95.4% with 133.35G floating-point operations (FLOPs) at 20 frames per second (FPS), achieving state-of-the-art performance for safety helmet detection.
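To make the encoder-free, sparse query decoding idea concrete, below is a minimal, illustrative PyTorch sketch of one decoder layer in which each object query samples only a few deformable points around its reference point on a backbone feature map, rather than attending to every location. This is not the authors' released code: the class name SparseDeformableDecoderLayer, the parameter num_points, the 0.1 offset scale, and the single-scale input are all simplifying assumptions made for illustration.

```python
# Illustrative sketch (not the paper's implementation): sparse deformable
# sampling for encoder-free query decoding, assuming PyTorch is available.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseDeformableDecoderLayer(nn.Module):
    """One decoder layer: each query samples a few points around its reference
    point on a feature map instead of attending to every location."""

    def __init__(self, d_model=256, num_points=4):
        super().__init__()
        self.num_points = num_points
        self.offset_head = nn.Linear(d_model, num_points * 2)  # per-query (dx, dy) offsets
        self.weight_head = nn.Linear(d_model, num_points)      # per-point attention weights
        self.value_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, queries, reference_points, feat):
        # queries:          (B, Q, C) query embeddings
        # reference_points: (B, Q, 2) normalized (x, y) in [0, 1]
        # feat:             (B, C, H, W) one feature map from the backbone
        B, Q, _ = queries.shape
        value = self.value_proj(feat.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)  # (B, C, H, W)

        offsets = self.offset_head(queries).view(B, Q, self.num_points, 2)
        weights = self.weight_head(queries).softmax(dim=-1)                    # (B, Q, P)

        # Map sampling locations from [0, 1] to grid_sample's [-1, 1] range.
        locs = (reference_points.unsqueeze(2) + 0.1 * offsets.tanh()) * 2 - 1  # (B, Q, P, 2)
        sampled = F.grid_sample(value, locs, align_corners=False)              # (B, C, Q, P)
        sampled = sampled.permute(0, 2, 3, 1)                                  # (B, Q, P, C)

        # Weighted sum over the sparse sampling points, then a residual update.
        out = (sampled * weights.unsqueeze(-1)).sum(dim=2)                     # (B, Q, C)
        return queries + self.out_proj(out)


if __name__ == "__main__":
    layer = SparseDeformableDecoderLayer()
    q = torch.randn(2, 100, 256)        # 100 object queries per image
    ref = torch.rand(2, 100, 2)         # reference points, e.g. from top-scoring locations
    feat = torch.randn(2, 256, 32, 32)  # one backbone feature map
    print(layer(q, ref, feat).shape)    # torch.Size([2, 100, 256])
```

Because each query attends to only a handful of sampled points, the decoding cost grows with the number of queries rather than with the full feature-map size, which is what makes dropping the DETR encoder feasible in this kind of design.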
About the Journal
CJE focuses on emerging fields of electronics, publishing innovative and transformative research papers. Most papers published in CJE come from universities and research institutes, presenting their latest research results. Both theoretical and practical contributions are encouraged, and original research papers reporting novel solutions to hot topics in electronics are strongly recommended.