Detecting Pedestrian With Incomplete Head Feature in Crowded Situation Based on Transformer

IF 3.2 | Zone 2 (Engineering & Technology) | JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC
Zefei Chen;Yongjie Lin;Jianmin Xu;Kai Lu;Yanfang Shou
IEEE Signal Processing Letters, vol. 32, pp. 576-580. Published 2025-01-02. Journal Article.
DOI: 10.1109/LSP.2024.3525397
URL: https://ieeexplore.ieee.org/document/10820533/
Citations: 0

Abstract

Detecting Pedestrian With Incomplete Head Feature in Crowded Situation Based on Transformer
Pedestrian detection in crowded scenes is a challenging task. This study presents a straightforward and effective method, Det RCNN, that detects pedestrians in crowded scenes while also pairing the body and head of each pedestrian. On the one hand, pedestrians' heads have a stable shape and distinctive features. On the other hand, heads usually sit higher in the image, so even in a crowd they are rarely completely occluded. This study therefore equips the DETR model with a Head Decoder (HDecoder) parallel to the Decoder. The HDecoder takes the head knowledge generated in the Decoder phase as head queries and, at the same time, uses a key-query mechanism to search the entire image for the body bounding boxes corresponding to those head queries. Finally, the method performs a straightforward IoU (Intersection over Union) matching between the body bounding boxes produced in the Decoder and HDecoder phases. Because the HDecoder resembles the second stage of the Faster RCNN model, the paper names the method Det RCNN (DETR RCNN). On the CrowdHuman dataset, the proposed model raises AP$_{m}$ from 53.02 to 53.87 compared with Deformable DETR, and lowers mMR$^{-2}$ from 52.46 to 42.32 compared with the existing BFJ.
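The IoU matching step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the matching rule, so the greedy one-to-one strategy, the 0.5 threshold, the `(x1, y1, x2, y2)` box format, and the names `iou` and `match_bodies` below are all assumptions made for illustration, not the authors' implementation.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_bodies(decoder_bodies, hdecoder_bodies, threshold=0.5):
    """Greedily pair body boxes from the two decoders by descending IoU.

    Assumption: hdecoder_bodies[j] is the body box predicted for head
    query j, so a match (i, j) links Decoder body i to head j.
    The threshold value is hypothetical.
    """
    pairs = sorted(
        ((iou(d, h), i, j)
         for i, d in enumerate(decoder_bodies)
         for j, h in enumerate(hdecoder_bodies)),
        reverse=True,
    )
    used_d, used_h, matches = set(), set(), []
    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs overlap even less
        if i in used_d or j in used_h:
            continue  # keep the matching one-to-one
        used_d.add(i)
        used_h.add(j)
        matches.append((i, j, score))
    return matches


# Toy example: two pedestrians, listed in a different order by each decoder.
decoder_bodies = [(0, 0, 10, 20), (30, 0, 40, 20)]
hdecoder_bodies = [(31, 0, 41, 20), (0, 1, 10, 21)]
matches = match_bodies(decoder_bodies, hdecoder_bodies)
```

Each returned tuple `(i, j, score)` pairs Decoder body `i` with the head behind HDecoder box `j`. A globally optimal assignment (for example, Hungarian matching) would be an alternative design; the greedy version is used here only because it is the simplest reading of "straightforward".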
Source journal
IEEE Signal Processing Letters (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 7.40
Self-citation rate: 12.80%
Articles published: 339
Review time: 2.8 months
About the journal: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.