VikingDet: A Real-time Person and Face Detector for Surveillance Cameras
Authors: Zhongxia Xiong, Ziying Yao, Yalong Ma, Xinkai Wu
Venue: 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)
Published: 2019-09-01
DOI: 10.1109/AVSS.2019.8909901 (https://doi.org/10.1109/AVSS.2019.8909901)
Citations: 5
Abstract
In this paper, we propose a novel one-stage detector that simultaneously detects both pedestrians and their faces. The framework is named VikingDet for its simple but effective two-headed architecture. To tackle the challenges of person and face detection, especially under surveillance cameras (e.g., low data quality, complex environments, and efficiency requirements), we make contributions in three aspects: 1) integrating both person and face detection into one network, which current leading object detection algorithms are seldom able to handle; 2) emphasizing detection in low-quality images: we introduce multiple thresholds for matching positive samples of different sizes and set proper hyper-parameters, so that VikingDet is able to locate small objects even in low-quality surveillance footage; 3) introducing a training strategy to utilize the datasets on hand: since most available public datasets annotate only people without their faces, or faces without bodies, we use multi-step training and an integrated loss function to train VikingDet with these partly annotated data. As a consequence, our detector achieves satisfactory performance on several relevant benchmarks at a speed of more than 60 FPS on an NVIDIA TITAN X GPU, and can be further deployed on an embedded device such as the NVIDIA Jetson TX1 or TX2 with a real-time speed of over 28 FPS.
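The abstract does not give the actual matching thresholds, size cutoffs, or the form of the integrated loss, so the following is only a minimal Python sketch of the two ideas under stated assumptions. The function names, the 32-pixel cutoff, the 0.35 and 0.5 IoU thresholds, and the `face_weight` parameter are all hypothetical and are not taken from the paper. The first part applies a looser IoU threshold when matching anchors to small ground-truth boxes; the second part sums only the head losses for which the source dataset actually provides labels.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_threshold(gt: Box, small_side: float = 32.0,
                    small_thr: float = 0.35, default_thr: float = 0.5) -> float:
    """Hypothetical rule: small ground-truth boxes get a looser IoU threshold."""
    shorter_side = min(gt[2] - gt[0], gt[3] - gt[1])
    return small_thr if shorter_side < small_side else default_thr


def match_anchors(anchors: List[Box], gts: List[Box]) -> List[int]:
    """For each anchor, return the index of its matched ground truth, or -1."""
    matches = []
    for anchor in anchors:
        best_idx, best_iou = -1, 0.0
        for j, gt in enumerate(gts):
            overlap = iou(anchor, gt)
            # Positive only if the overlap clears the size-dependent threshold.
            if overlap >= match_threshold(gt) and overlap > best_iou:
                best_idx, best_iou = j, overlap
        matches.append(best_idx)
    return matches


def integrated_loss(person_loss: float, face_loss: float,
                    has_person_labels: bool, has_face_labels: bool,
                    face_weight: float = 1.0) -> float:
    """Assumed form: add only the head losses whose labels exist for this sample."""
    total = 0.0
    if has_person_labels:
        total += person_loss
    if has_face_labels:
        total += face_weight * face_loss
    return total


if __name__ == "__main__":
    gts = [(0, 0, 16, 16), (100, 100, 200, 200)]       # one small, one large box
    anchors = [(0, 0, 16, 40), (100, 100, 200, 350)]   # both overlap at IoU 0.4
    print(match_anchors(anchors, gts))
    print(integrated_loss(0.8, 0.3, has_person_labels=True, has_face_labels=False))
```

In this toy setup both anchors overlap their nearest ground truth with the same IoU, but only the one next to the small box is accepted as a positive, which is the effect a size-dependent threshold is meant to have. For a sample drawn from a person-only dataset, the face head contributes nothing to the loss, so missing face labels are not treated as negatives.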