Research on Local Counting and Object Detection of Multiscale Crowds in Video Based on Time-Frequency Analysis

J. Sensors. Pub Date: 2022-08-12. DOI: 10.1155/2022/7247757

Guoyin Ren, Xiaoqi Lu, Yuhao Li


Citations: 2

Abstract

Objective. Real-time crowd counting from camera footage under congested conditions has become a very difficult task. Methods. This paper proposes a DRC-ConvLSTM network, which combines a depth-aware model with a depth-adaptive Gaussian kernel to extract spatial-temporal features and perform depth-level matching under crowd depth-space edge constraints in videos, yielding satisfactory crowd density estimates. The model is trained with weak supervision on a training set of point-labeled images. For detection, the paper proposes a deep adaptive perception network, DRD-NET, which uses the density map and an RGBD-adaptive perception network to better initialize the size and position of head detection boxes in the image. Results. The method was evaluated on five labeled sequence datasets (MICC, CrowdFlow, FDST, Mall, and UCSD) to verify its effectiveness, and the results show that it achieves the best performance in RGBD dense video crowd counting. Conclusion. The experimental results show that the proposed DRD-NET model combined with DRC-ConvLSTM outperforms the existing ConvLSTM model for video crowd counting, and ablation experiments further confirm the contribution of each part of the model.
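The abstract does not give the formula for the depth-adaptive Gaussian kernel, so the following is only a minimal sketch of the general idea: each point-labeled head is spread into a Gaussian whose width depends on its depth, and the crowd count is recovered by summing the density map. The rule `sigma = k / depth`, the clamping bounds, and the function names `depth_adaptive_density_map` and `estimated_count` are illustrative assumptions, not the authors' method.

```python
import math


def depth_adaptive_density_map(points, depths, height, width,
                               k=8.0, sigma_min=1.0, sigma_max=6.0):
    """Build a crowd density map from point-labeled heads.

    points: list of (row, col) head annotations, assumed to lie inside the image.
    depths: per-head depth values (hypothetical units), same length as points.
    Assumption (not from the paper): sigma = k / depth, clamped to
    [sigma_min, sigma_max], so that distant heads get narrower kernels.
    """
    density = [[0.0] * width for _ in range(height)]
    for (r0, c0), depth in zip(points, depths):
        sigma = min(sigma_max, max(sigma_min, k / depth))
        radius = int(math.ceil(3 * sigma))
        # Gaussian weights restricted to pixels that fall inside the image.
        weights = {}
        for r in range(max(0, r0 - radius), min(height, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(width, c0 + radius + 1)):
                dist2 = (r - r0) ** 2 + (c - c0) ** 2
                weights[(r, c)] = math.exp(-dist2 / (2.0 * sigma ** 2))
        total = sum(weights.values())
        # Normalize so every head contributes exactly one unit of mass.
        for (r, c), w in weights.items():
            density[r][c] += w / total
    return density


def estimated_count(density):
    """The crowd count is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in density)


heads = [(10, 12), (30, 40), (5, 60)]
head_depths = [2.0, 4.0, 8.0]
dmap = depth_adaptive_density_map(heads, head_depths, height=48, width=64)
print(round(estimated_count(dmap), 6))
```

Because each kernel is normalized over the pixels it actually covers, the map sums to the number of annotated heads even when a kernel is clipped at the image border. A network trained to regress such maps can then report a count by summing its predicted density.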

