CNN-Based Analysis of Crowd Structure Using Automatically Annotated Training Data

2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). Pub Date: 2019-09-01. DOI: 10.1109/AVSS.2019.8909846

M. S. Zitouni, A. Sluzek, H. Bhaskar


Citations: 3

Abstract


CNN-Based Analysis of Crowd Structure using Automatically Annotated Training Data

A CNN-based framework is presented for extracting and classifying individuals, small groups and large groups from static images of crowds acquired from surveillance systems. A novel approach to network training has been investigated. Instead of manually outlined ground-truth data, we use automatic annotations produced by alternative baseline algorithms (which consider both motion and appearance). The proposed CNN detectors are initially trained on rather limited amounts of data. The detectors are subsequently updated (fine-tuned) using new batches of automatically annotated samples, which the baseline algorithms periodically acquire from future surveillance data. Fine-tuning is performed when noticeable differences appear between the results of the CNN detectors and those of the baseline algorithms (which may indicate changes in visual conditions or scenarios, or updates to the baseline algorithms). We preliminarily demonstrate that satisfactory performance of CNN-based detectors can be achieved even if the baseline algorithms have limited accuracy. In fact, fine-tuned CNN detectors can be superior to the baseline algorithms used for automatic annotation of training data (even though the baseline algorithms process both static images and video sequences). Since only static images are used once the detectors are fully trained, the presented solution can reduce the complexity of systems that automatically evaluate the structure and behavior of crowds.
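The abstract states that fine-tuning is triggered when CNN results differ noticeably from the baseline annotations, but it does not say how that difference is measured. The sketch below is one possible reading, not the authors' method: it assumes detections are axis-aligned boxes, measures disagreement as the fraction of baseline boxes that no CNN box overlaps at a given IoU, and uses hypothetical thresholds (`iou_thresh`, `max_disagreement`). All function names are illustrative, and class labels (individual, small group, large group) are ignored for brevity.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def disagreement(cnn_boxes: List[Box], baseline_boxes: List[Box],
                 iou_thresh: float = 0.5) -> float:
    """Fraction of baseline boxes not matched by any CNN box (assumed metric)."""
    if not baseline_boxes:
        # No baseline annotations: any CNN detection counts as full disagreement.
        return 0.0 if not cnn_boxes else 1.0
    missed = sum(
        1 for b in baseline_boxes
        if all(iou(b, c) < iou_thresh for c in cnn_boxes)
    )
    return missed / len(baseline_boxes)


def needs_fine_tuning(batch: List[Tuple[List[Box], List[Box]]],
                      max_disagreement: float = 0.3,
                      iou_thresh: float = 0.5) -> bool:
    """True when mean per-image disagreement exceeds a hypothetical threshold.

    Each batch item is (cnn_boxes, baseline_boxes) for one image.
    """
    if not batch:
        return False
    scores = [disagreement(c, b, iou_thresh) for c, b in batch]
    return sum(scores) / len(scores) > max_disagreement
```

In a deployment along these lines, `needs_fine_tuning` would be evaluated on each new batch of automatically annotated frames, and a positive result would start a fine-tuning pass on that batch.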

