CNN-Based Analysis of Crowd Structure using Automatically Annotated Training Data
Authors: M. S. Zitouni, A. Sluzek, H. Bhaskar
Venue: 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)
Published: 2019-09-01
DOI: 10.1109/AVSS.2019.8909846
A CNN-based framework is presented for extracting and classifying individuals, small groups and large groups from static images of crowds acquired by surveillance systems. A novel approach to network training is investigated: instead of manually outlined ground-truth data, we use automatic annotations produced by alternative baseline algorithms, which consider both motion and appearance. The proposed CNN detectors are initially trained on rather limited amounts of data. The detectors are subsequently updated (fine-tuned) using new batches of automatically annotated samples, which the baseline algorithms periodically acquire from incoming surveillance data. Fine-tuning is performed when noticeable differences appear between the results of the CNN detectors and those of the baseline algorithms, which may indicate changes in visual conditions or scenarios, or updates to the baseline algorithms. Preliminary results show that CNN-based detectors can achieve satisfactory performance even if the baseline algorithms have limited accuracy. In fact, fine-tuned CNN detectors were observed to outperform the baseline algorithms used for automatic annotation of the training data, even though the baseline algorithms process both static images and video sequences. Since only static images are used once the detectors are fully trained, the presented solution can reduce the complexity of systems that automatically evaluate the structure and behavior of crowds.
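The abstract does not say how a "noticeable difference" between the CNN detectors and the baseline algorithms is measured. The sketch below is one possible reading, not the authors' method: it assumes detections are labelled bounding boxes, scores disagreement per frame by greedy IoU matching, and triggers fine-tuning when the mean disagreement over a batch exceeds a threshold. The names `Detection`, `disagreement`, and `should_fine_tune`, as well as the values `iou_min=0.5` and `threshold=0.3`, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


@dataclass(frozen=True)
class Detection:
    box: Box
    label: str  # e.g. "individual", "small_group", "large_group"


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def disagreement(cnn: List[Detection], baseline: List[Detection],
                 iou_min: float = 0.5) -> float:
    """Assumed metric: 1 - matched / max(#cnn, #baseline) for one frame.

    A baseline annotation is matched (greedily, one-to-one) to the first
    unused CNN detection with the same label and IoU >= iou_min.
    """
    if not cnn and not baseline:
        return 0.0
    unused = list(cnn)
    matched = 0
    for ref in baseline:
        hit = next((d for d in unused
                    if d.label == ref.label and iou(d.box, ref.box) >= iou_min),
                   None)
        if hit is not None:
            unused.remove(hit)
            matched += 1
    return 1.0 - matched / max(len(cnn), len(baseline))


def should_fine_tune(frame_scores: List[float], threshold: float = 0.3) -> bool:
    """Trigger fine-tuning when mean per-frame disagreement exceeds threshold."""
    if not frame_scores:
        return False
    return sum(frame_scores) / len(frame_scores) > threshold


# Hypothetical frame: the CNN finds one of the two baseline annotations.
cnn_out = [Detection((0, 0, 10, 10), "individual")]
baseline_out = [Detection((0, 0, 10, 10), "individual"),
                Detection((20, 20, 40, 40), "small_group")]
score = disagreement(cnn_out, baseline_out)
```

In a deployment along these lines, the frames that trigger the update would also supply the automatically annotated batch used for fine-tuning; the threshold would need to be tuned so that ordinary baseline noise does not cause constant retraining.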