{"title":"A case study on the impact of masking moving objects on the camera pose regression with CNNs","authors":"Claudio Cimarelli, Dario Cazzato, M. Olivares-Méndez, H. Voos","doi":"10.1109/AVSS.2019.8909904","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909904","url":null,"abstract":"Robot self-localization is essential for operating autonomously in open environments. When cameras are the main source of information for retrieving the pose, numerous challenges are posed by the presence of dynamic objects, due to occlusion and continuous changes in the appearance. Recent research on global localization methods focused on using a single (or multiple) Convolutional Neural Network (CNN) to estimate the 6 Degrees of Freedom (6-DoF) pose directly from a monocular camera image. In contrast with the classical approaches using engineered feature detector, CNNs are usually more robust to environmental changes in light and to occlusions in outdoor scenarios. This paper contains an attempt to empirically demonstrate the ability of CNNs to ignore dynamic elements, such as pedestrians or cars, through learning. For this purpose, we pre-process a dataset for pose localization with an object segmentation network, masking potentially moving objects. Hence, we compare the pose regression CNN trained and/or tested on the set of masked images and the original one. Experimental results show that the performances of the two training approaches are similar, with a slight reduction of the error when hiding occluding objects from the views.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"491 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124429986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Component Spatiotemporal Attention and its Application to Object Detection in Surveillance Videos","authors":"Roman Palenychka, R. Abielmona, F. Rea, E. Petriu","doi":"10.1109/AVSS.2019.8909874","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909874","url":null,"abstract":"This paper describes multi-component spatiotemporal attention mechanisms in application to object detection in videos. The detection of objects of interest relies on the analysis of feature-point areas (FPAs), which correspond to the object-relevant focus-of-attention (FoA) points extracted by the proposed spatiotemporal mechanisms of attention focusing. The attention mechanisms give detection priority to object-relevant FPAs with spatial saliency, spatiotemporal coherence, and area temporal change including motion. The preliminary test results of the proposed attention focusing mechanisms for object detection and tracking have confirmed its advantage in terms of robustness over existing visual attention-based detectors with comparable run-times.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128590543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supervised People Counting Using An Overhead Fisheye Camera","authors":"Shengye Li, M. Tezcan, P. Ishwar, J. Konrad","doi":"10.1109/AVSS.2019.8909877","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909877","url":null,"abstract":"We propose two supervised methods for people counting using an overhead fisheye camera. As opposed to standard cameras, fisheye cameras offer a large field of view and, when mounted overhead, reduce occlusions. However, methods developed for standard cameras perform poorly on fisheye images since they do not account for the radial image geometry. Furthermore, no large-scale fisheye-image datasets with radially-aligned bounding box annotations are available for training. We adapt YOLOv3 trained on standard images for people counting in fisheye images. In one method, YOLOv3 is applied to 24 rotated, overlapping windows and the results are post-processed to produce a people count. In another method, YOLOv3 is applied to windows of interest extracted by background subtraction. For evaluation, we collected and annotated an indoor fisheye-image dataset that we make public. Experiments on this dataset show that our methods reduce the people counting MAE of two natural benchmarks by over 60%.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127395574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Constraining Relative Camera Pose Estimation with Pedestrian Detector-Based Correspondence Filters","authors":"Emanuel Aldea, T. Pollok, Chengchao Qu","doi":"10.1109/AVSS.2019.8909859","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909859","url":null,"abstract":"A prerequisite for using smart camera networks effectively is a precise extrinsic calibration of the camera sensors, either in a fixed coordinate system, or relatively to each other. For cameras with partly overlapping fields of view, the relative pose estimation may be directly performed on or assisted by the video content obtained during scene analysis. In typical conditions however (wide baseline, repetitive patterns, homogeneous appearance of pedestrians), the pose estimation is imprecise and very often is affected by large errors in weakly constrained areas of the field of view. In this work, we propose to rely on progressively stricter constraints on the feature association between the camera views, guided by a pedestrian detector and a re-identification algorithm respectively. The results show that the two strategies are effective in alleviating the ambiguity which is due to the similar appearance of pedestrians in such scenes, and in improving the relative pose estimation.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127363973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video-Based Person Re-Identification using Refined Attention Networks","authors":"Tanzila Rahman, Mrigank Rochan, Yang Wang","doi":"10.1109/AVSS.2019.8909869","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909869","url":null,"abstract":"We consider the problem of video-based person reidentification. The goal is to identify a person from videos captured under different cameras. In this paper, we propose an efficient attention based model for person re-identifying from videos. Our method generates an attention score for each frame based on frame-level features. The attention scores of all frames in a video are used to produce a weighted feature vector for the input video. This video-level feature vector is refined iteratively for re-identifying persons from videos. Unlike most existing deep learning methods that use global or spatial representation, our approach focuses on attention scores. Extensive experiments on three benchmark datasets demonstrate that our method achieves the state-of-the-art performance.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128978922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UHCTD: A Comprehensive Dataset for Camera Tampering Detection","authors":"Pranav Mantini, S. Shah","doi":"10.1109/AVSS.2019.8909856","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909856","url":null,"abstract":"An unauthorized or an accidental change in the view of a surveillance camera is called a tampering. Algorithms that detect tampering by analyzing the video are referred to as camera tampering detection algorithms. Most evaluations on camera tampering detection methods are presented based on individually collected datasets. One of the major challenges in the area of camera tamper detection is the absence of a public dataset with sufficient size and variations for an extensive performance evaluation. We propose a large scale synthetic dataset called University of Houston Camera Tampering Detection dataset (UHCTD) for development and testing of camera tampering detection methods. The dataset consists of a total 576 tampers with over 288 hours of video captured from two surveillance cameras. To establish an initial benchmark, we cast camera tampering detection as a classification problem. We train and evaluate three different deep architectures that have shown promise in scene classification, Alexnet, Resnet, and Densenet. Results are presented to show how the dataset can be used to train and classify images as normal, and tampered within and across cameras.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130495193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cycle-Spinning GAN for Raindrop Removal from Images","authors":"Ülkü Uzun, A. Temi̇zel","doi":"10.1109/AVSS.2019.8909824","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909824","url":null,"abstract":"Weather events such as rain, snow, and fog degrade the quality of images taken under these conditions. Enhancement of such images is critical for intelligent transport and outdoor surveillance systems. Generative Adversarial Networks (GAN) based methods have been shown to be promising for enhancing these images in recent years. In this study, we adapt the cycle-spinning technique to GAN for removal of raindrops. The experimental evaluation of the proposed method shows that the performance is improved in terms of reference-based metrics (SSIM and PSNR). In addition, the approach also results in higher object detection performance in terms of mean average precision (mAP) metric when applied before the detection process.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131490026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incremental Learning of Abnormalities in Autonomous Systems","authors":"Hassan Zaal, Hafsa Iqbal, Damian Campo, L. Marcenaro, C. Regazzoni","doi":"10.1109/AVSS.2019.8909827","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909827","url":null,"abstract":"In autonomous systems, self-awareness capabilities are useful to allow artificial agents to detect abnormal situations based on previous experiences. This paper presents a method that facilitates the incremental learning of new models by an agent. Available learned models can dynamically generate probabilistic predictions as well as evaluate their mismatch from current observations. Observed mismatches are grouped through an unsupervised learning strategy into different classes, each of them corresponding to a dynamic model in a given region of the state space. Such clusters define switching Dynamic Bayesian Networks (DBNs) employed for predicting future instances and detect anomalies. Inferences generated by several DBNs that use different sensorial data are compared quantitatively. For testing the proposed approach, it is considered the multi-sensorial data generated by a robot performing various tasks in a controlled environment and a real autonomous vehicle moving at a University Campus.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"40 1-8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116508368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hybrid Facial Expression Recognition System Based on Recurrent Neural Network","authors":"Jing-Ming Guo, Po-Cheng Huang, Li-Ying Chang","doi":"10.1109/AVSS.2019.8909888","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909888","url":null,"abstract":"Facial expression recognition (FER) is an important and challenging problem for automatic inspection of surveillance videos. In recent years, with the progress of hardware and the evolution of deep learning technology, it is possible to change the way of tackling facial expression recognition. In this paper, we propose a sequence-based facial expression recognition framework for differentiating facial expression. The proposed framework is extended to a frame-to-sequence approach by exploiting temporal information with gated recurrent units. In addition, facial landmark points and facial action unit are also used as input features to train our network which can represent facial regions and its components effectively. Based on this, we build a robust facial expression system and is evaluated using two publicly available databases. The experimental results show that despite the uncontrolled factors in the videos, the proposed deep learning-based solution is consistent in achieving promising performance compared to that of the former schemes.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116522615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Gallery for Real-Time Multi-Target Multi-Camera Tracking","authors":"Yu-Sheng Chou, Chien-Yao Wang, Ming-Chiao Chen, Shou-de Lin, H. Liao","doi":"10.1109/AVSS.2019.8909837","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909837","url":null,"abstract":"For multi-target multi-camera recognition tasks, tracking of objects of interest is one of the essential yet challenging issues due to the fact that the task requires re-identifying identical targets across distinct views. Multi-target multi-camera tracking (MTMCT) applications span a wide range of variety (e.g. crowd behavior analysis, anomaly individual tracking and sport player tracking), so how to make the system perform real-time tracking becomes a crucial research issue. In this paper, we propose an online hierarchical algorithm for extreme clustering based MTMCT framework. The system can automatically create a dynamic gallery with real-time fashion by collecting appearance information of multi-object tracking in single-camera view. We evaluate the effectiveness and efficiency of our framework, and compare the state-of-the-art methods on MOT16 as well as DukeMTMC for single and multiple camera tracking. The high-frame-rate performance and promising tracking results confirm our system can be used in realworld applications.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"258 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132774777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}