{"title":"Simple background subtraction constraint for weakly supervised background subtraction network","authors":"T. Minematsu, Atsushi Shimada, R. Taniguchi","doi":"10.1109/AVSS.2019.8909896","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909896","url":null,"abstract":"Recently, background subtraction based on deep convolutional neural networks has demonstrated excellent performance in change detection tasks. However, most of the reported approaches require pixel-level label images for training the networks. To reduce the cost of rendering pixel-level annotation data, weakly supervised learning approaches using frame-level labels have been proposed. These labels indicate if a target class is present. Frame-level supervised learning is challenging because we cannot use location information for training the networks. Therefore, some constraints are introduced for guiding foreground locations. Previous works exploit prior information on foreground sizes and shapes. In this work, we propose two constraints for weakly supervised background subtraction networks. Our constraints use binary mask images generated by simple background subtraction. Unlike previous works, our approach does not require prior information on foreground sizes and shapes. Moreover, our constraints are more suitable for change detection tasks. We also present an experiment verifying that our constraints can improve foreground detection accuracy compared to other methods, which do not include them.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115484073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification of Partially Occluded Pharmaceutical Blister Packages","authors":"Sheng-Luen Chung, Chih-Fang Chen, G. Hsu, Shen-Te Wu","doi":"10.1109/AVSS.2019.8909890","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909890","url":null,"abstract":"Medical dispensing refers to the in-office preparation and delivery of prescription drugs, which is mostly dispensed by the units of blister packages. The objective of the study is to design an image-based blister package identification solution, which is capable of identifying a fetched drug based on a pair of the two opposite camera images of the hand-held drug. To this aim, this paper proposes a deep learning based Hand-held Blister Identification network (HBIN) to identify partially occluded blister packages present in arbitrary positions and orientation with possibly cluttered backgrounds. The proposed HBIN is a two-stage network that contains Blister cropping network (BCN) followed by RTT identification network (RIN). The BCN subnetwork, an image to image translation deep learning network, is to crop both side contours of the hand-held drug, before the pair of cropped contours can be juxtaposed as a fixed sized and fixed orientation RTT (rectified two-sides template) for final identification in the RIN sub-network. A blister package dataset containing a total of 30,394 images based on 230 types, typically found in hospital dispensing stations, have been collected and labeled. With extensive test, the accuracy of the primitive primitive HBIN attains an F-score of more than 94.33% for testing data from similar backgrounds and an F-score of 79.80% for dissimilar backgrounds. Although still a prototype, the preliminary results show the feasibility of identifying blister packages during retrieval process without resorting to bar codes nor RFID tags.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129301704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Future Frame Prediction Using Convolutional VRNN for Anomaly Detection","authors":"Yiwei Lu, Mahesh Kumar Krishna Reddy, Seyed shahabeddin Nabavi, Yang Wang","doi":"10.1109/AVSS.2019.8909850","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909850","url":null,"abstract":"Anomaly detection in videos aims at reporting anything that does not conform the normal behaviour or distribution. However, due to the sparsity of abnormal video clips in real life, collecting annotated data for supervised learning is exceptionally cumbersome. Inspired by the practicability of generative models for semi-supervised learning, we propose a novel sequential generative model based on variational autoencoder (VAE) for future frame prediction with convolutional LSTM (ConvLSTM). To the best of our knowledge, this is the first work that considers temporal information in future frame prediction based anomaly detection framework from the model perspective. Our experiments demonstrate that our approach is superior to the state-of-the-art methods on three benchmark datasets.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129129547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VikingDet: A Real-time Person and Face Detector for Surveillance Cameras","authors":"Zhongxia Xiong, Ziying Yao, Yalong Ma, Xinkai Wu","doi":"10.1109/AVSS.2019.8909901","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909901","url":null,"abstract":"In this paper, we propose a novel one-stage detector that can simultaneously detect both pedestrians and their faces. The framework is named as VikingDet for its simple but effective two-headed architecture. To tackle the challenges of person and face detection especially under surveillance cameras (e.g. low data quality, complex environments, requirements for efficiency, etc.), we make contributions in the following several aspects: 1) integrating both person and face detection into one network which current leading object detection algorithms are seldomly able to handle; 2) emphasizing detection in low-quality images. we introduce multiple thresholds for matching different sized positive samples, and set proper hyper-parameters, hence our VikingDet is able to locate small objects in surveillance cameras even of low-quality; 3) introducing a training strategy to utilize datasets on hand. Since most available public datasets annotate only people without their faces or faces without bodies, we use multi-step training and an integrated loss function to train VikingDet with these partly annotated data. As a consequence, our detector achieves satisfactory performances in several relative benchmarks with a speed at more than 60 FPS on NVIDIA TITAN X GPU, and can be further deployed on an embedded device such as NVIDIA Jetson TX1 or TX2 with a real-time speed of over 28 FPS.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121393408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Violence Detection Using 3D Convolutional Neural Networks","authors":"Ji Li, Xinghao Jiang, Tanfeng Sun, Ke Xu","doi":"10.1109/AVSS.2019.8909883","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909883","url":null,"abstract":"Automatically analyzing violent content in surveillance videos is of profound significance on many applications, ranging from Internet video filtration to public security protection. In this paper, we propose a deep learning model based on 3D convolutional neural networks, without using hand-crafted features or RNN architectures exclusively for encoding temporal information. The improved internal designs adopt compact but effective bottleneck units for learning motion patterns and leverage the DenseNet architecture to promote feature reusing and channel interaction, which is proved to be more capable of capturing spatiotemporal features and requires relatively fewer parameters. The performance of the proposed model is validated on three standard datasets in terms of recognition accuracy compared to other advanced approaches. Meanwhile, supplementary experiments are carried out to evaluate its effectiveness and efficiency. The final results demonstrate the advantages of the proposed model over the state-of-the-art methods in both recognition accuracy and computational efficiency.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121477420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Does Deep Super-Resolution Enhance UAV Detection?","authors":"Vasileios Magoulianitis, Dimitrios Ataloglou, A. Dimou, D. Zarpalas, P. Daras","doi":"10.1109/AVSS.2019.8909865","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909865","url":null,"abstract":"The popularity of Unmanned Aerial Vehicles (UAVs) is increasing year by year and reportedly their applications hold great shares in global technology market. Yet, since UAVs can be also used for illegal actions, this raises various security issues that needs to be encountered. Towards this end, UAV detection systems have emerged to detect and further anticipate inimical drones. A very significant factor is the maximum detection range in which the system's senses can “see” an upcoming UAV. For those systems that employ optical cameras for detecting UAVs, the main issue is the accurate drone detection when it fades away into sky. This work proposes the incorporation of Super-Resolution (SR) techniques in the detection pipeline, to increase its recall capabilities. A deep SR model is utilized prior to the UAV detector to enlarge the image by a factor of 2. Both models are trained in an end-to-end manner to fully exploit the joint optimization effects. Extensive experiments demonstrate the validity of the proposed method, where potential gains in the detector's recall performance can reach up to 32.4%.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134135474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Tracking-by-Detection of Bus Passengers with Siamese CNNs","authors":"Claire Labit-Bonis, Jérôme Thomas, F. Lerasle, Francisco Madrigal","doi":"10.1109/AVSS.2019.8909843","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909843","url":null,"abstract":"We target the problem of providing 5G network connectivity in rural zones by means of Base Stations (BSs) carried by Unmanned Aerial Vehicles (UAVs). Our goal is to schedule the UAVs missions to: i) limit the amount of energy consumed by each UAV, ii) ensure the coverage of selected zones over the territory, ii) decide where and when each UAV has to be recharged in a ground site, iii) deal with the amount of energy provided by Solar Panels (SPs) and batteries installed in each ground site. We then formulate the RURALPLAN optimization problem, a variant of the unsplittable multicommodity flow problem defined on a multiperiod graph. After detailing the objective function and the constraints, we solve RURALPLAN in a realistic scenario. Results show that RURALPLAN is able to outperform a solution ensuring coverage but not considering the energy management of the UAVs.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128974701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gaussian Normalization: Handling Burstiness in Visual Data","authors":"Rémi Trichet, N. O’Connor","doi":"10.1109/AVSS.2019.8909857","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909857","url":null,"abstract":"This paper addresses histogram burstiness, defined as the tendency of histograms to feature peaks out of proportion with their general distribution. After highlighting the impact of this growing issue on computer vision problems and the need to preserve the distribution information, we introduce a new normalization based on a Gaussian fit with a pre-defined variance for each datum that suppresses burst without adversely affecting the distribution. Experimental results on four public datasets show that our normalization scheme provides a staggering performance boost compared to other normalizations, even allowing Gaussian-normalized Bag-of-Words to perform similarly to intra-normalized Fisher vectors.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128254510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Camera Recalibration Method for a Top-View Surveillance System based on Relative Camera Pose and Structural Similarity","authors":"Jun Minagawa, K. Okahara, Kento Yamazaki, Tsukasa Fukasawa","doi":"10.1109/AVSS.2019.8909870","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909870","url":null,"abstract":"In this paper, we present a camera recalibration method for a discontinuous top-view stitched image. A top-view stitched image using multiple camera images is efficient for surveillance camera monitoring because it enables monitors to understand surroundings easily by spatial continuity. However, the top-view stitched image loses the advantage even if only one camera used in the stitched image is shifted by physical contact or its own weight. A simple solution such as returning shifted cameras physically or calibrating cameras with calibration markers again takes time and personnel. To address this problem, we propose the recalibration method based on relative pose of camera and image structural similarity in the top-view image. The proposed method performs recalibration without changing camera pose physically or using calibration markers. As a result of an experiment, we found out that our method restores a discontinuous top-view image stitched closing to the original one.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"165 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121216807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Exploration on Temperature Term in Training Deep Neural Networks","authors":"Zhaofeng Si, H. Qi","doi":"10.1109/AVSS.2019.8909875","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909875","url":null,"abstract":"Model compression technique is now widely investigated to fit the high-complexity deep neural network into resource-constrained mobile devices in recent years, in which one of effective methods is knowledge distillation. In this paper we make a discussion on the temperature term introduced in knowledge distillation method. The temperature term in distill training is aimed at making it easier for the student network to learn the generalization capablityof teacher network by softening the labels from the teacher network. We analyze the situation of using the temperature term in ordinary training to soften the output of neural network instead of soften the target. In experiments, we show that by applying a proper temperature term in training process, a better performance can be gained on NABirds dataset than using the model without temperature term.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123265241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}