{"title":"Image-Based Real-Time Fire Detection using Deep Learning with Data Augmentation for Vision-Based Surveillance Applications","authors":"Li-Wei Kang, I-Shan Wang, Ke-Lin Chou, Shih-Yu Chen, Chuan-Yu Chang","doi":"10.1109/AVSS.2019.8909899","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909899","url":null,"abstract":"With recent advances in embedded processing capability, vision-based real-time fire detection has been enabled in surveillance devices. This paper presents an image-based fire detection framework based on deep learning. The key is to learn a fire detector relying on tiny-YOLO (You Only Look Once) v3 deep model. With the advantage of lightweight architecture of tiny-YOLOv3 and training data augmentation by some parameter adjusting, our fire detection model can achieve better detection accuracy in real-time with lower complexity in the training stage. Experimental results have verified the effectiveness of the proposed framework.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123980191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving ResNet-based Feature Extractor for Face Recognition via Re-ranking and Approximate Nearest Neighbor","authors":"Sheng-Hsing Hsiao, J. Jang","doi":"10.1109/AVSS.2019.8909884","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909884","url":null,"abstract":"This paper proposes a framework for face recognition based on feature extractor from ResNet, together with other steps for performance improvement, including face detection, face alignment, face verification/identification, and re-ranking via Approximate Nearest Neighbor Search (ANNS). First, we evaluate two face detection algorithms, MTCNN, and FaceBoxes on three common face detection benchmarks, and then summarize the best usage scenario for each approach. Second, with certain preprocessing and postprocessing, our system selects the ResNet-based feature extractor, which achieves 99.33% verification accuracy on the LFW benchmark. Third, we use the penalty curve to determine the best configuration and obtain improved results of face verification. Based on the proposed preprocessing and post-processing, our method not only boosts accuracy from 84.3% to 86.5% in large inter-class variation datasets (CASIA - WebFace) but improves Rank-l accuracy from 86.6% to 87.7% in large intra-class variation datasets (FG-NET).","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128921772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Part-aligned Deep Features Learning for Person Re-Identification","authors":"Sheng-Luen Chung, Yuchen Xue, S. Chien, Ruei-Shan Chan","doi":"10.1109/AVSS.2019.8909867","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909867","url":null,"abstract":"Person Re-IDentification (Re-ID) is to recognize a person who has been seen before by different cameras from possibly scenes. Re-ID poses as one of the most difficult computer vision problems owing to the enormous amount of identities involved in a large-scale image pool, with much similar appearance constrained by low resolution image, in a possibly occluded scene, etc. Global features geared for general object recognition and face recognition are far less adequate to re-identify a same person across cameras. As such, more discriminating features are needed to identify people. In particular, part-based feature extraction methods that extract by learning local fine- grained features of different human body parts from detected persons have been proved effective for person Re-ID. To further improve the part-aligned spatial feature approach, this paper proposes an improved part-aligned feature (IPAF) deep learning framework to better characterize a person's complete information with the following threes highlights: part alignment, finer part segmentation, and better learning network backbone. Our proposed solution has been trained and tested on the two most comprehensive Re-ID datasets with comparable performance of reported state-of-the-art solutions: for the dataset of Market1501 (DukeMTMC-reID), our proposed solution both achieves competitive results with mAP of 85.96% (84.70%) and CMC 1 of 94.30% (89.84%), respectively.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116987049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Continuous Learning without Forgetting for Person Re-Identification","authors":"Nehemia Sugianto, D. Tjondronegoro, G. Sorwar, Prithwi Raj Chakraborty, E. Yuwono","doi":"10.1109/AVSS.2019.8909828","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909828","url":null,"abstract":"Deep learning-based person re-identification faces a scalability challenge when the target domain requires continuous learning. Service environments, such as airports, need to recognize new visitors and add new cameras over time. Training-at-once is not enough to make the model robust to new tasks and domain variations. A well-known approach is fine-tuning, which suffers forgetting problem on old tasks when learning new tasks. Joint-training can alleviate the problem but requires old datasets, which is unobtainable in some cases. Recently, Learning without forgetting (LwF) shows its ability to mitigate the problem without old datasets. This paper extends the benefit of LwF from image classification to person re-identification with further challenges. Comprehensive experiments are based on Market1501 and DukeMTMC4ReID to evaluate and benchmark LwF to other approaches. The results confirm that LwF outperforms fine-tuning in preserving old knowledge and joint-training in faster training.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114871550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Social MIL: Interaction-Aware for Crowd Anomaly Detection","authors":"Shuheng Lin, Hua Yang, Xianchao Tang, Tianqi Shi, Lin Chen","doi":"10.1109/AVSS.2019.8909882","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909882","url":null,"abstract":"Crowd anomaly detection under surveillance scene is a quite challenging task, which often companies with not rare objects, unexpected bursts in activity and complex dynamic patterns. In this paper, we propose a social multiple-instance learning(MIL) framework with a dual-branch network by considering dynamic interaction among groups, individuals and environment to obtain attentive spatial-temporal feature representation. First, MIL is employed to overcome the challenge of rare training abnormal samples and video-based labels. The social force map is utilized for modeling behavior interaction to supply the prior knowledge. In addition, we introduce the self-attention module, which represents a more discriminative spatial-temporal feature based on C3D network through implementing weight redistribution inside the feature. The results of the experiments conducted on UCF-Crime dataset show that the proposed dual-branch social multiple-instance learning (MIL) anomaly detection framework with the dual-branch network outperforms than existing approaches and obtains the state-of-the-art performance.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121514117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pose Guided Dynamic Image Network for Human Action Recognition in Person Centric Videos","authors":"S. Chaudhary, Akshay Dudhane, Prashant W. Patil, S. Murala","doi":"10.1109/AVSS.2019.8909835","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909835","url":null,"abstract":"The most emerging concerns in computer vision are size of data to process and privacy preserving of the end user. Camera sensors are all around us these days, recording and analysing our day-to-day activities. In this scenario the privacy perseverance becomes a question of concern especially in case of devices working on the basis of human action recognition (HAR). Another important concern in computer vision is the size of data. The surveillance requires continues transfer of huge amount of data through the network. The processing time required to transfer the video to central server and analyses the video directly depends on the resolution of the video. The research in computer vision is exploring the possibility of working on different aspects of videos such as using only pose information or representing whole video using a single frame for the purpose of HAR. Here, an attempt is made to explore the concept of pose estimation and video representation using dynamic image to solve the dual purpose of privacy preserving and decreasing the load on network for transfer of videos over the network for analysis. In this paper, a new Pose Guided Dynamic Image (PDI) network is proposed for HAR which is capable of providing a summarized single frame for the person's activity in any given video. Unlike dynamic image network, this approach considers only the person's motion and discards the background motion. Therefore, PDI provides more specific information required for HAR as compared to the dynamic image. Also, by summarizing the video, the identity of the person remains masked. The proposed method is able to provide better result on both of the benchmark datasets used namely JHMDB and UCF-sports for the experimentation.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122199926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain is of the Essence: Data Deployment for City-Scale Multi-Camera Vehicle Re-Identification","authors":"Mark Schutera, Frank M. Hafner, Hendrik Vogt, Jochen Abhau, M. Reischl","doi":"10.1109/AVSS.2019.8909858","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909858","url":null,"abstract":"In deep learning applications large annotated datasets are considered necessary for application development and improved model performance. This work aims to investigate the validity of this assumption when enlarging a given dataset, by secondary data, with a certain domain discrepancy. The paradigm for this evaluation is a vehicle reidentification system for city-scale multi-camera settings. In city-scale multi-camera settings, the field of view of the sensors are fixed, introducing a major domain discrepancy between different datasets. This work shows that the domain of training samples heavily influences the learned feature space embedding and thus leads to a domain-specific performance. We explore how different objective functions and transfer learning approaches cope with a domain discrepancy in the training data. Concluding, the general assumption “Data is of the essence” has to be refined. With respect to feature space embeddings, our findings propose, beyond data “Domain is of the essence”.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131387192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph-to-Graph Energy Minimization for Video Object Segmentation","authors":"Yuezun Li, Longyin Wen, Ming-Ching Chang, Siwei Lyu","doi":"10.1109/AVSS.2019.8909894","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909894","url":null,"abstract":"We describe a new unsupervised video object segmentation (VOS) method based on the graph-to-graph energy minimization, which focuses on exploiting the mutual bootstrapping information between bottom-up (i.e., using pixel/superpixel attributes) and top-down (i.e., using learned appearance and motion cues) processes in a uni-fiedframework. Specifically, we construct a graph-to-graph energy function to encode the spatial similarities among superpixels (superpixel-graph) and temporal consistency among regions (region-graph). An efficient heuristic iterative algorithm is used to minimize the energy function to get the optimal assignment of superpixel and region labels to complete the VOS task. Experiments on two challenging benchmarks (i.e., SegTrack v2 and DAVIS) show that the proposed method achieves favorable performance against the state-of-the-art unsupervised VOS methods and comparable performance with the state-of-the-art semi-supervised methods.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133937677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Drone Detection in Long-Range Surveillance Videos","authors":"Mrunalini Nalamati, Ankit Kapoor, M. Saqib, N. Sharma, M. Blumenstein","doi":"10.1109/AVSS.2019.8909830","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909830","url":null,"abstract":"The usage of small drones/UAVs has significantly increased recently. Consequently, there is a rising potential of small drones being misused for illegal activities such as terrorism, smuggling of drugs, etc. posing high-security risks. Hence, tracking and surveillance of drones are essential to prevent security breaches. The similarity in the appearance of small drone and birds in complex background makes it challenging to detect drones in surveillance videos. This paper addresses the challenge of detecting small drones in surveillance videos using popular and advanced deep learning-based object detection methods. Different CNN-based architectures such as ResNet-101 and Inception with Faster-RCNN, as well as Single Shot Detector (SSD) model was used for experiments. Due to sparse data available for experiments, pre-trained models were used while training the CNNs using transfer learning. Best results were obtained from experiments using Faster-RCNN with the base architecture of ResNet-101. Experimental analysis on different CNN architectures is presented in the paper, along with the visual analysis of the test dataset.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132907346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Person Re-Identification by Combining Siamese Convolutional Neural Network and Re-Ranking Process","authors":"Nabila Mansouri, Sourour Ammar, Yousri Kessentini","doi":"10.1109/AVSS.2019.8909902","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909902","url":null,"abstract":"Person re-identification (re-ID) is an active task with several challenges such as variations of poses, view points, lighting and occlusion. When considering person re-ID as an image retrieval process, measuring the appearance similarity of a pairwise person images is the essential phase. Re-ranking process can improve its accuracy especially when it is based on an other similarity metric. In this paper, we propose a pipeline composed of two methods: A Siamese Convolutional Neural Network (S-CNN) and a k-reciprocal nearest neighbors (k-RNN) re-ranking algorithm. While most existing re-ranking methods ignore the importance of original distance in re-ranking, we jointly combine the S-CNN similarity measure and Jaccard distance to revise the initial ranked list. An experimental study is conducted on two benchmark person re-ID datasets (Market-1501 and Duke-MTMC-reID). The obtained results confirm the effectiveness of our method. A mAP improvement of 11.6% and 15.68% is obtained respectively for the two testing datasets.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133091741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}