{"title":"Attribute-Based Facial Image Manipulation on Latent Space","authors":"Chien-Hung Lin, Yiyun Pan, Ja-Ling Wu","doi":"10.1109/AVSS52988.2021.9663845","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663845","url":null,"abstract":"Using machine learning to generate images has become more mature, especially the images produced using a Generative Adversarial Network. Unfortunately, the complicated architecture of those models makes it difficult for us to ensure the output images’ diversity and controllability without introducing little embarrassment in implementation. Therefore, some researchers try to edit the latent codes generated by a given learning model directly on the latent space for manipulating the output image by simply inputting the new latent codes into the original model without changing the model’s structure and learned parameters. However, the methods mentioned above faced the problems that the size of latent space cannot be too large or the trouble-some of features entanglement. In this work, we propose an approach to conquer the problems mentioned above, which is to compress the original latent space to better the applicability and usability of the methods limited by the size of the latent space. Compared with the existing methods, this method can be applied to more models and still reach the target of image manipulation.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124927304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bridging the Invisible and Visible World: Translation between RGB and IR Images through Contour Cycle GAN","authors":"Yawen Lu, G. Lu","doi":"10.1109/AVSS52988.2021.9663750","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663750","url":null,"abstract":"Infrared Radiation (IR) images that capture the emitted IR signals from surrounding environment have been widely applied to pedestrian detection and video surveillance. However, there are not many textures that appeared on thermal images as compared to RGB images, which brings enormous challenges and difficulties in various tasks. Visible images cannot capture scenes in the dark and night environment due to the lack of light. In this paper, we propose a Contour GAN-based framework to learn the cross-domain representation and also map IR images with visible images. In contrast to existing structures of image translation that focus on spectral consistency, our framework also introduces strong spatial constraints, with further spectral enhancement by illuminance contrast and consistency constraints. Designating our method for IR and RGB image translation, it can generate high-quality translated images. Extensive experiments on near IR (NIR) and far IR (thermal) datasets demonstrate superior performance for quantitative and visual results.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129859257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Drone-vs-Bird Detection Challenge at IEEE AVSS2021","authors":"A. Coluccia, A. Fascista, Arne Schumann, L. Sommer, A. Dimou, D. Zarpalas, F. C. Akyon, Ogulcan Eryuksel, Kamil Anil Ozfuttu, S. Altinuc, Fardad Dadboud, Vaibhav Patel, Varun Mehta, M. Bolic, I. Mantegh","doi":"10.1109/AVSS52988.2021.9663844","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663844","url":null,"abstract":"This paper presents the 4-th edition of the “drone-vs-bird” detection challenge, launched in conjunction with the the 17-th IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS). The objective of the challenge is to tackle the problem of detecting the presence of one or more drones in video scenes where birds may suddenly appear, taking into account some important effects such as the background and foreground motion. The proposed solutions should identify and localize drones in the scene only when they are actually present, without being confused by the presence of birds and the dynamic nature of the captured scenes. The paper illustrates the results of the challenge on the 2021 dataset, which has been further extended compared to the previous edition run in 2020.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127954465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"[AVSS 2021 Front cover]","authors":"","doi":"10.1109/avss52988.2021.9663795","DOIUrl":"https://doi.org/10.1109/avss52988.2021.9663795","url":null,"abstract":"","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129251010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Performance of Crowd-Specific Detectors in Multi-Pedestrian Tracking","authors":"Daniel Stadler, J. Beyerer","doi":"10.1109/AVSS52988.2021.9663836","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663836","url":null,"abstract":"In recent years, several methods and datasets have been proposed to push the performance of pedestrian detection in crowded scenarios. In this study, three crowd-specific detectors are combined with a general tracking-by-detection approach to evaluate their applicability in multi-pedestrian tracking. Investigating the relation between detection and tracking accuracy, we make the interesting observation that in spite of a high detection capability, the performance in tracking can be poor and analyze the reasons behind that. However, one of the examined approaches can significantly boost the tracking performance on two benchmarks under different training configurations. It is shown that combining crowd-specific detectors with a simple tracking pipeline can achieve promising results, especially in challenging scenes with heavy occlusion. Although our tracker only relies on motion cues and no visual information is considered, applying the strong detections from the crowd-specific model, state-of-the-art results on the challenging MOT17 and MOT20 benchmarks are obtained.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"394 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115310435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning for Body Parts Detection using HRNet and EfficientNet","authors":"Miniar Ben Gamra, M. Akhloufi, Chunpeng Wang, Shuo Liu","doi":"10.1109/AVSS52988.2021.9663785","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663785","url":null,"abstract":"Human body parts detection is an important field of research in computer vision. It can serve as an essential tool in surveillance systems and used to automatically detect and moderate non-appropriate online content such as nudity, child pornography, violence, etc. In this work, we introduce a novel two-step framework to define ten body parts using joints localization. A new architecture with EfficientNet as a backbone is proposed and compared to HRNet for the first step of pose estimation. The resulting joints are then used as an input to the second step, where a set of rules is applied to connect the appropriate joints and to define each body part. The developed algorithms were tested using MPII human pose benchmark. The proposed approach achieved a very interesting performance with a 90.13% Probability of Correct Keypoint (PCK) for the pose estimation and an average of 89.80% of mean Average Precision (mAP) for the body parts detection.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116824354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Seismic Sensor based Human Activity Recognition Framework using Deep Learning","authors":"Priyanka Choudhary, Neeraj Goel, Mukesh Saini","doi":"10.1109/AVSS52988.2021.9663747","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663747","url":null,"abstract":"Activity recognition has gained attention due to the rapid development of microelectromechanical sensors. Numerous human-centric applications in healthcare, security, and smart environments can benefit from an efficient human activity recognition system. In this paper, we demonstrate the use of a seismic sensor for human activity recognition. Traditionally, researchers have relied on handcrafted features to identify the target activity, but these features may be inefficient in complex and noisy environments. The proposed framework employs an autoencoder to map the activity into a compact representative descriptor. Further, an Artificial Neural Network (ANN) classifier is trained on the extracted descriptors. We compare the proposed framework with multiple machine learning classifiers and a state-of-the-art framework on different evaluation metrics. On 5-fold cross-validation, the proposed approach outperforms the state-of-the-art in terms of precision and recall by an average of 10.68 and 23.36%, respectively. We also collected a dataset to assess the efficacy of the proposed seismic sensor-based activity recognition. The dataset is collected in a variety of challenging environments, such as variable grass length, soil moisture content, and the passing of unwanted vehicles nearby.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123889405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introspective Closed-Loop Perception for Energy-efficient Sensors","authors":"Kruttidipta Samal, M. Wolf, S. Mukhopadhyay","doi":"10.1109/AVSS52988.2021.9663801","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663801","url":null,"abstract":"Task-driven closed-loop perception-sensing systems have shown considerable energy savings over traditional open-loop systems. Prior works on such systems have used simple feedback signals such as object detections and tracking which led to poor perception quality. This paper proposes an improved approach based on perceptual risk. First, a method is proposed to estimate the risk of failure to detect a target of interest. The risk estimate is used as a signal in a feedback system to determine how sensor resources are utilized. Two feedback algorithms are proposed: one based on proportional/integral methods and the other based on 0/1 (bang-bang) methods. These feedback algorithms are compared based on the efficiency with which they use available sensor resources as well as their absolute detection rates. Experiments on two real-world autonomous driving datasets show that the proposed system has better object detection recall and lower marginal cost of prediction than prior work.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116459629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PETS2021: Through-foliage detection and tracking challenge and evaluation","authors":"Luis Patino, Jonathan N. Boyle, J. Ferryman, Jonas Auer, Julian Pegoraro, R. Pflugfelder, Mertcan Cokbas, J. Konrad, P. Ishwar, Giulia Slavic, L. Marcenaro, Yifan Jiang, Youngsaeng Jin, Hanseok Ko, Guangliang Zhao, Guy Ben-Yosef, Jianwei Qiu","doi":"10.1109/AVSS52988.2021.9663837","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663837","url":null,"abstract":"This paper presents the outcomes of the PETS2021 challenge held in conjunction with AVSS2021 and sponsored by the EU FOLDOUT project. The challenge comprises the publication of a novel video surveillance dataset on through-foliage detection, the defined challenges addressing person detection and tracking in fragmented occlusion scenarios, and quantitative and qualitative performance evaluation of challenge results submitted by six worldwide participants. The results show that while several detection and tracking methods achieve overall good results, through-foliage detection and tracking remains a challenging task for surveillance systems especially as it serves as the input to behaviour (threat) recognition.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130080733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fire Detection Model Based on Tiny-YOLOv3 with Hyperparameters Improvement","authors":"Zeineb Daoud, A. B. Hamida, C. Amar","doi":"10.1109/AVSS52988.2021.9663822","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663822","url":null,"abstract":"Fires are the most devastating disasters that the world can face. Thereby, it is crucial to exactly identify fire areas in video surveillance scenes, to overcome the shortcomings of the existing fire detection methods. Recently, deep learning models have been widely used for fire recognition applications. Indeed, a novel deep fire detection method is introduced in this paper. An improved fire model based on tiny-YOLOv3 (You Only Look Once version 3) network is developed in order to enhance the detection accuracy. The main idea is the tiny-YOLOv3 improvement according to the refined proposed training hyperparameters. The generated model is trained and evaluated on the constructed and manually labeled dataset. Results show that applying the proposed training heuristics with the tiny-YOLOv3 network improves the fire detection performance with 81.65% of mean Average Precision (mAP). Also, the designed model outperforms the related works with a detection precision of 97.6%.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116134207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}