{"title":"A Spatio-temporal Deep Learning Approach for Underwater Acoustic Signals Classification","authors":"Zakaria Alouani, Youssef Hmamouche, Btissam El Khamlichi, A. E. Seghrouchni","doi":"10.1109/AVSS56176.2022.9959247","DOIUrl":"https://doi.org/10.1109/AVSS56176.2022.9959247","url":null,"abstract":"Target recognition from underwater acoustic signals is a major challenge in surveillance systems, especially in military and defense fields. Deep learning models are increasingly used for the automatic classification of underwater signals, but many challenges remain due to the complexity of sound navigation and ranging networks, the noise present in the signals, and the difficulty of collecting large amounts of data for efficient training. In this paper, we propose two new architectures for underwater signal classification based on Spatio-temporal modeling. In experiments, evaluations on two real datasets show that the proposed approach achieves a classification accuracy of 98% which outperforms the state-of-the-art methods. In addition, the proposed end-to-end network is considerably faster than MFCC-based networks such as Yamnet and VGGish.","PeriodicalId":408581,"journal":{"name":"2022 18th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125242584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MASK-MORPH: Does Morphing of Custom 3D Face Masks Threatens the Face Recognition Systems?","authors":"Raghavendra Ramachandra, S. Marcel","doi":"10.1109/AVSS56176.2022.9959348","DOIUrl":"https://doi.org/10.1109/AVSS56176.2022.9959348","url":null,"abstract":"Face Recognition Systems (FRS) are vulnerable to morphing attacks that are targeted towards highly secured applications, including Automatic Border Control (ABC) gates. In this paper, we investigate a 3D-face custom silicone mask as the source for generating face morphing attacks for the first time. We present a systematic study to benchmark the attack potential of mask morphing (digital) attacks on both commercial and academic FRS. To this extent, a new dataset is constructed using eight custom 3D silicone face masks and corresponding bona fide face images captured using three different smartphones. The mask morphing is carried out using a landmark-based method, and the newly constructed dataset comprises 635 bona fide, 1034 face masks and 613 mask morphing face images. Extensive experiments are carried out to benchmark the attack potential and detection of mask morphing attacks on FRS.","PeriodicalId":408581,"journal":{"name":"2022 18th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"5 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131488579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning and Computer Vision Techniques for Estimating Snow Coverage on Roads using Surveillance Cameras","authors":"François-Guillaume Landry, M. Akhloufi","doi":"10.1109/AVSS56176.2022.9959452","DOIUrl":"https://doi.org/10.1109/AVSS56176.2022.9959452","url":null,"abstract":"Road surface monitoring in winter conditions is of great importance to ensure the safety of road users. Estimation of snow coverage on roads can be included in intelligent transportation systems to alert drivers or improve snow removal processes. Several models have been proposed for estimating snow coverage using surveillance cameras, but these models have focused on predicting few snow levels, which limits their usefulness in practice. In this paper, we present a model that allows a more granular estimation of the percentage of road surface covered by snow by predicting snow coverage from 0% (no snow) to 100% (fully snow-covered) using increments of 10%. We propose an ensemble learning model combining a deep convolutional neural network (CNN) and a support-vector machine (SVM). The accuracy of our model is similar to the state-of-the-art accuracy despite the higher task complexity associated with the increased granularity of predictions.","PeriodicalId":408581,"journal":{"name":"2022 18th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"92 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113992153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual-valued Neural Networks","authors":"Dmitry Kozlov, Stanislav Pavlov, Alexander Zuev, M. Bakulin, Mariya Krylova, Igor Kharchikov","doi":"10.1109/AVSS56176.2022.9959227","DOIUrl":"https://doi.org/10.1109/AVSS56176.2022.9959227","url":null,"abstract":"The majority of existing neural networks operate with real-valued representation of data. However, there are multiple tasks in which the input is complex-valued. The complex-valued data is considered to be more informative in terms of larger representational capacity. These reasons motivate researchers to develop neural networks using complex numbers instead of real-valued ones. In this paper, we take a step forward in the generalization of neural networks. We develop the basic building blocks for dual-valued neural networks based on dual numbers. We adjust basic layers such as Linear, Convolution, Average Pooling, ReLU to the dual domain and present an algorithm for Dual Batch Normalization. We construct several dual-valued neural networks for classification tasks basing on classical CV problems and the MusicNet and G2Net datasets. We show that dual-valued models outperform analogous complex-valued neural networks in execution time and have higher or at least the same accuracy.","PeriodicalId":408581,"journal":{"name":"2022 18th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125139797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vision Transformers for Road Accident Detection from Dashboard Cameras","authors":"Feten Hajri, H. Fradi","doi":"10.1109/AVSS56176.2022.9959545","DOIUrl":"https://doi.org/10.1109/AVSS56176.2022.9959545","url":null,"abstract":"Road accidents are increasing at a worrying rate and have raised one of the major concerns in traffic road monitoring. Their detection is becoming a very important aspect for intelligent traffic management systems. Unlike most of the existing anomaly detection systems that mainly monitor traffic status from static cameras, we focus in this paper on more challenging scenario using dashboard cameras. To handle this problem, we propose to adopt vision transformers with positional embeddings and based on multi-head attention mechanism for traffic monitoring following the increasing development of such models in natural language processing and computer vision communities. Precisely, to accomplish accident identification while exploiting the spatio-temporal aspect of videos, we employ a mix architecture. This architecture has the advantage of incorporating convolutional layers to capture local correlations of different patterns within the same image and vision transformer to learn the sequential correlations between the extracted features. Extensive experiments on two popular datasets DAD and CCD have been conducted to demonstrate the effectiveness of the proposed approach in terms of detection accuracy. The obtained results are compared to some recurrent neural networks commonly used to process sequential input data such as CNN-RNN, Conv-LSTM, and LCRN.","PeriodicalId":408581,"journal":{"name":"2022 18th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129255129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Separation of Closely-spaced Speakers by Exploiting Auxiliary Direction of Arrival Information within a U-Net Architecture","authors":"Stijn Kindt, Alexander Bohlender, N. Madhu","doi":"10.1109/AVSS56176.2022.9959632","DOIUrl":"https://doi.org/10.1109/AVSS56176.2022.9959632","url":null,"abstract":"Microphone arrays use spatial diversity for separating concurrent audio sources. Source signals from different directions of arrival (DOAs) are captured with DOA-dependent time-delays between the microphones. These can be exploited in the short-time Fourier transform domain to yield time-frequency masks that extract a target signal while suppressing unwanted components. Using deep neural networks (DNNs) for mask estimation has drastically improved separation performance. However, separation of closely spaced sources remains difficult due to their similar inter-microphone time delays. We propose using auxiliary information on source DOAs within the DNN to improve the separation. This can be encoded by the expected phase differences between the microphones. Alternatively, the DNN can learn a suitable input representation on its own when provided with a multi-hot encoding of the DOAs. Experimental results demonstrate the benefit of this information for separating closely spaced sources.","PeriodicalId":408581,"journal":{"name":"2022 18th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122367275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Consolidated Adversarial Network for Video De-raining and De-hazing","authors":"Vijay M. Galshetwar, Ashutosh Kulkarni, S. Chaudhary","doi":"10.1109/AVSS56176.2022.9959454","DOIUrl":"https://doi.org/10.1109/AVSS56176.2022.9959454","url":null,"abstract":"The performance of recent video enhancement methods is superior in specific hazy, rainy, snowy, and foggy weather conditions. However, these approaches can handle degradation rendered by single weather. We propose an integrated lightweight adversarial learning network to handle the degradations induced by different weather conditions. This is a unique approach to mitigate the problem of video restoration for multi- weather degraded videos using single network. The proposed architecture combines the idea of multi-resolution analysis with a multi-scale encoder and domain-specific feature learning is achieved using domain-aware filtering modules. The architecture provides recurrent feature sharing for temporal consistency, achieved by feeding the previous frame output as feedback. Substantial experiments on various datasets demonstrate that the proposed method performs competitively with the existing state-of-the-art approaches for video restoration in multi-weather conditions.","PeriodicalId":408581,"journal":{"name":"2022 18th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132006909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial-Temporal Transformer for Crime Recognition in Surveillance Videos","authors":"Kayleigh Boekhoudt, Estefanía Talavera","doi":"10.1109/AVSS56176.2022.9959414","DOIUrl":"https://doi.org/10.1109/AVSS56176.2022.9959414","url":null,"abstract":"Human-related crime recognition from surveillance videos becomes an even more challenging task when dealing with relatively similar human actions. We propose a transformer-based model that relies on the spatial-temporal representation of extracted skeletal trajectories for fine-grained classification. We validate the effectiveness of our model on the complex HR-Crime dataset consisting of videos representing 13 categories of human-related crimes. Quantitative and qualitative results suggest that building a transformer architecture with coupled spatial and temporal modules enables the model to compete in performance while improving intrinsic interpretability.","PeriodicalId":408581,"journal":{"name":"2022 18th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129049821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FBMOT: Flow Bridges the Gap between Detection and Tracking in Multiple Object Tracking","authors":"Lisheng Wu, Liuan Wang, Jun Sun","doi":"10.1109/AVSS56176.2022.9959159","DOIUrl":"https://doi.org/10.1109/AVSS56176.2022.9959159","url":null,"abstract":"The detection performance is a key bottleneck to the performance of multiple object tracking (MOT) algorithms and the advanced detection algorithms contribute a large portion to the success of MOT algorithms. Even though, the detectors can still make false detections based on only the image frames, which may directly result in the loss or mismatch of tracks. In MOT tasks, we point out that the video frames are temporally correlated and the temporal relationship can be leveraged to further improve the performance of detection and subsequent tracking. In our work, we propose FBMOT which uses optical flow to compute a prior heatmap about the locations of previously tracked objects in the current frame and takes the heatmap as an additional input in detection. We further proposed a novel regularization loss to help our model distinguish useful information in the prior heatmap. As a result, our method improves 0.8, 1.0 and 2.7 MOTA in MOT16, MOT17 and MOT20 test datasets respectively.","PeriodicalId":408581,"journal":{"name":"2022 18th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"8 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114118590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph Structure Learning Boosted Neural Network for Image Segmentation","authors":"Jinde Liu, Zhang Zhang","doi":"10.1109/AVSS56176.2022.9959488","DOIUrl":"https://doi.org/10.1109/AVSS56176.2022.9959488","url":null,"abstract":"Although Convolutional Neural Networks have made significant progress in image segmentation, it remains inadequate for exploring the structural relationships between image components and how graphs can be employed to guide image segmentation. To explore the structural relationships inherent in image components, the Graph Structure Learning Boosted Neural Network was proposed, which takes the contextual information generated by the CNN as features of the nodes and then uses a self-supervised graph generator to generate an adjacency matrix representing the image components connectivity. Then a Graph Neural Network (GNN) uses the adjacency matrix to fuse information between components according to their connectivity, thus transforming the CNN’s pixel classification problem into the GNN’s pixel classification problem. The whole model is lightweight and scalable, and extensive experiments have demonstrated the scalability of the model alongside the effectiveness of the method.","PeriodicalId":408581,"journal":{"name":"2022 18th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115199557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}