{"title":"Ensemble classification of video-recorded crowd movements","authors":"Mounir Bendali-Braham, J. Weber, G. Forestier, L. Idoumghar, Pierre-Alain Muller","doi":"10.1109/ISPA52656.2021.9552129","DOIUrl":"https://doi.org/10.1109/ISPA52656.2021.9552129","url":null,"abstract":"Ensemble learning methods often improve results in problems addressed by single Machine Learning models. In this work, we apply Ensemble Learning on video-recorded crowd movements. First, we build Ensembles of homogeneous Convolutional Neural Networks (CNN) to compare their performance on the Crowd-11 dataset and show the gain of performance demonstrated by Ensembles compared to single CNN models. Secondly, we evaluate all the possible combinations of these homogeneous Ensembles to build a global Ensemble of heterogeneous models, and we analyze the combination of Ensembles that achieves the best results. Our experiments reveal that Ensemble classification often obtains better results than single models and combining different Ensembles can make the predictions accuracy even better.","PeriodicalId":131088,"journal":{"name":"2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)","volume":"155 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121672107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification of Infant Behavioural Traits using Acoustic Cry: An Empirical Study","authors":"S. Jindal, K. Nathwani, V. Abrol","doi":"10.1109/ISPA52656.2021.9552159","DOIUrl":"https://doi.org/10.1109/ISPA52656.2021.9552159","url":null,"abstract":"The reason behind an infant's cry has been elusive to sometimes even the most skilled and experienced paediatricians. Our comprehensive research aims to classify infant's cry into their behavioural traits using objective and analytical machine learning approaches. Towards this goal, we compare conventional machine learning and more recent deep learning-based models for baby cry classification, using acoustic features, spectrograms, and a combination of the two. We performed a detailed empirical study on the publicly available donateacry-corpus and the CRIED dataset to highlight the effectiveness of appropriate acoustic features, signal processing, or machine learning techniques for this task. We also conclude that acoustic features and spectrograms together bring better results. As a side result, this work also emphasized the challenge of an inadequate baby cry database in modelling infant behavioural traits.","PeriodicalId":131088,"journal":{"name":"2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129173174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Survey on Skeleton-Based Activity Recognition using Graph Convolutional Networks (GCN)","authors":"Mesafint Fanuel, Xiaohong Yuan, Hyung Nam Kim, L. Qingge, K. Roy","doi":"10.1109/ISPA52656.2021.9552064","DOIUrl":"https://doi.org/10.1109/ISPA52656.2021.9552064","url":null,"abstract":"Skeleton-Based Activity recognition is an active research topic in Computer Vision. In recent years, deep learning methods have been used in this area, including Recurrent Neural Network (RNN)-based, Convolutional Neural Network (CNN)-based and Graph Convolutional Network (GCN)-based approaches. This paper provides a survey of recent work on various Graph Convolutional Network (GCN)-based approaches being applied to Skeleton-Based Activity Recognition. We first introduce the conventional implementation of a GCN. Then methods that address the limitations of conventional GCN's are presented.","PeriodicalId":131088,"journal":{"name":"2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123538642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extra-low-dose 2D PET imaging","authors":"Anja Koščević, D. Petrinović","doi":"10.1109/ISPA52656.2021.9552059","DOIUrl":"https://doi.org/10.1109/ISPA52656.2021.9552059","url":null,"abstract":"In this paper, a new approach for the 2D PET data acquisition is introduced, which uses the intersections of lines of response (LORs) for the generation of a larger number of virtual LORs in the cases when the number of coincident events is initially small, i.e, when the amount of injected radiotracer is low. This approach is based on the fact that the statistical properties of the unknown 2D process are preserved in the statistical properties of intersections of LORs. The 2D image is reconstructed from virtual LORs using the well-known Filtered back-projection method, thereby achieving high temporal resolution with a reduced dose of radiotracer injected into the living organisms. Moreover, the larger number of virtual LORs yields the reconstructed 2D image of higher spatial resolution compared with the reconstruction from original LORs.","PeriodicalId":131088,"journal":{"name":"2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132040967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speech Emotion Recognition using GhostVLAD and Sentiment Metric Learning","authors":"B. Mocanu, Ruxandra Tapu","doi":"10.1109/ISPA52656.2021.9552068","DOIUrl":"https://doi.org/10.1109/ISPA52656.2021.9552068","url":null,"abstract":"In this paper, we introduce a novel deep learning-based speech emotion recognition method. The proposed approach exploits a convolutional neural network (CNN), enriched with a GhostVLAD feature aggregation layer. The resulting representation adjusts the contribution of each spectrogram segments to the final class prototype representation and is used for trainable and discriminative clustering purposes. In addition, we introduce a modified triplet loss function which integrates the relations between the various emotional patterns. The experimental evaluation, carried out on RAVDESS and CREMA-D datasets validates the proposed methodology, which yields emotion recognition rates superior to 83% and 64%, respectively. The comparative evaluation shows that the proposed approach outperforms state of the art techniques, with gains in accuracy of more than 3%.","PeriodicalId":131088,"journal":{"name":"2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126891314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Minimax Algorithm for Multi-channel Active Noise Control System","authors":"M. Jain, Arun Kumar, R. Bahl","doi":"10.1109/ISPA52656.2021.9552150","DOIUrl":"https://doi.org/10.1109/ISPA52656.2021.9552150","url":null,"abstract":"Global active noise control (ANC) employs multichannel filtered-x least mean square (MCFxLMS) algorithm as it is more suitable algorithm to obtain large quiet zone. Minimax algorithm was proposed to counter the higher computational complexity faced in MCFxLMS based ANC by minimizing the square of the maximum of the absolute values of residual noise at the error microphones. However, the minimax approach leads to inferior performance in terms of convergence as well as noise reduction. Also, the classical minimax approach offers little flexibility in adjusting the ANC performance. In this paper, a novel minimax algorithm is proposed in order to tackle these shortcomings of conventional minimax algorithm at a cost of increase in computational complexity as compared to conventional minimax algorithm. The performance of the proposed approach is evaluated and compared with classical minimax for global noise reduction in a 2-dimensional quiet zone of size 1 m x 1 m in a 3-dimensional reverberant room. The proposed scheme is able to improve the performance with much reduced computational complexity as compared to MCFxLMS though with increased computational complexity as compared to classical minimax approach.","PeriodicalId":131088,"journal":{"name":"2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129195395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Outdoor daytime multi - illuminant color constancy","authors":"Ilija Domislović, Donik Vršnak, M. Subašić, S. Lončarić","doi":"10.1109/ISPA52656.2021.9552092","DOIUrl":"https://doi.org/10.1109/ISPA52656.2021.9552092","url":null,"abstract":"White-balancing is an important part of the image processing pipeline and is used in many computer vision applications. It removes the chromatic influence of the illumination on objects in the scene. White balancing is important in tasks such as object detection and object tracking. This problem is tackled in a myriad of ways, but most methods use the assumption that images contain only one dominant uniform illuminant. In recent years, neural networks have been used to create state-of-the-art methods for single illuminant white-balancing, but the problem of multi-illuminant white-balancing has been largely ignored. The main reason for this is the lack of multi-illuminant datasets. In this paper, we introduce a convolutional neural network for multi-illuminant (sun and shadow) illumination estimation. For the training and testing of the created model over 100 outdoor daytime images were taken using the Canon EOS 550D camera. We show that the model outperforms existing statistics-based methods on the test data.","PeriodicalId":131088,"journal":{"name":"2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134369872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating Patterns on the Triangular Grid by Cellular Automata including Alternating Use of Two Rules","authors":"M. Saadat, B. Nagy","doi":"10.1109/ISPA52656.2021.9552107","DOIUrl":"https://doi.org/10.1109/ISPA52656.2021.9552107","url":null,"abstract":"Various patterns and figures are used widely in image processing including tests of various algorithms, compressing images and also creating/displaying them in computer games. In this paper binary image generation is studied on the triangular grid. On the one hand, the triangular grid has better symmetric properties than the square grid, while on the other hand, the number of closest neighbors of a pixel is less than it is on the square grid. In this way, cellular automata based on the closest neighbors are simpler than similar automata on the square grid, but the generated pictures may be more sophisticated. In our binary cellular automata the state (color) of the closest three neighbors and the pixel's own state determine the next state; we use life-like deterministic cellular automata. In our novel approach we combine two different automata (rules) such that we use them alternately for the picture generation. Various patterns, including highly symmetric mandala type patterns, as well as, airplanes, trees etc. are shown as examples. Some general ideas and hints are also given.","PeriodicalId":131088,"journal":{"name":"2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114869480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rapid Defect Detection by Merging Ultrasound B-scans from Different Scanning Angles","authors":"D. Medak, L. Posilović, M. Subašić, T. Petković, M. Budimir, S. Lončarić","doi":"10.1109/ISPA52656.2021.9552050","DOIUrl":"https://doi.org/10.1109/ISPA52656.2021.9552050","url":null,"abstract":"Ultrasonic testing (UT) is a commonly used approach for inspection of material and defect detection without causing harm to the inspected component. To improve the reliability of defect detection, the material is often scanned from various angles leading to an immense amount of data that needs to be analyzed. Some of the defects are only seen on B-scans taken from a particular angle so discarding some of the data would increase the risk of not detecting all of the defects. Recently there has been significant progress in the development of methods for automated defect analysis from the UT data. Using such methods the inspection can be performed quicker, but it is still necessary to inspect all of the angles to detect defects. In this work, we test a novel approach for accelerating the analysis by merging the images from various angles. To reduce the information loss during the process of merging, we develop a new model with a weighting module that dynamically determines the importance of each of the scanning angles. Using the proposed module, the loss of information is minimal, so the precision of the detection model is comparable to the model tested on each of the images separately. Using the merged images input, the analysis can be accelerated by almost 15 times.","PeriodicalId":131088,"journal":{"name":"2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122519985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Real-Time Implementation of a 3D Binaural System based on HRIRs Interpolation","authors":"V. Bruschi, Stefano Nobili, S. Cecchi","doi":"10.1109/ISPA52656.2021.9552125","DOIUrl":"https://doi.org/10.1109/ISPA52656.2021.9552125","url":null,"abstract":"Binaural synthesis is a very important aspect in the field of immersive audio and it requires the knowledge of the head related impulse responses (HRIRs). This paper describes the real-time implementation of an impulse responses interpolation method that allows to obtain an accurate binaural reproduction reducing measurement sets. The method is based on the time decomposition and frequency division of the HRIRs and the application of a peak detection and matching procedure in combination with an alignment algorithm and a linear interpolation. A 3D set-up has been considered and the algorithm has been evaluated by means of objective and subjective tests, comparing it with the state of the art. The obtained results have demonstrated the excellent performance of the proposed system.","PeriodicalId":131088,"journal":{"name":"2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130752127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}