{"title":"Speech Emotion Recognition using GhostVLAD and Sentiment Metric Learning","authors":"B. Mocanu, Ruxandra Tapu","doi":"10.1109/ISPA52656.2021.9552068","DOIUrl":"https://doi.org/10.1109/ISPA52656.2021.9552068","url":null,"abstract":"In this paper, we introduce a novel deep learning-based speech emotion recognition method. The proposed approach exploits a convolutional neural network (CNN), enriched with a GhostVLAD feature aggregation layer. The resulting representation adjusts the contribution of each spectrogram segment to the final class prototype representation and is used for trainable and discriminative clustering purposes. In addition, we introduce a modified triplet loss function which integrates the relations between the various emotional patterns. The experimental evaluation, carried out on RAVDESS and CREMA-D datasets validates the proposed methodology, which yields emotion recognition rates superior to 83% and 64%, respectively. The comparative evaluation shows that the proposed approach outperforms state of the art techniques, with gains in accuracy of more than 3%.","PeriodicalId":131088,"journal":{"name":"2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126891314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Sparse TFD Reconstruction Approach Using the S-method and Local Entropies Information","authors":"Vedran Jurdana, I. Volaric, V. Sucic","doi":"10.1109/ISPA52656.2021.9552042","DOIUrl":"https://doi.org/10.1109/ISPA52656.2021.9552042","url":null,"abstract":"This paper aims to investigate the S-method (SM) as an alternative for the Wigner-Ville Distribution (WVD) when used as the starting point for a sparse time-frequency distribution (TFD) reconstruction of non-stationary signals. The motivation comes from the SM's ability of providing a high-resolution TFD with satisfactory cross- and inner-artefact suppression, which should lead to a reconstructed TFD performance improvement over the WVD. The comparison between the WVD and the SM has been conducted using several state-of-the-art algorithms optimized with the multi-objective meta-heuristic optimization method (by minimizing the mean squared error between the local number of components in the starting and reconstructed TFDs and the number of regions with continuously connected samples). The results are shown for single and multi-component noisy synthetic signals.","PeriodicalId":131088,"journal":{"name":"2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125019739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ensemble classification of video-recorded crowd movements","authors":"Mounir Bendali-Braham, J. Weber, G. Forestier, L. Idoumghar, Pierre-Alain Muller","doi":"10.1109/ISPA52656.2021.9552129","DOIUrl":"https://doi.org/10.1109/ISPA52656.2021.9552129","url":null,"abstract":"Ensemble learning methods often improve results in problems addressed by single Machine Learning models. In this work, we apply Ensemble Learning on video-recorded crowd movements. First, we build Ensembles of homogeneous Convolutional Neural Networks (CNN) to compare their performance on the Crowd-11 dataset and show the gain of performance demonstrated by Ensembles compared to single CNN models. Secondly, we evaluate all the possible combinations of these homogeneous Ensembles to build a global Ensemble of heterogeneous models, and we analyze the combination of Ensembles that achieves the best results. Our experiments reveal that Ensemble classification often obtains better results than single models and combining different Ensembles can make the predictions accuracy even better.","PeriodicalId":131088,"journal":{"name":"2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)","volume":"155 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121672107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extra-low-dose 2D PET imaging","authors":"Anja Koščević, D. Petrinović","doi":"10.1109/ISPA52656.2021.9552059","DOIUrl":"https://doi.org/10.1109/ISPA52656.2021.9552059","url":null,"abstract":"In this paper, a new approach for the 2D PET data acquisition is introduced, which uses the intersections of lines of response (LORs) for the generation of a larger number of virtual LORs in the cases when the number of coincident events is initially small, i.e., when the amount of injected radiotracer is low. This approach is based on the fact that the statistical properties of the unknown 2D process are preserved in the statistical properties of intersections of LORs. The 2D image is reconstructed from virtual LORs using the well-known Filtered back-projection method, thereby achieving high temporal resolution with a reduced dose of radiotracer injected into the living organisms. Moreover, the larger number of virtual LORs yields the reconstructed 2D image of higher spatial resolution compared with the reconstruction from original LORs.","PeriodicalId":131088,"journal":{"name":"2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132040967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Outdoor daytime multi-illuminant color constancy","authors":"Ilija Domislović, Donik Vršnak, M. Subašić, S. Lončarić","doi":"10.1109/ISPA52656.2021.9552092","DOIUrl":"https://doi.org/10.1109/ISPA52656.2021.9552092","url":null,"abstract":"White-balancing is an important part of the image processing pipeline and is used in many computer vision applications. It removes the chromatic influence of the illumination on objects in the scene. White balancing is important in tasks such as object detection and object tracking. This problem is tackled in a myriad of ways, but most methods use the assumption that images contain only one dominant uniform illuminant. In recent years, neural networks have been used to create state-of-the-art methods for single illuminant white-balancing, but the problem of multi-illuminant white-balancing has been largely ignored. The main reason for this is the lack of multi-illuminant datasets. In this paper, we introduce a convolutional neural network for multi-illuminant (sun and shadow) illumination estimation. For the training and testing of the created model over 100 outdoor daytime images were taken using the Canon EOS 550D camera. We show that the model outperforms existing statistics-based methods on the test data.","PeriodicalId":131088,"journal":{"name":"2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134369872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification of Infant Behavioural Traits using Acoustic Cry: An Empirical Study","authors":"S. Jindal, K. Nathwani, V. Abrol","doi":"10.1109/ISPA52656.2021.9552159","DOIUrl":"https://doi.org/10.1109/ISPA52656.2021.9552159","url":null,"abstract":"The reason behind an infant's cry has been elusive to sometimes even the most skilled and experienced paediatricians. Our comprehensive research aims to classify infant's cry into their behavioural traits using objective and analytical machine learning approaches. Towards this goal, we compare conventional machine learning and more recent deep learning-based models for baby cry classification, using acoustic features, spectrograms, and a combination of the two. We performed a detailed empirical study on the publicly available donateacry-corpus and the CRIED dataset to highlight the effectiveness of appropriate acoustic features, signal processing, or machine learning techniques for this task. We also conclude that acoustic features and spectrograms together bring better results. As a side result, this work also emphasized the challenge of an inadequate baby cry database in modelling infant behavioural traits.","PeriodicalId":131088,"journal":{"name":"2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129173174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Minimax Algorithm for Multi-channel Active Noise Control System","authors":"M. Jain, Arun Kumar, R. Bahl","doi":"10.1109/ISPA52656.2021.9552150","DOIUrl":"https://doi.org/10.1109/ISPA52656.2021.9552150","url":null,"abstract":"Global active noise control (ANC) employs multichannel filtered-x least mean square (MCFxLMS) algorithm as it is more suitable algorithm to obtain large quiet zone. Minimax algorithm was proposed to counter the higher computational complexity faced in MCFxLMS based ANC by minimizing the square of the maximum of the absolute values of residual noise at the error microphones. However, the minimax approach leads to inferior performance in terms of convergence as well as noise reduction. Also, the classical minimax approach offers little flexibility in adjusting the ANC performance. In this paper, a novel minimax algorithm is proposed in order to tackle these shortcomings of conventional minimax algorithm at a cost of increase in computational complexity as compared to conventional minimax algorithm. The performance of the proposed approach is evaluated and compared with classical minimax for global noise reduction in a 2-dimensional quiet zone of size 1 m x 1 m in a 3-dimensional reverberant room. The proposed scheme is able to improve the performance with much reduced computational complexity as compared to MCFxLMS though with increased computational complexity as compared to classical minimax approach.","PeriodicalId":131088,"journal":{"name":"2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129195395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ISPA 2021 12th International Symposium on Image and Signal Processing and Analysis","authors":"","doi":"10.1109/ispa52656.2021.9552166","DOIUrl":"https://doi.org/10.1109/ispa52656.2021.9552166","url":null,"abstract":"","PeriodicalId":131088,"journal":{"name":"2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123249531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating Patterns on the Triangular Grid by Cellular Automata including Alternating Use of Two Rules","authors":"M. Saadat, B. Nagy","doi":"10.1109/ISPA52656.2021.9552107","DOIUrl":"https://doi.org/10.1109/ISPA52656.2021.9552107","url":null,"abstract":"Various patterns and figures are used widely in image processing including tests of various algorithms, compressing images and also creating/displaying them in computer games. In this paper binary image generation is studied on the triangular grid. On the one hand, the triangular grid has better symmetric properties than the square grid, while on the other hand, the number of closest neighbors of a pixel is less than it is on the square grid. In this way, cellular automata based on the closest neighbors are simpler than similar automata on the square grid, but the generated pictures may be more sophisticated. In our binary cellular automata the state (color) of the closest three neighbors and the pixel's own state determine the next state; we use life-like deterministic cellular automata. In our novel approach we combine two different automata (rules) such that we use them alternately for the picture generation. Various patterns, including highly symmetric mandala type patterns, as well as, airplanes, trees etc. are shown as examples. Some general ideas and hints are also given.","PeriodicalId":131088,"journal":{"name":"2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114869480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rapid Defect Detection by Merging Ultrasound B-scans from Different Scanning Angles","authors":"D. Medak, L. Posilović, M. Subašić, T. Petković, M. Budimir, S. Lončarić","doi":"10.1109/ISPA52656.2021.9552050","DOIUrl":"https://doi.org/10.1109/ISPA52656.2021.9552050","url":null,"abstract":"Ultrasonic testing (UT) is a commonly used approach for inspection of material and defect detection without causing harm to the inspected component. To improve the reliability of defect detection, the material is often scanned from various angles leading to an immense amount of data that needs to be analyzed. Some of the defects are only seen on B-scans taken from a particular angle so discarding some of the data would increase the risk of not detecting all of the defects. Recently there has been significant progress in the development of methods for automated defect analysis from the UT data. Using such methods the inspection can be performed quicker, but it is still necessary to inspect all of the angles to detect defects. In this work, we test a novel approach for accelerating the analysis by merging the images from various angles. To reduce the information loss during the process of merging, we develop a new model with a weighting module that dynamically determines the importance of each of the scanning angles. Using the proposed module, the loss of information is minimal, so the precision of the detection model is comparable to the model tested on each of the images separately. Using the merged images input, the analysis can be accelerated by almost 15 times.","PeriodicalId":131088,"journal":{"name":"2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122519985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}