{"title":"Sensor Selection for Angle of Arrival Estimation Based on the Two-Target Cramér-Rao Bound","authors":"C. Kokke, M. Coutiño, L. Anitori, R. Heusdens, G. Leus","doi":"10.1109/ICASSP49357.2023.10094942","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10094942","url":null,"abstract":"Sensor selection is a useful method to help reduce data throughput, as well as computational, power, and hardware requirements, while still maintaining acceptable performance. Although minimizing the Cramér-Rao bound has been adopted previously for sparse sensing, it did not consider multiple targets and unknown source models. In this work, we propose to tackle the sensor selection problem for angle of arrival estimation using the worst-case Cramér-Rao bound of two uncorrelated sources. To do so, we cast the problem as a convex semi-definite program and retrieve the binary selection by randomized rounding. Through numerical examples related to a linear array, we illustrate the proposed method and show that it leads to the natural selection of elements at the edges plus the center of the linear array. This contrasts with the typical solutions obtained from minimizing the single-target Cramér-Rao bound.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123958254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stargan-vc Based Cross-Domain Data Augmentation for Speaker Verification","authors":"Hang-Rui Hu, Yan Song, Jian-Tao Zhang, Lirong Dai, I. Mcloughlin, Zhu Zhuo, Yujie Zhou, Yu-Hong Li, Hui Xue","doi":"10.1109/ICASSP49357.2023.10094698","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10094698","url":null,"abstract":"Automatic speaker verification (ASV) faces domain shift caused by the mismatch of intrinsic and extrinsic factors, such as recording device and speaking style, in real-world applications, which leads to severe performance degradation. Since single-speaker multi-condition (SSMC) data is difficult to collect in practice, existing domain adaptation methods are hard to ensure the feature consistency of the same class but different domains. To this end, we propose a cross-domain data generation method to obtain a domain-invariant ASV system. Inspired by voice conversion (VC) task, a StarGAN based generative model first learns cross-domain mappings from SSMC data, and then generates missing domain data for all speakers, thus increasing the intra-class diversity of the training set. Considering the difference between ASV and VC task, we renovate the corresponding training objectives and network structure to make the adaptation task-specific. 
Evaluations on achieve a relative performance improvement of about 5-8% over the baseline in terms of minDCF and EER, outperforming the CNSRC winner’s system of the equivalent scale.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124002604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classifying Non-Individual Head-Related Transfer Functions with A Computational Auditory Model: Calibration And Metrics","authors":"Rapolas Daugintis, Roberto Barumerli, L. Picinali, M. Geronazzo","doi":"10.1109/ICASSP49357.2023.10095152","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095152","url":null,"abstract":"This study explores the use of a multi-feature Bayesian auditory sound localisation model to classify non-individual head-related transfer functions (HRTFs). Based on predicted sound localisation performance, these are grouped into ‘good’ and ‘bad’, and the ‘best’/‘worst’ is selected from each category. Firstly, we present a greedy algorithm for automated individual calibration of the model based on the individual sound localisation data. We then discuss data analysis of predicted directional localisation errors and present an algorithm for categorising the HRTFs based on the localisation error distributions within a limited range of directions in front of the listener. Finally, we discuss the validity of the classification algorithm when using averaged instead of individual model parameters. This analysis of auditory modelling results aims to provide a perceptual foundation for automated HRTF personalisation techniques for an improved experience of binaural spatial audio technologies.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124185053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy-Enhanced Federated Learning Against Attribute Inference Attack for Speech Emotion Recognition","authors":"Huan Zhao, Haijiao Chen, Yufeng Xiao, Zixing Zhang","doi":"10.1109/ICASSP49357.2023.10095737","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095737","url":null,"abstract":"Federal learning-based (FL) Speech Emotion Recognition (SER) framework aims to protect data privacy when characterizing emotions. However, previous studies have shown that the framework is vulnerable, because curious servers can indirectly infer user private information. To address this challenge, we propose a novel privacy- enhanced SER approach against attribute inference attack. It helps filter sensitive information and attends to highlight emotion features before uploading the shared model updates under the FL. Firstly, a bi-directional recurrent neural network captures the latent representations in sequences to discard partial redundant features. Then, a feature attention mechanism is applied to focus on the salient regions in the latent representations, further hiding emotion-irrelevant attributes. The experimental results show that the introduced model is effective. The attack capability of a gender prediction model is reduced to a chance level while retaining SER performance.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123342836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Electric Load Demand Forecasting with Anchor-Based Forecasting Method","authors":"Maria Tzelepi, P. Nousi, A. Tefas","doi":"10.1109/ICASSP49357.2023.10096754","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096754","url":null,"abstract":"In this paper we deal with the problem of Electric Load Demand Forecasting (ELDF) considering the Greek Energy Market. Motivated by the anchored-based object detection methods, we argue that considering the ELDF task we can define an anchor and transform the problem into predicting the offset instead of predicting the actual load values. The experimental evaluation considering the one-day-ahead forecasting task, validated the effectiveness of the proposed Anchor-based FOREcasting (AFORE) method. The AFORE method achieved significant improvements in terms of mean absolute percentage error under various setups, using different loss functions and model architectures.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121207384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple Target Measurements: Bayesian Framework for Moving Object Detection in Mimo Radar","authors":"Bastian Eisele, Ali Bereyhi, R. Müller","doi":"10.1109/ICASSP49357.2023.10094649","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10094649","url":null,"abstract":"Utilizing compressive sensing (CS), one can significantly reduce the number of required antenna elements in MIMO radar systems, while preserving a high spatial resolution. Most CS-based studies focus on individual processing of a single set of measurements collected from an stationary scene. In this paper, we propose a new scheme called multiple target measurements (MTM). This scheme uses the target movement to collect multiple sets of measurements from jointly sparse stationary scenes. Invoking approximate message passing, we develop a Bayesian-like iterative algorithm to recover the sparse scenes jointly. Our analytical and numerical investigations demonstrate that MTM can further reduce the array size required to achieve a desired spatial resolution.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114210315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FedSD: A New Federated Learning Structure Used in Non-iid Data","authors":"Minmin Yi, Houchun Ning, Peng Liu","doi":"10.1109/ICASSP49357.2023.10095595","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095595","url":null,"abstract":"One of the most challenging problems in federated learning is the convergence speed problem caused by heterogeneity. We propose a novel structure called FedSD, a new method to accelerate the model convergence. We change the one-stage-cycle iteration structure to a 2-stage-cycle one to get the latest global gradient descent direction which can guide the model training direction. We instantiate algorithms using FedSD to improve the performance of experiments on several public datasets. Our empirical studies validate the excellent performance of FedSD.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114358267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ERBNet: An Effective Representation Based Network for Unbiased Scene Graph Generation","authors":"Wenxing Ma, Tianxiang Hou, Qianji Di, Zhongang Qi, Ying Shan, Hanzi Wang","doi":"10.1109/ICASSP49357.2023.10094727","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10094727","url":null,"abstract":"The scene graph generation (SGG) task has attracted increasing attention in recent years. The goal of SGG is to predict relations between pairs of objects within an image. Due to the long-tailed distribution of the dataset annotations, the performance of SGG is still far from satisfactory. To address the long-tailed problem, existing methods try various ways to conduct unbiased learning. However, we argue that the essence of the long-tailed problem in SGG is that the classifier is seriously affected by the long-tailed data. To handle this issue, we propose a novel network named ERBNet, which contains a relation feature fusion (RFF) encoder to construct effective representations of relations between objects, and a nearest class mean (NCM) classifier to conduct relation prediction based on relation feature similarities. Extensive experimental results show that the proposed ERBNet outperforms several state-of-the-art methods on the challenging Visual Genome dataset.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114620597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Seri: Sketching-Reasoning-Integrating Progressive Workflow for Empathetic Response Generation","authors":"Guanqun Bi, Yanan Cao, Piji Li, Yuqiang Xie, Fang Fang, Zheng Lin","doi":"10.1109/ICASSP49357.2023.10094672","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10094672","url":null,"abstract":"Empathy is a key ability for a human-like dialogue system. Inspired by social psychology, empathy includes both affective and cognitive aspects. Previous works on this topic have merely focused on recognizing emotions or modeling cognition with commonsense knowledge. Nevertheless, the generated results of these works still have a big gap with human-like empathetic responses. In this paper, we propose Seri, a SkEtching-Reasoning-Integrating framework for empathetic response generation. In particular, we define an empathy planner to capture and reason about multi-source information that considers cognition and affection. Further, we introduce a dynamic integrator module that allows the model dynamically select the appropriate information to generate empathetic responses. Experimental results on EmpatheticDialogue show that our method outperforms competitive baselines and generates responses with higher diversity and cognitive empathy levels.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116273842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Audio-Driven High Definetion and Lip-Synchronized Talking Face Generation Based on Face Reenactment","authors":"Xianyu Wang, Yuhan Zhang, Weihua He, Yaoyuan Wang, Minglei Li, Yuchen Wang, Jingyi Zhang, Shunbo Zhou, Ziyang Zhang","doi":"10.1109/icassp49357.2023.10097270","DOIUrl":"https://doi.org/10.1109/icassp49357.2023.10097270","url":null,"abstract":"Generating audio-driven photo-realistic talking face has received intensive attention due to its ability to bring more new human-computer interaction experiences. However, previous works struggled to balance high definition, lip synchronization, and low customization costs, which would degrade the user experience. In this paper, a novel audio-driven talking face generation method was proposed, which subtly converts the problem of improving video definition into the problem of face reenactment to produce both lip-synchronized and high- definition face video. The framework is decoupled, meaning that the same trained model can be used on arbitrary characters and audio without further customizing training for specific people, thus significantly reducing costs. Experiment results show that our proposed method achieves the high video definition, and comparable lip synchronization performance with the existing state-of-the-art methods.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116357213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}