ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

Sensor Selection for Angle of Arrival Estimation Based on the Two-Target Cramér-Rao Bound
C. Kokke, M. Coutiño, L. Anitori, R. Heusdens, G. Leus
{"title":"Sensor Selection for Angle of Arrival Estimation Based on the Two-Target Cramér-Rao Bound","authors":"C. Kokke, M. Coutiño, L. Anitori, R. Heusdens, G. Leus","doi":"10.1109/ICASSP49357.2023.10094942","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10094942","url":null,"abstract":"Sensor selection is a useful method to help reduce data throughput, as well as computational, power, and hardware requirements, while still maintaining acceptable performance. Although minimizing the Cramér-Rao bound has been adopted previously for sparse sensing, it did not consider multiple targets and unknown source models. In this work, we propose to tackle the sensor selection problem for angle of arrival estimation using the worst-case Cramér-Rao bound of two uncorrelated sources. To do so, we cast the problem as a convex semi-definite program and retrieve the binary selection by randomized rounding. Through numerical examples related to a linear array, we illustrate the proposed method and show that it leads to the natural selection of elements at the edges plus the center of the linear array. This contrasts with the typical solutions obtained from minimizing the single-target Cramér-Rao bound.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123958254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
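
A minimal sketch of the relax-and-round pipeline the abstract describes, using cvxpy. The per-element FIM blocks (diagonal, with nuisance cross terms dropped), the E-optimality surrogate for the worst-case bound, the angle-pair grid, and all array parameters are assumptions of this sketch, not the paper's exact two-target formulation.

```python
import numpy as np
import cvxpy as cp

M, K = 20, 8                        # candidate elements, selection budget
pos = 0.5 * np.arange(M)            # ULA element positions in wavelengths

def per_element_fim(th1, th2):
    """Per-element 2x2 FIM blocks for two uncorrelated sources; cross terms
    and nuisance parameters are dropped, so this is only an illustration."""
    G = np.zeros((M, 2, 2))
    for j, th in enumerate((th1, th2)):
        d = 2 * np.pi * pos * np.cos(th)   # phase slope of d a(theta)/d theta
        G[:, j, j] = 2 * d ** 2
    return G

pairs = [(-0.3, 0.1), (-0.1, 0.4), (0.0, 0.25)]  # worst-case angle-pair grid
w, t = cp.Variable(M), cp.Variable()
cons = [w >= 0, w <= 1, cp.sum(w) == K]          # box relaxation of {0,1}
for p in pairs:
    G = per_element_fim(*p)
    F = sum(w[m] * G[m] for m in range(M))       # FIM is linear in w
    cons.append(F >> t * np.eye(2))              # lambda_min(F) >= t, all pairs
cp.Problem(cp.Maximize(t), cons).solve()

# Randomized rounding: sample binary selections with probabilities w,
# keep budget-feasible draws, and take the best worst-case score.
rng = np.random.default_rng(0)
best, best_val = None, -np.inf
for _ in range(500):
    s = rng.random(M) < w.value
    if s.sum() != K:
        continue
    val = min(np.linalg.eigvalsh(per_element_fim(*p)[s].sum(0))[0] for p in pairs)
    if val > best_val:
        best, best_val = s, val
print("selected elements:", np.flatnonzero(best))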
Stargan-vc Based Cross-Domain Data Augmentation for Speaker Verification
Hang-Rui Hu, Yan Song, Jian-Tao Zhang, Lirong Dai, I. Mcloughlin, Zhu Zhuo, Yujie Zhou, Yu-Hong Li, Hui Xue
{"title":"Stargan-vc Based Cross-Domain Data Augmentation for Speaker Verification","authors":"Hang-Rui Hu, Yan Song, Jian-Tao Zhang, Lirong Dai, I. Mcloughlin, Zhu Zhuo, Yujie Zhou, Yu-Hong Li, Hui Xue","doi":"10.1109/ICASSP49357.2023.10094698","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10094698","url":null,"abstract":"Automatic speaker verification (ASV) faces domain shift caused by the mismatch of intrinsic and extrinsic factors, such as recording device and speaking style, in real-world applications, which leads to severe performance degradation. Since single-speaker multi-condition (SSMC) data is difficult to collect in practice, existing domain adaptation methods are hard to ensure the feature consistency of the same class but different domains. To this end, we propose a cross-domain data generation method to obtain a domain-invariant ASV system. Inspired by voice conversion (VC) task, a StarGAN based generative model first learns cross-domain mappings from SSMC data, and then generates missing domain data for all speakers, thus increasing the intra-class diversity of the training set. Considering the difference between ASV and VC task, we renovate the corresponding training objectives and network structure to make the adaptation task-specific. Evaluations on achieve a relative performance improvement of about 5-8% over the baseline in terms of minDCF and EER, outperforming the CNSRC winner’s system of the equivalent scale.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124002604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
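
The augmentation step the abstract outlines (generate the domains each speaker is missing with a trained StarGAN generator) could look roughly like the sketch below. The Generator stand-in, the one-hot domain code, and all shapes are assumptions; the paper's VC model and its renovated objectives are more involved.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Stand-in for the trained StarGAN-VC generator: maps a mel-spectrogram
    plus a target-domain code to a converted mel-spectrogram."""
    def __init__(self, n_mels=80, n_domains=4):
        super().__init__()
        self.net = nn.Conv1d(n_mels + n_domains, n_mels, kernel_size=5, padding=2)

    def forward(self, mel, code):           # mel: (B, n_mels, T), code: (B, n_domains)
        cond = code[:, :, None].expand(-1, -1, mel.shape[-1])
        return self.net(torch.cat([mel, cond], dim=1))

def fill_missing_domains(G, utts_by_domain, n_domains=4):
    """For every domain a speaker lacks, convert an utterance they do have
    into that domain, increasing intra-speaker diversity of the training set."""
    out = {d: list(u) for d, u in utts_by_domain.items()}
    src = next(iter(utts_by_domain))        # any domain the speaker has
    for tgt in set(range(n_domains)) - set(utts_by_domain):
        code = torch.zeros(1, n_domains)
        code[0, tgt] = 1.0
        with torch.no_grad():
            out[tgt] = [G(mel[None], code)[0] for mel in utts_by_domain[src]]
    return out

G = Generator().eval()
speaker = {0: [torch.randn(80, 200)]}       # this speaker has only domain 0
augmented = fill_missing_domains(G, speaker)  # now covers domains 0-3
```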
Classifying Non-Individual Head-Related Transfer Functions with A Computational Auditory Model: Calibration And Metrics
Rapolas Daugintis, Roberto Barumerli, L. Picinali, M. Geronazzo
{"title":"Classifying Non-Individual Head-Related Transfer Functions with A Computational Auditory Model: Calibration And Metrics","authors":"Rapolas Daugintis, Roberto Barumerli, L. Picinali, M. Geronazzo","doi":"10.1109/ICASSP49357.2023.10095152","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095152","url":null,"abstract":"This study explores the use of a multi-feature Bayesian auditory sound localisation model to classify non-individual head-related transfer functions (HRTFs). Based on predicted sound localisation performance, these are grouped into ‘good’ and ‘bad’, and the ‘best’/‘worst’ is selected from each category. Firstly, we present a greedy algorithm for automated individual calibration of the model based on the individual sound localisation data. We then discuss data analysis of predicted directional localisation errors and present an algorithm for categorising the HRTFs based on the localisation error distributions within a limited range of directions in front of the listener. Finally, we discuss the validity of the classification algorithm when using averaged instead of individual model parameters. This analysis of auditory modelling results aims to provide a perceptual foundation for automated HRTF personalisation techniques for an improved experience of binaural spatial audio technologies.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124185053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
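
A minimal sketch of the greedy per-parameter calibration idea: sweep one auditory-model parameter at a time, holding the others fixed, and keep the value whose predicted localisation errors best match the listener's own data. The predict-errors interface, the parameter grid, and the squared-mismatch cost are assumptions of this sketch.

```python
import numpy as np

def greedy_calibrate(predict_error, observed_error, param_grid):
    """One greedy pass over the parameters: for each, pick the grid value
    minimising the mismatch to the individual's localisation errors."""
    params = {name: grid[0] for name, grid in param_grid.items()}
    for name, grid in param_grid.items():
        costs = [np.mean((predict_error({**params, name: v}) - observed_error) ** 2)
                 for v in grid]
        params[name] = grid[int(np.argmin(costs))]
    return params

# Toy usage: a fake model with a single sensitivity parameter
obs = np.array([12.0, 9.0, 15.0])                      # per-direction errors (deg)
model = lambda p: p["sigma"] * np.array([1.0, 0.8, 1.3])
print(greedy_calibrate(model, obs, {"sigma": np.linspace(5, 20, 31)}))
```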
Privacy-Enhanced Federated Learning Against Attribute Inference Attack for Speech Emotion Recognition
Huan Zhao, Haijiao Chen, Yufeng Xiao, Zixing Zhang
{"title":"Privacy-Enhanced Federated Learning Against Attribute Inference Attack for Speech Emotion Recognition","authors":"Huan Zhao, Haijiao Chen, Yufeng Xiao, Zixing Zhang","doi":"10.1109/ICASSP49357.2023.10095737","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095737","url":null,"abstract":"Federal learning-based (FL) Speech Emotion Recognition (SER) framework aims to protect data privacy when characterizing emotions. However, previous studies have shown that the framework is vulnerable, because curious servers can indirectly infer user private information. To address this challenge, we propose a novel privacy- enhanced SER approach against attribute inference attack. It helps filter sensitive information and attends to highlight emotion features before uploading the shared model updates under the FL. Firstly, a bi-directional recurrent neural network captures the latent representations in sequences to discard partial redundant features. Then, a feature attention mechanism is applied to focus on the salient regions in the latent representations, further hiding emotion-irrelevant attributes. The experimental results show that the introduced model is effective. The attack capability of a gender prediction model is reduced to a chance level while retaining SER performance.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123342836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
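
A rough sketch of the described front end: a bi-directional recurrent network compresses the sequence, then a feature attention mechanism re-weights the latent representation toward emotion-salient regions. The input features (FBANK-like), sizes, and sigmoid gating form are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class PrivacyFilter(nn.Module):
    """BiGRU + per-feature attention gates: the gated representation is what
    would feed the SER head before model updates are shared under FL."""
    def __init__(self, n_feats=40, hidden=128):
        super().__init__()
        self.birnn = nn.GRU(n_feats, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.Sequential(nn.Linear(2 * hidden, 2 * hidden),
                                  nn.Sigmoid())      # per-feature gates in (0, 1)

    def forward(self, x):                 # x: (batch, time, n_feats)
        h, _ = self.birnn(x)              # (batch, time, 2*hidden)
        return h * self.attn(h)           # suppress non-salient features

z = PrivacyFilter()(torch.randn(4, 300, 40))   # e.g. 3 s of 40-dim frames
```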
Improving Electric Load Demand Forecasting with Anchor-Based Forecasting Method
Maria Tzelepi, P. Nousi, A. Tefas
{"title":"Improving Electric Load Demand Forecasting with Anchor-Based Forecasting Method","authors":"Maria Tzelepi, P. Nousi, A. Tefas","doi":"10.1109/ICASSP49357.2023.10096754","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096754","url":null,"abstract":"In this paper we deal with the problem of Electric Load Demand Forecasting (ELDF) considering the Greek Energy Market. Motivated by the anchored-based object detection methods, we argue that considering the ELDF task we can define an anchor and transform the problem into predicting the offset instead of predicting the actual load values. The experimental evaluation considering the one-day-ahead forecasting task, validated the effectiveness of the proposed Anchor-based FOREcasting (AFORE) method. The AFORE method achieved significant improvements in terms of mean absolute percentage error under various setups, using different loss functions and model architectures.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121207384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
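
The reformulation at the heart of the method is compact enough to show directly: fit the offset from an anchor, then add the anchor back at prediction time. In this sketch, hourly data and a same-hour-one-week-earlier anchor are assumptions; the paper only requires an anchor derived from past load.

```python
import numpy as np

def make_anchor_targets(load, horizon=24, anchor_lag=168):
    """Build (features, offset targets, anchors): the model regresses
    y - anchor rather than the load y itself."""
    X, y_off, anchors = [], [], []
    for t in range(anchor_lag, len(load) - horizon):
        anchor = load[t - anchor_lag : t - anchor_lag + horizon]  # same hours, last week
        X.append(load[t - anchor_lag : t])                        # one week of history
        anchors.append(anchor)
        y_off.append(load[t : t + horizon] - anchor)              # offset to fit
    return np.array(X), np.array(y_off), np.array(anchors)

# At inference the forecast is reassembled as anchor + predicted offset:
#   y_hat = anchors_test + model.predict(X_test)
# and MAPE is computed on that sum.
```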
Multiple Target Measurements: Bayesian Framework for Moving Object Detection in Mimo Radar
Bastian Eisele, Ali Bereyhi, R. Müller
{"title":"Multiple Target Measurements: Bayesian Framework for Moving Object Detection in Mimo Radar","authors":"Bastian Eisele, Ali Bereyhi, R. Müller","doi":"10.1109/ICASSP49357.2023.10094649","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10094649","url":null,"abstract":"Utilizing compressive sensing (CS), one can significantly reduce the number of required antenna elements in MIMO radar systems, while preserving a high spatial resolution. Most CS-based studies focus on individual processing of a single set of measurements collected from an stationary scene. In this paper, we propose a new scheme called multiple target measurements (MTM). This scheme uses the target movement to collect multiple sets of measurements from jointly sparse stationary scenes. Invoking approximate message passing, we develop a Bayesian-like iterative algorithm to recover the sparse scenes jointly. Our analytical and numerical investigations demonstrate that MTM can further reduce the array size required to achieve a desired spatial resolution.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114210315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
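
To illustrate the joint-recovery idea only (not the paper's algorithm), the sketch below uses row-sparse iterative hard thresholding in place of the Bayesian AMP recovery: several measurement sets are recovered together under a shared support. All problem sizes are made up.

```python
import numpy as np

def joint_iht(A_list, y_list, k, iters=100, step=0.5):
    """Jointly recover T scenes (columns of X) that share one k-sparse
    support. Assumes the sensing matrices have roughly unit-norm columns."""
    N, T = A_list[0].shape[1], len(A_list)
    X = np.zeros((N, T), dtype=complex)
    for _ in range(iters):
        for t, (A, y) in enumerate(zip(A_list, y_list)):
            X[:, t] += step * (A.conj().T @ (y - A @ X[:, t]))  # gradient step
        keep = np.argsort(np.linalg.norm(X, axis=1))[-k:]       # shared support
        mask = np.zeros(N, dtype=bool)
        mask[keep] = True
        X[~mask] = 0                                            # hard threshold
    return X

# Toy usage: two measurement sets over jointly sparse scenes
rng = np.random.default_rng(1)
A = [rng.normal(size=(30, 100)) / np.sqrt(30) for _ in range(2)]
x = np.zeros((100, 2))
x[[7, 42], :] = 1.0
X_hat = joint_iht(A, [A[t] @ x[:, t] for t in range(2)], k=2)
```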
FedSD: A New Federated Learning Structure Used in Non-iid Data
Minmin Yi, Houchun Ning, Peng Liu
{"title":"FedSD: A New Federated Learning Structure Used in Non-iid Data","authors":"Minmin Yi, Houchun Ning, Peng Liu","doi":"10.1109/ICASSP49357.2023.10095595","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095595","url":null,"abstract":"One of the most challenging problems in federated learning is the convergence speed problem caused by heterogeneity. We propose a novel structure called FedSD, a new method to accelerate the model convergence. We change the one-stage-cycle iteration structure to a 2-stage-cycle one to get the latest global gradient descent direction which can guide the model training direction. We instantiate algorithms using FedSD to improve the performance of experiments on several public datasets. Our empirical studies validate the excellent performance of FedSD.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114358267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
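
Reading the abstract literally, one plausible rendering of the two-stage cycle is: stage 1 collects client gradients at the current global weights to form the latest global direction, and stage 2 runs local training pulled along that direction. The guiding rule (adding the global direction to local gradients) and the client interface are assumptions of this sketch, not the paper's specification.

```python
import copy
import torch

class Client:
    """Minimal client holding private batches; the interface is hypothetical."""
    def __init__(self, data):
        self.data = data                          # list of (x, y) batches

    def gradient(self, model):                    # stage 1: gradient at global weights
        model.zero_grad()
        x, y = self.data[0]
        torch.nn.functional.mse_loss(model(x), y).backward()
        return [p.grad.detach().clone() for p in model.parameters()]

    def train_locally(self, model, guide, lr=0.05, mu=0.5, steps=5):
        # stage 2: local SGD nudged along the latest global direction `guide`
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for x, y in self.data[:steps]:
            opt.zero_grad()
            torch.nn.functional.mse_loss(model(x), y).backward()
            for p, g in zip(model.parameters(), guide):
                p.grad.add_(g, alpha=mu)          # heterogeneity correction
            opt.step()

def fedsd_round(global_model, clients):
    grads = [c.gradient(copy.deepcopy(global_model)) for c in clients]  # stage 1
    guide = [torch.stack(gs).mean(0) for gs in zip(*grads)]
    states = []
    for c in clients:                                                   # stage 2
        m = copy.deepcopy(global_model)
        c.train_locally(m, guide)
        states.append(m.state_dict())
    global_model.load_state_dict({k: torch.stack([s[k] for s in states]).mean(0)
                                  for k in states[0]})                  # FedAvg-style

net = torch.nn.Linear(10, 1)
clients = [Client([(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(5)])
           for _ in range(3)]
fedsd_round(net, clients)
```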
ERBNet: An Effective Representation Based Network for Unbiased Scene Graph Generation
Wenxing Ma, Tianxiang Hou, Qianji Di, Zhongang Qi, Ying Shan, Hanzi Wang
{"title":"ERBNet: An Effective Representation Based Network for Unbiased Scene Graph Generation","authors":"Wenxing Ma, Tianxiang Hou, Qianji Di, Zhongang Qi, Ying Shan, Hanzi Wang","doi":"10.1109/ICASSP49357.2023.10094727","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10094727","url":null,"abstract":"The scene graph generation (SGG) task has attracted increasing attention in recent years. The goal of SGG is to predict relations between pairs of objects within an image. Due to the long-tailed distribution of the dataset annotations, the performance of SGG is still far from satisfactory. To address the long-tailed problem, existing methods try various ways to conduct unbiased learning. However, we argue that the essence of the long-tailed problem in SGG is that the classifier is seriously affected by the long-tailed data. To handle this issue, we propose a novel network named ERBNet, which contains a relation feature fusion (RFF) encoder to construct effective representations of relations between objects, and a nearest class mean (NCM) classifier to conduct relation prediction based on relation feature similarities. Extensive experimental results show that the proposed ERBNet outperforms several state-of-the-art methods on the challenging Visual Genome dataset.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114620597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
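
The NCM classifier itself is simple to sketch: per-class means replace a learned linear head, so head (frequent) classes cannot dominate through larger classifier weights. Using cosine similarity as the "relation feature similarity" is an assumption of this sketch.

```python
import torch
import torch.nn.functional as F

class NCMClassifier:
    """Nearest class mean classification over relation features."""
    def fit(self, feats, labels):              # feats: (n, d), labels: (n,)
        self.classes = labels.unique()
        self.means = torch.stack([feats[labels == c].mean(0) for c in self.classes])
        return self

    def predict(self, feats):                  # cosine similarity to class means
        sims = F.normalize(feats, dim=1) @ F.normalize(self.means, dim=1).T
        return self.classes[sims.argmax(dim=1)]

# Toy usage on random relation features
feats = torch.randn(200, 64)
labels = torch.randint(0, 10, (200,))
preds = NCMClassifier().fit(feats, labels).predict(feats)
```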
Seri: Sketching-Reasoning-Integrating Progressive Workflow for Empathetic Response Generation
Guanqun Bi, Yanan Cao, Piji Li, Yuqiang Xie, Fang Fang, Zheng Lin
{"title":"Seri: Sketching-Reasoning-Integrating Progressive Workflow for Empathetic Response Generation","authors":"Guanqun Bi, Yanan Cao, Piji Li, Yuqiang Xie, Fang Fang, Zheng Lin","doi":"10.1109/ICASSP49357.2023.10094672","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10094672","url":null,"abstract":"Empathy is a key ability for a human-like dialogue system. Inspired by social psychology, empathy includes both affective and cognitive aspects. Previous works on this topic have merely focused on recognizing emotions or modeling cognition with commonsense knowledge. Nevertheless, the generated results of these works still have a big gap with human-like empathetic responses. In this paper, we propose Seri, a SkEtching-Reasoning-Integrating framework for empathetic response generation. In particular, we define an empathy planner to capture and reason about multi-source information that considers cognition and affection. Further, we introduce a dynamic integrator module that allows the model dynamically select the appropriate information to generate empathetic responses. Experimental results on EmpatheticDialogue show that our method outperforms competitive baselines and generates responses with higher diversity and cognitive empathy levels.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116273842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Audio-Driven High Definition and Lip-Synchronized Talking Face Generation Based on Face Reenactment
Xianyu Wang, Yuhan Zhang, Weihua He, Yaoyuan Wang, Minglei Li, Yuchen Wang, Jingyi Zhang, Shunbo Zhou, Ziyang Zhang
{"title":"Audio-Driven High Definetion and Lip-Synchronized Talking Face Generation Based on Face Reenactment","authors":"Xianyu Wang, Yuhan Zhang, Weihua He, Yaoyuan Wang, Minglei Li, Yuchen Wang, Jingyi Zhang, Shunbo Zhou, Ziyang Zhang","doi":"10.1109/icassp49357.2023.10097270","DOIUrl":"https://doi.org/10.1109/icassp49357.2023.10097270","url":null,"abstract":"Generating audio-driven photo-realistic talking face has received intensive attention due to its ability to bring more new human-computer interaction experiences. However, previous works struggled to balance high definition, lip synchronization, and low customization costs, which would degrade the user experience. In this paper, a novel audio-driven talking face generation method was proposed, which subtly converts the problem of improving video definition into the problem of face reenactment to produce both lip-synchronized and high- definition face video. The framework is decoupled, meaning that the same trained model can be used on arbitrary characters and audio without further customizing training for specific people, thus significantly reducing costs. Experiment results show that our proposed method achieves the high video definition, and comparable lip synchronization performance with the existing state-of-the-art methods.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116357213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0