2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA): Latest Publications

Ambient-Aware Sound Field Translation Using Optimal Spatial Filtering
2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date: 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632793
Maximilian Kentgens, P. Jax
Abstract: In a previous contribution, we proposed a space-warping-based approach for sound field translation of non-reverberant higher-order Ambisonics signals with applications in spatial audio and virtual reality. In this work, we extend the concept of space warping in order to deal with ambient sound such as reverberation and diffuse noise by using spatially selective filtering. We propose a hard-decision and a soft-decision approach which both make use of the second-order statistics of the signal. The hard-decision variant yields improved performance with respect to the non-adaptive reference for low SNRs and is robust against covariance misestimates. The soft-decision variant is the solution to an optimal spatial filter derivation. It yields optimal performance for known covariances and easily outperforms the hard-decision and reference approaches also for moderate and high SNRs. We further derive expressions for the expected errors and relate our findings to the mathematically related problem of spherical-harmonics-domain noise reduction.
Citations: 2
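As a rough illustration of a soft-decision filter built from second-order statistics, the sketch below applies a multichannel Wiener-type solution to a vector of Ambisonics coefficients. The covariances, channel count, and filter form are placeholder assumptions, not the derivation in the paper:

```python
import numpy as np

def soft_decision_filter(R_s, R_a):
    """MMSE (Wiener-type) spatial filter from second-order statistics.

    R_s : (N, N) covariance of the desired (direct) sound field components
    R_a : (N, N) covariance of the ambient/diffuse components
    Returns the (N, N) matrix W minimizing E[||W y - s||^2] for y = s + a
    with s and a uncorrelated.
    """
    return R_s @ np.linalg.inv(R_s + R_a)

# Toy example with 4 first-order Ambisonics channels (values are arbitrary)
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
R_s = A @ A.T                  # some positive-definite "signal" covariance
R_a = 0.5 * np.eye(4)          # isotropic ambient covariance
W = soft_decision_filter(R_s, R_a)

y = rng.standard_normal(4)     # one frame of noisy Ambisonics coefficients
print(W @ y)                   # spatially filtered estimate
```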
Fast Convergent Method for Active Noise Control Over Spatial Region with Causal Constraint
2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date: 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632744
Naoki Murata, Yuhta Takida, T. Magariyachi
Abstract: The aim of spatial active noise control (ANC) is to attenuate unwanted noise over a target region. Methods based on the spherical/circular harmonic expansion of the sound field have been proposed, enabling the control of a particular continuous area. These methods, however, are derived in the frequency domain; therefore, they cannot guarantee the causality of the control filters. On the other hand, time-domain adaptive methods have the problem of slow convergence. We propose a spatial ANC method that guarantees the control filter's causality and achieves fast convergence while controlling the continuous spatial area. The proposed method adopts the objective function of the recursive least squares algorithm and exploits the Markov conjugacy of search directions for fast convergence. Numerical simulations in a room environment indicated the efficacy of the proposed method compared with the conventional multipoint adaptive method.
Citations: 0
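For reference, the recursive least squares objective the method adopts corresponds, in its simplest single-channel form, to the classic RLS adaptive filter sketched below. This is only a baseline illustration; it omits the paper's spatial formulation, causality constraint, and Markov-conjugate search directions:

```python
import numpy as np

def rls_filter(x, d, order=16, lam=0.999, delta=1e-2):
    """Classic single-channel RLS adaptive filter (reference illustration only).

    x : input signal, d : desired signal, lam : forgetting factor.
    Returns the a priori error signal e = d - filter output.
    """
    w = np.zeros(order)
    P = np.eye(order) / delta               # inverse correlation matrix estimate
    e = np.zeros(len(x))
    for n in range(order, len(x)):
        u = x[n - order + 1 : n + 1][::-1]  # x[n], x[n-1], ..., x[n-order+1]
        k = P @ u / (lam + u @ P @ u)       # gain vector
        e[n] = d[n] - w @ u                 # a priori error
        w = w + k * e[n]                    # coefficient update
        P = (P - np.outer(k, u @ P)) / lam
    return e

# Toy usage: identify an unknown 8-tap FIR path; the error power decays quickly
rng = np.random.default_rng(1)
x = rng.standard_normal(4000)
d = np.convolve(x, rng.standard_normal(8))[: len(x)]
e = rls_filter(x, d)
print(np.mean(e[-500:] ** 2))
```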
Kernel Learning for Sound Field Estimation with L1 and L2 Regularizations
2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date: 2021-10-11 DOI: 10.1109/WASPAA52581.2021.9632731
Ryosuke Horiuchi, Shoichi Koyama, Juliano G. C. Ribeiro, Natsuki Ueno, H. Saruwatari
Abstract: A method to estimate an acoustic field from discrete microphone measurements is proposed. A kernel-interpolation-based method using the kernel function formulated for sound field interpolation has been used in various applications. The kernel function with directional weighting makes it possible to incorporate prior information on source directions to improve estimation accuracy. However, in prior studies, parameters for directional weighting have been empirically determined. We propose a method to optimize these parameters using observation values, which is particularly useful when prior information on source directions is uncertain. The proposed algorithm is based on discretization of the parameters and representation of the kernel function as a weighted sum of sub-kernels. Two types of regularization for the weights, L1 and L2, are investigated. Experimental results indicate that the proposed method achieves higher estimation accuracy than the method without kernel learning.
Citations: 8
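The idea of expressing the kernel as a weighted sum of sub-kernels and learning the weights under an L1 or L2 penalty can be sketched as follows. The 1-D field, RBF sub-kernel family, and optimizer settings are illustrative assumptions standing in for the directionally weighted kernels of the paper:

```python
import torch

torch.manual_seed(0)

# Hypothetical 1-D "sound field" sampled at microphone positions
x_tr = torch.rand(24, 1) * 2 - 1                  # observed mic positions
x_va = torch.rand(12, 1) * 2 - 1                  # held-out mic positions
field = lambda x: torch.sin(4 * x) + 0.3 * torch.cos(9 * x)
y_tr, y_va = field(x_tr).squeeze(), field(x_va).squeeze()

def rbf(a, b, ell):
    return torch.exp(-((a - b.T) ** 2) / (2 * ell ** 2))

lengths = [0.1, 0.3, 1.0]                         # placeholder sub-kernel family
logits = torch.zeros(len(lengths), requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.05)
lam_ridge, lam_w = 1e-3, 1e-2                     # interpolation and weight penalties

for step in range(300):
    w = torch.nn.functional.softplus(logits)      # nonnegative sub-kernel weights
    K_tr = sum(wi * rbf(x_tr, x_tr, l) for wi, l in zip(w, lengths))
    K_va = sum(wi * rbf(x_va, x_tr, l) for wi, l in zip(w, lengths))
    alpha = torch.linalg.solve(K_tr + lam_ridge * torch.eye(24),
                               y_tr.unsqueeze(1)).squeeze(1)
    loss = ((K_va @ alpha - y_va) ** 2).mean() + lam_w * (w ** 2).sum()  # L2 penalty
    # swap the last term for lam_w * w.abs().sum() to get the L1 (sparse) variant
    opt.zero_grad(); loss.backward(); opt.step()

print(torch.nn.functional.softplus(logits).detach())  # learned sub-kernel weights
```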
Auto-DSP: Learning to Optimize Acoustic Echo Cancellers
2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date: 2021-10-08 DOI: 10.1109/WASPAA52581.2021.9632678
Jonah Casebeer, Nicholas J. Bryan, P. Smaragdis
Abstract: Adaptive filtering algorithms are commonplace in signal processing and have wide-ranging applications from single-channel denoising to multi-channel acoustic echo cancellation and adaptive beamforming. Such algorithms typically operate via specialized online, iterative optimization methods and have achieved tremendous success, but require expert knowledge, are slow to develop, and are difficult to customize. In our work, we present a new method to automatically learn adaptive filtering update rules directly from data. To do so, we frame adaptive filtering as a differentiable operator and train a learned optimizer to output a gradient descent-based update rule from data via backpropagation through time. We demonstrate our general approach on an acoustic echo cancellation task (single-talk with noise) and show that we can learn high-performing adaptive filters for a variety of common linear and non-linear multidelayed block frequency domain filter architectures. We also find that our learned update rules exhibit fast convergence, can optimize in the presence of nonlinearities, and are robust to acoustic scene changes despite never encountering any during training.
Citations: 8
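A toy version of the framing, much simpler than the paper's learned optimizer: an LMS adaptive filter is unrolled as a differentiable operator and a single learnable step size is trained by backpropagation through time on randomly generated echo scenes. All signals, dimensions, and hyperparameters below are made up for illustration:

```python
import torch

torch.manual_seed(0)
order = 16
log_mu = torch.tensor(-3.0, requires_grad=True)     # learnable LMS step size
meta_opt = torch.optim.Adam([log_mu], lr=0.05)

def run_adaptive_filter(x, d, mu):
    """Differentiable LMS roll-out; returns the residual (echo-cancelled) signal."""
    w = torch.zeros(order)
    errs = []
    for n in range(order, len(x)):
        u = x[n - order + 1 : n + 1].flip(0)         # x[n], x[n-1], ...
        e = d[n] - w @ u
        w = w + mu * e * u                           # update kept differentiable
        errs.append(e)
    return torch.stack(errs)

for step in range(50):                               # meta-training via BPTT
    x = torch.randn(400)                             # far-end excitation
    h = 0.5 * torch.randn(8)                         # random echo path per scene
    d = torch.nn.functional.conv1d(                  # microphone = echo of x ...
        x.view(1, 1, -1), h.flip(0).view(1, 1, -1), padding=7
    ).view(-1)[: len(x)] + 0.1 * torch.randn(400)    # ... plus near-end noise
    e = run_adaptive_filter(x, d, torch.exp(log_mu))
    loss = (e[-200:] ** 2).mean()                    # residual echo after adaptation
    meta_opt.zero_grad(); loss.backward(); meta_opt.step()

print(float(torch.exp(log_mu)))                      # learned step size
```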
A Universal Deep Room Acoustics Estimator
2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date: 2021-09-29 DOI: 10.1109/WASPAA52581.2021.9632738
P. S. López, Paul Callens, M. Cernak
Abstract: Speech audio quality is subject to degradation caused by an acoustic environment and isotropic ambient and point noises. The environment can lead to decreased speech intelligibility and loss of focus and attention by the listener. Basic acoustic parameters that characterize the environment well are (i) signal-to-noise ratio (SNR), (ii) speech transmission index, (iii) reverberation time, (iv) clarity, and (v) direct-to-reverberant ratio. Except for the SNR, these parameters are usually derived from the Room Impulse Response (RIR) measurements; however, such measurements are often not available. This work presents a universal room acoustic estimator design based on convolutional recurrent neural networks that estimate the acoustic environment measurement blindly and jointly. Our results indicate that the proposed system is robust to non-stationary signal variations and outperforms current state-of-the-art methods.
Citations: 8
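A skeletal convolutional recurrent network of the kind described, mapping a mel spectrogram to the five acoustic parameters jointly. Layer sizes, input features, and output ordering are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class RoomAcousticsCRNN(nn.Module):
    """Toy CRNN: spectrogram in, [SNR, STI, T60, C50, DRR] out (assumed order)."""
    def __init__(self, n_mels=64, n_params=5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 2)),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 2)),
        )
        self.gru = nn.GRU(32 * (n_mels // 4), 64, batch_first=True)
        self.head = nn.Linear(64, n_params)

    def forward(self, spec):                      # spec: (B, 1, n_mels, T)
        z = self.conv(spec)                       # (B, 32, n_mels//4, T//4)
        z = z.permute(0, 3, 1, 2).flatten(2)      # (B, T//4, 32 * n_mels//4)
        _, h = self.gru(z)                        # final hidden state summarizes clip
        return self.head(h[-1])                   # (B, n_params)

model = RoomAcousticsCRNN()
dummy = torch.randn(2, 1, 64, 200)                # two 200-frame mel spectrograms
print(model(dummy).shape)                         # torch.Size([2, 5])
```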
Cross-Domain Semi-Supervised Audio Event Classification Using Contrastive Regularization
2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date: 2021-09-29 DOI: 10.1109/WASPAA52581.2021.9632721
Donmoon Lee, Kyogu Lee
Abstract: In this study, we propose a novel semi-supervised training method that uses unlabeled data whose class distribution is completely different from that of the target data, or data without target labels. To this end, we introduce a contrastive regularization that is designed to be target-task-oriented and is trained simultaneously. In addition, we propose a simple audio-mixing-based augmentation strategy that is performed within batch samples. Experimental results validate that the proposed method contributes to performance improvement and, in particular, show that it offers advantages in training stability and generalization.
Citations: 2
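One way such a joint objective can look is sketched below: a supervised cross-entropy term on labeled audio plus an NT-Xent-style contrastive regularizer on two mixed views of unlabeled audio. The specific loss, mixing scheme, weighting, and stand-in models are assumptions rather than the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.1):
    """Contrastive loss: two views of the same clip attract, other clips repel."""
    B = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2]), dim=1)                      # (2B, D)
    sim = z @ z.T / tau
    sim = sim.masked_fill(torch.eye(2 * B, dtype=torch.bool), -1e9)  # drop self-pairs
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)])
    return F.cross_entropy(sim, targets)

def mix_augment(x, alpha=0.2):
    """Simple in-batch audio mixing augmentation (assumed variant); x: (batch, time)."""
    lam = 1 - alpha * torch.rand(x.shape[0], 1)
    return lam * x + (1 - lam) * x[torch.randperm(x.shape[0])]

def total_loss(encoder, classifier, x_lab, y_lab, x_unlab, w_contrast=0.5):
    """Supervised CE on labeled data plus contrastive regularization on unlabeled data."""
    sup = F.cross_entropy(classifier(encoder(x_lab)), y_lab)
    z1 = encoder(mix_augment(x_unlab))
    z2 = encoder(mix_augment(x_unlab))
    return sup + w_contrast * nt_xent(z1, z2)

# Toy usage with stand-in models and random waveforms
enc = torch.nn.Sequential(torch.nn.Linear(8000, 128), torch.nn.ReLU())
clf = torch.nn.Linear(128, 10)
x_lab, y_lab = torch.randn(4, 8000), torch.randint(0, 10, (4,))
x_unlab = torch.randn(16, 8000)
print(total_loss(enc, clf, x_lab, y_lab, x_unlab))
```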
Convolutive Prediction for Reverberant Speech Separation
2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date: 2021-08-16 DOI: 10.1109/WASPAA52581.2021.9632667
Zhong-Qiu Wang, G. Wichern, Jonathan Le Roux
Abstract: We investigate the effectiveness of convolutive prediction, a novel formulation of linear prediction for speech dereverberation, for speaker separation in reverberant conditions. The key idea is to first use a deep neural network (DNN) to estimate the direct-path signal of each speaker, and then identify delayed and decayed copies of the estimated direct-path signal. Such copies are likely due to reverberation, and can be directly removed for dereverberation or used as extra features for another DNN to perform better dereverberation and separation. To identify such copies, we solve a linear regression problem per frequency efficiently in the time-frequency (T-F) domain to estimate the underlying room impulse response (RIR). In the multi-channel extension, we perform minimum variance distortionless response (MVDR) beamforming on the outputs of convolutive prediction. The beamforming and dereverberation results are used as extra features for a second DNN to perform better separation and dereverberation. State-of-the-art results are obtained on the SMS-WSJ corpus.
Citations: 5
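The per-frequency linear regression at the core of convolutive prediction can be sketched in a simplified single-channel form. The prediction delay, number of filter taps, and regularization below are assumed values, and the DNN estimate of the direct-path signal is taken as given:

```python
import numpy as np

def convolutive_prediction(Y, S_hat, delay=3, taps=10, eps=1e-4):
    """Estimate and remove delayed/decayed copies of the estimated direct path.

    Y      : (F, T) mixture STFT
    S_hat  : (F, T) DNN estimate of the direct-path STFT of one speaker
    Returns Y minus the predicted late reverberation of that speaker.
    """
    F_bins, T = Y.shape
    out = Y.copy()
    for f in range(F_bins):
        # Matrix of delayed copies of S_hat[f]: A[t, k] = S_hat[f, t - delay - k]
        A = np.zeros((T, taps), dtype=complex)
        for k in range(taps):
            shift = delay + k
            A[shift:, k] = S_hat[f, : T - shift]
        # Per-frequency regularized least squares for the reverberation filter
        g = np.linalg.solve(A.conj().T @ A + eps * np.eye(taps), A.conj().T @ Y[f])
        out[f] -= A @ g
    return out

# Toy usage with random "spectrograms" (shapes only; real use needs actual STFTs)
rng = np.random.default_rng(0)
Y = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))
S_hat = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))
print(convolutive_prediction(Y, S_hat).shape)
```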
A Streamwise GAN Vocoder for Wideband Speech Coding at Very Low Bit Rate
2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date: 2021-08-09 DOI: 10.1109/WASPAA52581.2021.9632750
Ahmed Mustafa, Jan Büthe, Srikanth Korse, Kishan Gupta, Guillaume Fuchs, N. Pia
Abstract: Recently, GAN vocoders have seen rapid progress in speech synthesis, starting to outperform autoregressive models in perceptual quality with much higher generation speed. However, autoregressive vocoders are still the common choice for neural generation of speech signals coded at very low bit rates. In this paper, we present a GAN vocoder which is able to generate wideband speech waveforms from parameters coded at 1.6 kbit/s. The proposed model is a modified version of the StyleMelGAN vocoder that can run in a frame-by-frame manner, making it suitable for streaming applications. The experimental results show that the proposed model significantly outperforms prior autoregressive vocoders like LPCNet for very low bit rate speech coding, with computational complexity of about 5 GMACs, providing a new state of the art in this domain. Moreover, this streamwise adversarial vocoder delivers quality competitive to advanced speech codecs such as EVS at 5.9 kbit/s on clean speech, which motivates further usage of feedforward fully-convolutional models for low bit rate speech coding.
Citations: 8
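Frame-by-frame operation of a convolutional vocoder relies on caching past context between calls. The sketch below shows one generic way to make a causal convolution streamable; it illustrates the mechanism in general, not StyleMelGAN's implementation:

```python
import torch
import torch.nn as nn

class StreamingCausalConv1d(nn.Module):
    """Causal Conv1d that keeps a state buffer so it can be fed frame by frame."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation=1):
        super().__init__()
        self.context = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)
        self.buffer = None

    def forward(self, frame):                      # frame: (B, in_ch, frame_len)
        if self.buffer is None:                    # fresh stream: zero left context
            self.buffer = frame.new_zeros(frame.shape[0], frame.shape[1], self.context)
        x = torch.cat([self.buffer, frame], dim=-1)
        self.buffer = x[..., -self.context:].detach()   # carry context to next call
        return self.conv(x)                        # same length as the input frame

# Streaming output matches offline causal processing of the whole signal
layer = StreamingCausalConv1d(1, 4, kernel_size=5, dilation=2)
signal = torch.randn(1, 1, 40)
offline = layer.conv(torch.nn.functional.pad(signal, (layer.context, 0)))
layer.buffer = None                                # reset stream state
streamed = torch.cat([layer(f) for f in signal.split(10, dim=-1)], dim=-1)
print(torch.allclose(offline, streamed, atol=1e-6))
```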
A Multi-Head Relevance Weighting Framework for Learning Raw Waveform Audio Representations
2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date: 2021-07-30 DOI: 10.1109/WASPAA52581.2021.9632708
Debottam Dutta, Purvi Agrawal, Sriram Ganapathy
Abstract: In this work, we propose a multi-head relevance weighting framework to learn audio representations from raw waveforms. The audio waveform, split into short-duration windows, is processed with a 1-D convolutional layer of cosine-modulated Gaussian filters acting as a learnable filterbank. The key novelty of the proposed framework is the introduction of multi-head relevance on the learnt filterbank representations. Each head of the relevance network is modelled as a separate sub-network. These heads perform representation enhancement by generating weight masks for different parts of the time-frequency representation learnt by the parametric acoustic filterbank layer. The relevance weighted representations are fed to a neural classifier and the whole system is trained jointly for the audio classification objective. Experiments are performed on the DCASE2020 Task 1A challenge as well as the Urban Sound Classification (USC) tasks. In these experiments, the proposed approach yields relative improvements of 10% and 23% respectively for the DCASE2020 and USC datasets over the mel-spectrogram baseline. Also, the analysis of multi-head relevance weights provides insights into the learned representations.
Citations: 2
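The learnable front end described here uses cosine-modulated Gaussian filters. The sketch below generates such a filterbank and applies a relevance-style weighting mask; the center frequencies, bandwidth rule, and random mask are illustrative assumptions, not learned values:

```python
import numpy as np

def cos_gauss_filterbank(center_freqs, bandwidths, length=401, fs=16000):
    """Cosine-modulated Gaussian filters: cos(2*pi*fc*t) * exp(-t^2 / (2*sigma^2))."""
    t = (np.arange(length) - length // 2) / fs
    filters = []
    for fc, bw in zip(center_freqs, bandwidths):
        sigma = 1.0 / (2 * np.pi * bw)              # assumed bandwidth-to-spread rule
        filters.append(np.cos(2 * np.pi * fc * t) * np.exp(-t**2 / (2 * sigma**2)))
    return np.stack(filters)                        # (n_filters, length)

# 40 filters with linearly spaced centers (placeholder), applied to 1 s of noise
fc = np.linspace(100, 7000, 40)
fb = cos_gauss_filterbank(fc, bandwidths=0.2 * fc)
x = np.random.default_rng(0).standard_normal(16000)
subbands = np.stack([np.convolve(x, f, mode="same") for f in fb])   # (40, 16000)

# A relevance head would rescale this time-frequency representation with weights
# in [0, 1]; a random sigmoid mask stands in for the sub-network output here.
relevance = 1 / (1 + np.exp(-np.random.default_rng(1).standard_normal(subbands.shape)))
print((relevance * subbands).shape)
```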
Blind Room Parameter Estimation Using Multiple Multichannel Speech Recordings
2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date: 2021-07-29 DOI: 10.1109/WASPAA52581.2021.9632778
Prerak Srivastava, Antoine Deleforge, E. Vincent
Abstract: Knowing the geometrical and acoustical parameters of a room may benefit applications such as audio augmented reality, speech dereverberation or audio forensics. In this paper, we study the problem of jointly estimating the total surface area, the volume, as well as the frequency-dependent reverberation time and mean surface absorption of a room in a blind fashion, based on two-channel noisy speech recordings from multiple, unknown source-receiver positions. A novel convolutional neural network architecture leveraging both single- and inter-channel cues is proposed and trained on a large, realistic simulated dataset. Results on both simulated and real data show that using multiple observations in one room significantly reduces estimation errors and variances on all target quantities, and that using two channels helps the estimation of surface and volume. The proposed model outperforms a recently proposed blind volume estimation method on the considered datasets.
Citations: 6
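A sketch of the kind of single- and inter-channel cues a two-channel network could consume, here per-channel log magnitudes plus the inter-channel phase difference. The STFT settings and feature layout are assumptions and may differ from the features used in the paper:

```python
import numpy as np

def two_channel_features(x_left, x_right, n_fft=512, hop=256):
    """Stack per-channel log magnitudes and the inter-channel phase difference."""
    def stft(x):
        frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
        return np.fft.rfft(frames * np.hanning(n_fft), axis=-1)     # (T, F)
    X_l, X_r = stft(x_left), stft(x_right)
    log_l = np.log1p(np.abs(X_l))                                   # single-channel cues
    log_r = np.log1p(np.abs(X_r))
    ipd = np.angle(X_l * np.conj(X_r))                              # inter-channel cue
    return np.stack([log_l, log_r, np.cos(ipd), np.sin(ipd)])       # (4, T, F)

# Toy usage on random two-channel "speech" (2 s at 16 kHz)
rng = np.random.default_rng(0)
feats = two_channel_features(rng.standard_normal(32000), rng.standard_normal(32000))
print(feats.shape)    # (4, n_frames, n_fft // 2 + 1)
```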