2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA): Latest Papers (Page 2)

Mean-Square-Error-Based Secondary Source Placement in Sound Field Synthesis with Prior Information on Desired Field

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date : 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632799

Keisuke Kimura, Shoichi Koyama, Natsuki Ueno, H. Saruwatari

Citations: 1

Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date : 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632720

Wangyou Zhang, Jing Shi, Chenda Li, Shinji Watanabe, Y. Qian

Abstract: The deep learning based time-domain models, e.g. Conv-TasNet, have shown great potential in both single-channel and multi-channel speech enhancement. However, many experiments on the time-domain speech enhancement model are done in simulated conditions, and it is not well studied whether the good performance can generalize to real-world scenarios. In this paper, we aim to provide an insightful investigation of applying multi-channel Conv-TasNet based speech enhancement to both simulation and real data. Our preliminary experiments show a large performance gap between the two conditions in terms of the ASR performance. Several approaches are applied to close this gap, including the integration of multi-channel Conv-TasNet into the beamforming model with various strategies, and the joint training of speech enhancement and speech recognition models. Our experiments on the CHiME-4 corpus show that our proposed approaches can greatly reduce the speech recognition performance discrepancy between simulation and real data, while preserving the strong speech enhancement capability in the frontend.

Citations: 13

Prediction of Missing Frequency Response Functions Through Deep Image Prior

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date : 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632759

R. Malvermi, F. Antonacci, A. Sarti, R. Corradi

Abstract: Vibration analysis is crucial when designing and monitoring resonant structures. The characterization of vibrational properties in mechanical systems, e.g. machinery or musical instruments, can indeed detect noise sources and damages. Several methods can retrieve these parameters starting from a set of measurements. The level of detail in the estimate mostly depends on the amount and distribution of points acquired over space. A potential issue for these techniques consists in the presence of regions over the object where sensors cannot be attached. In this case, an interpolation scheme with a suitable prior of the data model should be devised. We propose here to predict the missing vibrational data within the framework of image inpainting and apply a fully data-driven method based on Deep Image Prior, which allows to capture the prior inside data without the need of a dataset. The performance is assessed in the case of violin top plates. The proposed method proved to better predict data, in particular resonances, for points close to the boundary, whereas a baseline based on Thin Plate Splines fails, due to the reduced number of available samples.

Citations: 1

Analysis of Frequency-Dependent Behavior of Room Reflections Using Spherical Microphone Measurements & Von Mises-Fisher Clustering

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date : 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632706

Amy Bastine, T. Abhayapala, J. Zhang

Citations: 0

Stochastic Reverberation Model with a Frequency Dependent Attenuation

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date : 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632792

Achille Aknin, Roland Badeau

Citations: 3

Rendering of Source Spread for Arbitrary Playback Setups Based on Spatial Covariance Matching

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date : 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632724

L. McCormack, A. Politis, V. Pulkki

Abstract: This paper proposes an algorithm for rendering spread sound sources, which are mutually incoherent across their extents, over arbitrary playback formats. The approach involves first generating signals corresponding to the centre of the spread source for the intended playback setup, along with decorrelated variants, followed by defining a diffuse spatial covariance matrix for the confined target spreading area. The mixing matrices required to combine these signals, in a manner whereby the resulting output signals exhibit the target inter-channel relationships for an incoherently spread source, are computed based on an optimised solution which is constrained to preserve signal fidelity. The proposed solution is evaluated in the context of producing extended sound sources for binaural playback. Objective perceptual metrics are computed and shown to be comparable to those derived from an ideal incoherently spread reference. Signal distortion measures are also calculated for speech, musical, and ambience recordings, which indicate higher signal fidelity produced by the proposed constrained spatial covariance matching solution, compared to an unconstrained alternative. These improvements in signal fidelity are further demonstrated by the provided audio examples and open-source audio plug-in.

Citations: 4

User-Guided One-Shot Deep Model Adaptation for Music Source Separation

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date : 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632717

Giorgia Cantisani, A. Ozerov, S. Essid, G. Richard

Citations: 2

Spatial Coding for Microphone Arrays Using IPNLMS-Based RTF Estimation

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date : 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632747

Daniel T. Jones, D. Sharma, S. Kruchinin, P. Naylor

Abstract: We propose a method for encoding multichannel microphone array signals and show that our proposed algorithm can operate effectively at very low bitrates. Our approach leverages the high interchannel correlations that arise from the close proximity of microphones in an array to compactly represent the signals. An $M$ channel microphone array signal is encoded as one reference signal and $M-1$ Relative Transfer Functions (RTFs). When the RTFs require updating only infrequently, a significant reduction in data-rate is obtained. Applications of interest include cloud-based beamforming and End-to-End Automatic Speech Recognition (ASR) systems. The efficiency of this encoding enables multichannel audio to be transmitted to the cloud at very low bitrates. A system has been developed that estimates, and periodically updates, the RTFs between each channel of the array and a chosen reference channel using an Improved Proportionate Normalized Least Mean Squares (IPNLMS) adaptive filter. The proposed system is experimentally evaluated in comparison with the Opus codec. It achieves equal ΔPESQ performance with a data-rate reduction of up to 90% and un-degraded Word Error Rate (WER) down to bitrates as low as 3.3 kbps.
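
Illustrative sketch (not taken from the paper): the abstract above names an IPNLMS adaptive filter as the RTF estimator but gives no implementation details. The snippet below is a minimal time-domain IPNLMS in the textbook form, assuming the relation between the reference channel and one other channel can be modelled as a short causal FIR filter. The function name `ipnlms`, the parameter values (`mu`, `alpha`, `delta`, `eps`), the 16-tap length, and the synthetic signals are all assumptions chosen for illustration.

```python
import random


def ipnlms(reference, target, num_taps, mu=0.5, alpha=0.0, delta=1e-4, eps=1e-8):
    """Estimate FIR taps h so that target[n] ~= sum_k h[k] * reference[n - k].

    Textbook IPNLMS update; all parameter defaults are illustrative assumptions.
    """
    h = [0.0] * num_taps
    buf = [0.0] * num_taps  # buf[k] holds reference[n - k]
    for x_n, d_n in zip(reference, target):
        buf = [x_n] + buf[:-1]
        # A priori error between the observed channel and the filter output.
        err = d_n - sum(hk * xk for hk, xk in zip(h, buf))
        # Proportionate gains: larger taps receive a larger share of the step.
        l1 = sum(abs(hk) for hk in h)
        gains = [
            (1.0 - alpha) / (2.0 * num_taps) + (1.0 + alpha) * abs(hk) / (2.0 * l1 + eps)
            for hk in h
        ]
        # Gain-weighted input energy, regularised by delta.
        norm = sum(g * xk * xk for g, xk in zip(gains, buf)) + delta
        h = [hk + mu * err * g * xk / norm for hk, g, xk in zip(h, gains, buf)]
    return h


# Synthetic two-channel example (hypothetical data, noiseless).
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
true_h = [0.0] * 16
true_h[2] = 0.8   # sparse relative impulse response, chosen arbitrarily
true_h[5] = -0.3
target = [
    sum(true_h[k] * reference[n - k] for k in range(16) if n - k >= 0)
    for n in range(len(reference))
]

est_h = ipnlms(reference, target, num_taps=16)
max_err = max(abs(a - b) for a, b in zip(est_h, true_h))
print(f"max coefficient error: {max_err:.2e}")
```

In a coding scheme of the kind the abstract describes, only the reference signal and the estimated filter taps (one set per non-reference channel) would need to be transmitted, with the taps refreshed when the acoustic conditions change; how the paper quantises or schedules those updates is not stated here.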

Citations: 1

Spherical Array Based Drone Noise Measurements and Modelling for Drone Noise Reduction via Propeller Phase Control

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date : 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632719

Hanwen Bi, Fei Ma, T. Abhayapala, P. Samarasinghe

Citations: 7

Spatial Subtraction of Reflections from Room Impulse Responses Measured with a Spherical Microphone Array

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date : 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632764

T. Deppisch, J. Ahrens, S. V. A. Garí, P. Calamia

Citations: 4