Keisuke Kimura, Shoichi Koyama, Natsuki Ueno, H. Saruwatari
{"title":"Mean-Square-Error-Based Secondary Source Placement in Sound Field Synthesis with Prior Information on Desired Field","authors":"Keisuke Kimura, Shoichi Koyama, Natsuki Ueno, H. Saruwatari","doi":"10.1109/WASPAA52581.2021.9632799","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632799","url":null,"abstract":"A method of optimizing secondary source placement in sound field synthesis is proposed. Such an optimization method will be useful when the allowable placement region and available number of loudspeakers are limited. We formulate a mean-square-error-based cost function, incorporating the statistical properties of possible desired sound fields, for general linear-least-squares-based sound field synthesis methods, including pressure matching and (weighted) mode matching, whereas most of the current methods are applicable only to the pressure-matching method. An efficient greedy algorithm for minimizing the proposed cost function is also derived. Numerical experiments indicated that a high reproduction accuracy can be achieved by the placement optimized by the proposed method compared with the empirically used regular placement.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126632598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wangyou Zhang, Jing Shi, Chenda Li, Shinji Watanabe, Y. Qian
{"title":"Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions","authors":"Wangyou Zhang, Jing Shi, Chenda Li, Shinji Watanabe, Y. Qian","doi":"10.1109/WASPAA52581.2021.9632720","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632720","url":null,"abstract":"The deep learning based time-domain models, e.g. Conv-TasNet, have shown great potential in both single-channel and multi-channel speech enhancement. However, many experiments on the time-domain speech enhancement model are done in simulated conditions, and it is not well studied whether the good performance can generalize to real-world scenarios. In this paper, we aim to provide an insightful investigation of applying multi-channel Conv-TasNet based speech enhancement to both simulation and real data. Our preliminary experiments show a large performance gap between the two conditions in terms of the ASR performance. Several approaches are applied to close this gap, including the integration of multi-channel Conv-TasNet into the beamforming model with various strategies, and the joint training of speech enhancement and speech recognition models. Our experiments on the CHiME-4 corpus show that our proposed approaches can greatly reduce the speech recognition performance discrepancy between simulation and real data, while preserving the strong speech enhancement capability in the frontend.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127033692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of Missing Frequency Response Functions Through Deep Image Prior","authors":"R. Malvermi, F. Antonacci, A. Sarti, R. Corradi","doi":"10.1109/WASPAA52581.2021.9632759","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632759","url":null,"abstract":"Vibration analysis is crucial when designing and monitoring resonant structures. The characterization of vibrational properties in mechanical systems, e.g. machinery or musical instruments, can indeed detect noise sources and damages. Several methods can retrieve these parameters starting from a set of measurements. The level of detail in the estimate mostly depends on the amount and distribution of points acquired over space. A potential issue for these techniques consists in the presence of regions over the object where sensors cannot be attached. In this case, an interpolation scheme with a suitable prior of the data model should be devised. We propose here to predict the missing vibrational data within the framework of image inpainting and apply a fully data-driven method based on Deep Image Prior, which allows to capture the prior inside data without the need of a dataset. The performance is assessed in the case of violin top plates. The proposed method proved to better predict data, in particular resonances, for points close to the boundary, whereas a baseline based on Thin Plate Splines fails, due to the reduced number of available samples.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125900784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of Frequency-Dependent Behavior of Room Reflections Using Spherical Microphone Measurements & Von Mises-Fisher Clustering","authors":"Amy Bastine, T. Abhayapala, J. Zhang","doi":"10.1109/WASPAA52581.2021.9632706","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632706","url":null,"abstract":"This paper presents a room acoustic analysis tool capable of power response generation and directional characterization of room reflections across different frequencies using spherical microphone array measurements. The method exploits the spatial correlation between the frequency-dependent spherical harmonic coefficients of the reverberant soundfield and extracts its statistical features using von Mises-Fisher (vMF) clustering. We use this tool to examine the acoustic response of a small and a large room to achieve a profound understanding of the frequency-related variations in the directional characteristics of room reflections. In comparison to the eigen-beam multiple signal classification (EB-MUSIC) method, the proposed technique incorporates a more realistic room response over a broader frequency range. The experimental observations prove the potential of the proposed tool in determining the frequency-dependent room acoustic parameters and can lead to the design of smarter room acoustic treatment solutions.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"297 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123744202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stochastic Reverberation Model with a Frequency Dependent Attenuation","authors":"Achille Aknin, Roland Badeau","doi":"10.1109/WASPAA52581.2021.9632792","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632792","url":null,"abstract":"In various audio signal processing applications, such as source separation and dereverberation, accurate mathematical modeling of both source signals and room reverberation is needed to properly describe the audio data. In a previous paper, we introduced a stochastic room impulse response model based on the image source principle, and we proposed an expectation-maximization algorithm that was able to efficiently estimate the model parameters in various experimental settings. This paper aims to extend the model in order to account for the dependency of the exponential decay over frequency, due to the walls usually absorbing less energy at low frequencies than at high frequencies. Our experimental results show that this refinement of the model is able to generate realistic room impulse responses, that are perceptively very close to the original ones.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128245636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rendering of Source Spread for Arbitrary Playback Setups Based on Spatial Covariance Matching","authors":"L. McCormack, A. Politis, V. Pulkki","doi":"10.1109/WASPAA52581.2021.9632724","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632724","url":null,"abstract":"This paper proposes an algorithm for rendering spread sound sources, which are mutually incoherent across their extents, over arbitrary playback formats. The approach involves first generating signals corresponding to the centre of the spread source for the intended playback setup, along with decorrelated variants, followed by defining a diffuse spatial covariance matrix for the confined target spreading area. The mixing matrices required to combine these signals, in a manner whereby the resulting output signals exhibit the target inter-channel relationships for an incoherently spread source, are computed based on an optimised solution which is constrained to preserve signal fidelity. The proposed solution is evaluated in the context of producing extended sound sources for binaural playback. Objective perceptual metrics are computed and shown to be comparable to those derived from an ideal incoherently spread reference. Signal distortion measures are also calculated for speech, musical, and ambience recordings, which indicate higher signal fidelity produced by the proposed constrained spatial covariance matching solution, compared to an unconstrained alternative. These improvements in signal fidelity are further demonstrated by the provided audio examples and open-source audio plug-in.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128869120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Giorgia Cantisani, A. Ozerov, S. Essid, G. Richard
{"title":"User-Guided One-Shot Deep Model Adaptation for Music Source Separation","authors":"Giorgia Cantisani, A. Ozerov, S. Essid, G. Richard","doi":"10.1109/WASPAA52581.2021.9632717","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632717","url":null,"abstract":"Music source separation is the task of isolating individual instruments which are mixed in a musical piece. This task is particularly challenging, and even state-of-the-art models can hardly generalize to unseen test data. Nevertheless, prior knowledge about individual sources can be used to better adapt a generic source separation model to the observed signal. In this work, we propose to exploit a temporal segmentation provided by the user, that indicates when each instrument is active, in order to fine-tune a pre-trained deep model for source separation and adapt it to one specific mixture. This paradigm can be referred to as user-driven one-shot deep model adaptation for music source separation, as the adaptation acts on the target song instance only. Our results are promising and show that state-of-the-art source separation models have large margins of improvement especially for those instruments which are underrepresented in the training data.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125073100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel T. Jones, D. Sharma, S. Kruchinin, P. Naylor
{"title":"Spatial Coding for Microphone Arrays Using IPNLMS-Based RTF Estimation","authors":"Daniel T. Jones, D. Sharma, S. Kruchinin, P. Naylor","doi":"10.1109/WASPAA52581.2021.9632747","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632747","url":null,"abstract":"We propose a method for encoding multichannel microphone array signals and show that our proposed algorithm can operate effectively at very low bitrates. Our approach leverages the high interchannel correlations that arise from the close proximity of microphones in an array to compactly represent the signals. An $M$ channel microphone array signal is encoded as one reference signal and $M-1$ Relative Transfer Functions (RTFs). When the RTFs require updating only infrequently, a significant reduction in data-rate is obtained. Applications of interest include cloud-based beamforming and End-to-End Automatic Speech Recognition (ASR) systems. The efficiency of this encoding enables multichannel audio to be transmitted to the cloud at very low bitrates. A system has been developed that estimates, and periodically updates, the RTFs between each channel of the array and a chosen reference channel using an Improved Proportionate Normalized Least Mean Squares (IPNLMS) adaptive filter. The proposed system is experimentally evaluated in comparison with the Opus codec. It achieves equal ΔPESQ performance with a data-rate reduction of up to 90% and un-degraded Word Error Rate (WER) down to bitrates as low as 3.3 kbps.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127593280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spherical Array Based Drone Noise Measurements and Modelling for Drone Noise Reduction via Propeller Phase Control","authors":"Hanwen Bi, Fei Ma, T. Abhayapala, P. Samarasinghe","doi":"10.1109/WASPAA52581.2021.9632719","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632719","url":null,"abstract":"Drone noise is increasingly becoming an annoying problem as they are widely used in everyday applications. This paper investigates the problem of controlling farfield drone noise by manipulating relative phase of propellers. The methodology includes (i) measurement of the nearfield propeller noise using a specially designed open spherical array, (ii) development of extrapolation method to transform nearfield noise to a farfield target region, and (iii) simulation of farfield noise field with varying propeller relative phases. We further investigate the influence of drone configuration of phase controlled noise reduction for a farfield target region, and show that −6.8 dB noise reduction can be achieved at the blade passage frequencies. The analysis of residual noise shows the potential benefit of combining phase control with active noise control.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126642235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial Subtraction of Reflections from Room Impulse Responses Measured with a Spherical Microphone Array","authors":"T. Deppisch, J. Ahrens, S. V. A. Garí, P. Calamia","doi":"10.1109/WASPAA52581.2021.9632764","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632764","url":null,"abstract":"We propose a method for the decomposition of measured directional room impulse responses (DRIRs) into prominent reflections and a residual. The method comprises obtaining a fingerprint of the time-frequency signal that a given reflection carries, imposing this time-frequency fingerprint on a plane-wave prototype that exhibits the same propagation direction as the reflection, and finally subtracting this plane-wave prototype from the DRIR. Our main contributions are the formulation of the problem as a spatial subtraction as well as the incorporation of order truncation, spatial aliasing and regularization of the radial filters into the definition of the underlying beamforming problem. We demonstrate, based on simulated as well as measured array impulse responses, that our method increases the accuracy of the model of the reflection under test and consequently decreases the energy of the residual that remains in a measured DRIR after the spatial subtraction.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116847238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}