{"title":"Internal Time Delay Calibration of Rigid Spherical Microphone Arrays for Multi-Perspective 6DoF Audio Recordings","authors":"Ege Erdem, Orhun Olgun, H. Hacıhabiboğlu","doi":"10.1109/WASPAA52581.2021.9632715","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632715","url":null,"abstract":"Recording navigable or six-degrees-of-freedom (6DoF) audio requires elaborate setups involving multiple microphone arrays capable of recording higher-order Ambisonics (HOA) signals. Rigid spherical microphone arrays (RSMAs) are commonly used for recording HOA. When a number of such arrays are positioned to cover a navigable area, several problems need to be solved: position calibration, orientation calibration, and time delay alignment. There exist several methods addressing each of these problems for arrays comprising pressure microphones. However, these solutions are not directly applicable to a recording setup including multiple RSMAs due to multiple scattering among the spheres. We propose an internal time-delay calibration procedure in the multipole expansion domain for multi-perspective 6DoF audio recording setups comprising multiple RSMAs. We demonstrate the utility of the method via numerical simulations.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128289475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2D Multizone Sound Field Synthesis with Interior-Exterior Ambisonics","authors":"T. Okamoto","doi":"10.1109/WASPAA52581.2021.9632736","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632736","url":null,"abstract":"This paper presents a two-dimensional multizone sound field synthesis method based on sound field separation and interior-exterior higher-order Ambisonics (HOA) using two circular loudspeaker arrays. In the conventional methods using a circular loudspeaker array, multiple target local sound zones are represented as a global interior sound field. However, the control accuracy of the conventional methods is strongly dependent on the positional relationship between the wavefront direction in a bright zone and other dark zones. This is known as the occlusion problem. On the other hand, the global sound field is defined as a mixture of the interior and exterior sound fields based on sound field separation in the proposed method. The separated global interior and exterior sound fields are then simultaneously synthesized via interior-exterior HOA using two circular loudspeaker arrays with a cylindrical baffle to avoid the forbidden frequency problem in exterior HOA. The results of computer simulations demonstrate that the proposed method using the two circular arrays can successfully avoid the occlusion problem and realize higher control accuracy than the conventional methods using a single circular array.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129013333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Differentiable Tracking-Based Training of Deep Learning Sound Source Localizers","authors":"Sharath Adavanne, A. Politis, T. Virtanen","doi":"10.1109/WASPAA52581.2021.9632773","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632773","url":null,"abstract":"Data-based and learning-based sound source localization (SSL) has shown promising results in challenging conditions, and is commonly set as a classification or a regression problem. Regression-based approaches have certain advantages over classification-based, such as continuous direction-of-arrival estimation of static and moving sources. However, multi-source scenarios require multiple regressors without a clear training strategy up-to-date, that does not rely on auxiliary information such as simultaneous sound classification. We investigate end-to-end training of such methods with a technique recently proposed for video object detectors, adapted to the SSL setting. A differentiable network is constructed that can be plugged to the output of the localizer to solve the optimal assignment between predictions and references, optimizing directly the popular CLEAR-MOT tracking metrics. Results indicate large improvements over directly optimizing mean squared errors, in terms of localization error, detection metrics, and tracking capabilities.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128456893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Polynomial Matrix Eigenvalue Decomposition-Based Source Separation Using Informed Spherical Microphone Arrays","authors":"Vincent W. Neo, C. Evers, P. Naylor","doi":"10.1109/WASPAA52581.2021.9632722","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632722","url":null,"abstract":"Audio source separation is essential for many applications such as hearing aids, telecommunications, and robot audition. Subspace decomposition approaches using polynomial matrix eigenvalue decomposition (PEVD) algorithms applied to the microphone signals, or lower-dimension eigenbeams for spherical microphone arrays, are effective for speech enhancement. In this work, we extend the work from speech enhancement and propose a PEVD subspace algorithm that uses eigenbeams for source separation. The proposed PEVD-based source separation approach performs comparably with state-of-the-art algorithms, such as those based on independent component analysis (ICA) and multi-channel non-negative matrix factorization (MNMF). Informal listening examples also indicate that our method does not introduce any audible artifacts.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128740536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spherical Harmonic Decomposition of a Sound Field Based on Microphones Around the Circumference of a Human Head","authors":"J. Ahrens, H. Helmholz, D. Alon, S. V. A. Garí","doi":"10.1109/WASPAA52581.2021.9632751","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632751","url":null,"abstract":"We present a method for decomposing a sound field into spherical harmonics (SH) based on observations of the sound field around the circumference of a human head. The method is based on the analytical solution for observations of the sound field along the equator of a rigid sphere that we presented recently. The present method incorporates a calibration stage in which the microphone signals for sound sources at a suitable set of calibration positions are projected onto the SH decomposition of the same sound field on the surface of a notional rigid sphere by means of a linear filtering operation. The filter coefficients are computed from the calibration data via a least-squares fit. We present an evaluation of the method based on binaural rendering of numerically simulated signals for an array of 18 microphones providing 8th SH order to demonstrate its effectiveness.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126842315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Who Calls The Shots? Rethinking Few-Shot Learning for Audio","authors":"Yu Wang, Nicholas J. Bryan, J. Salamon, M. Cartwright, J. Bello","doi":"10.1109/WASPAA52581.2021.9632677","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632677","url":null,"abstract":"Few-shot learning aims to train models that can recognize novel classes given just a handful of labeled examples, known as the support set. While the field has seen notable advances in recent years, they have often focused on multi-class image classification. Audio, in contrast, is often multi-label due to overlapping sounds, resulting in unique properties such as polyphony and signal-to-noise ratios (SNR). This leads to unanswered questions concerning the impact such audio properties may have on few-shot learning system design, performance, and human-computer interaction, as it is typically up to the user to collect and provide inference-time support set examples. We address these questions through a series of experiments designed to elucidate the answers to these questions. We introduce two novel datasets, FSD-MIX-CLIPS and FSD-MIX-SED, whose programmatic generation allows us to explore these questions systematically. Our experiments lead to audio-specific insights on few-shot learning, some of which are at odds with recent findings in the image domain: there is no best one-size-fits-all model, method, and support set selection criterion. Rather, it depends on the expected application scenario. Our code and data are available at https://github.com/wangyu/rethink-audio-fsl.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"637 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116411851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spherical Harmonic Diagonal Unloading Beamforming with Ego-Noise Reduction for DOA Estimation from Autonomous Systems","authors":"D. Salvati, C. Drioli, G. Foresti","doi":"10.1109/WASPAA52581.2021.9632772","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632772","url":null,"abstract":"A method to improve the localization of a sound source using a spherical microphone array embedded into autonomous systems is presented. The method is based on a low-complexity diagonal unloading (DU) beamforming in the spherical harmonic (SH) domain using a frequency smoothing power transform (FSPT) of the covariance matrices with a novel ego-noise reduction. The attenuation of the ego-noise in the signal-plus-ego-noise broadband FSTP covariance matrix is achieved by estimating the FSPT ego-noise covariance matrix and exploiting the subspace orthogonality property using a diagonal unloading procedure. Experiments with controlled real-world recordings performed by an aerial drone equipped with a 19-microphone spherical array while sensing a flying target drone demonstrate the efficiency of the proposed method.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131003098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing Segmental Impact for Objective Speech Quality Evaluation","authors":"Zhixing Liu, Yannan Wang, Gaoxiong Yi, Tao Yu, Fei Chen","doi":"10.1109/WASPAA52581.2021.9632785","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632785","url":null,"abstract":"Accurately predicting speech quality is important for the design of new speech coding and processing algorithms to improve speech communication. Existing speech quality metrics are computed with all speech segments, and do not consider the contributions of various speech segments for quality evaluation. The present work utilized a speech-level based segmentation method to separate a speech signal into high-, middle- and low-level regions, and computed the quality measures only with selected speech segments. Subjective speech quality rating data from 120 noise-masked/suppressed conditions (processed by 14 single-channel noise-suppression algorithms) were correlated with the objective speech quality indices. Results showed that compared with the conventional implementation with all speech segments, using middle-level speech segments to compute speech quality index could yield an improved correlation coefficient in predicting subjective quality ratings for most quality measures, particularly for the measure of output signal-to-noise ratio. The findings of the present work may provide a new scheme to improve the performance of objective speech quality assessment based on the segmental contributions of speech signals.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131467725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Generalized Cross-Entropy Loss for Sound Event Classification with Noisy Labels","authors":"Jun Deng, Chunhui Gao, Qian Feng, Xinzhou Xu, Zhaopeng Chen","doi":"10.1109/WASPAA52581.2021.9632679","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632679","url":null,"abstract":"Considering the high cost of manually annotated large-scale datasets for superior sound event classifier performance, the data collection process has shifted to using the Internet, which facilitates easier user-contributed audio and metadata collection. However, label noise is inevitable. To address the problems caused by label noise, several types of noise-robust loss functions have been proposed recently as alternatives to the commonly categorical cross-entropy (CCE) loss, one of which is the generalized cross-entropy (GCE) loss, which demonstrates state-of-the-art performance. However, GCE cannot realize sufficient noise robustness and satisfactory accuracy simultaneously. Thus, we propose adaptive GCE loss, which automatically adapts to noisy labels in every batch to achieve adequate noise robustness and sufficient accuracy. We conducted experiments and found that the classification accuracy of the proposed loss demonstrated 4.7% and 1.2% absolute improvement over the CCE and GCE baselines, respectively. We also demonstrate that clean data consumption in the proposed loss is dramatically reduced by more than 75% compared with CCE.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123959685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial Filter Bank in the Spherical Harmonic Domain: Reconstruction and Application","authors":"C. Hold, Sebastian J. Schlecht, A. Politis, V. Pulkki","doi":"10.1109/WASPAA52581.2021.9632709","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632709","url":null,"abstract":"Filter banks are an integral part of modern signal processing. They may also be applied to spatial filtering and the employed spatial filters can be designed with a specific shape for the analysis, e. g. suppressing side-lobes. After extracting spatially constrained signals from spherical harmonic (SH) input, i. e. filter bank analysis, many applications demand for a re-synthesis of the associated sector signals to the SH domain. This paper hence derives the complementary spatial filter bank reconstruction. The criterion for perfect reconstruction, and energy preserving reconstruction are given and implemented into the design. The filter bank is formulated such that for axisymmetric patterns both criteria can be met by only minor modification to the reconstruction stage. Its application is then demonstrated for both scenarios, perfect reconstruction and energy preservation of SH input signals.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129100973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}