2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA): Latest Publications

Internal Time Delay Calibration of Rigid Spherical Microphone Arrays for Multi-Perspective 6DoF Audio Recordings

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date : 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632715

Ege Erdem, Orhun Olgun, H. Hacıhabiboğlu

Citations: 2

2D Multizone Sound Field Synthesis with Interior-Exterior Ambisonics

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date : 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632736

T. Okamoto

Abstract: This paper presents a two-dimensional multizone sound field synthesis method based on sound field separation and interior-exterior higher-order Ambisonics (HOA) using two circular loudspeaker arrays. In the conventional methods using a circular loudspeaker array, multiple target local sound zones are represented as a global interior sound field. However, the control accuracy of the conventional methods is strongly dependent on the positional relationship between the wavefront direction in a bright zone and other dark zones. This is known as the occlusion problem. On the other hand, the global sound field is defined as a mixture of the interior and exterior sound fields based on sound field separation in the proposed method. The separated global interior and exterior sound fields are then simultaneously synthesized via interior-exterior HOA using two circular loudspeaker arrays with a cylindrical baffle to avoid the forbidden frequency problem in exterior HOA. The results of computer simulations demonstrate that the proposed method using the two circular arrays can successfully avoid the occlusion problem and realize higher control accuracy than the conventional methods using a single circular array.

Citations: 0

Differentiable Tracking-Based Training of Deep Learning Sound Source Localizers

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date : 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632773

Sharath Adavanne, A. Politis, T. Virtanen

Citations: 6

Polynomial Matrix Eigenvalue Decomposition-Based Source Separation Using Informed Spherical Microphone Arrays

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date : 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632722

Vincent W. Neo, C. Evers, P. Naylor

Citations: 3

Spherical Harmonic Decomposition of a Sound Field Based on Microphones Around the Circumference of a Human Head

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date : 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632751

J. Ahrens, H. Helmholz, D. Alon, S. V. A. Garí

Citations: 8

Who Calls The Shots? Rethinking Few-Shot Learning for Audio

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date : 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632677

Yu Wang, Nicholas J. Bryan, J. Salamon, M. Cartwright, J. Bello

Abstract: Few-shot learning aims to train models that can recognize novel classes given just a handful of labeled examples, known as the support set. While the field has seen notable advances in recent years, they have often focused on multi-class image classification. Audio, in contrast, is often multi-label due to overlapping sounds, resulting in unique properties such as polyphony and signal-to-noise ratios (SNR). This leads to unanswered questions concerning the impact such audio properties may have on few-shot learning system design, performance, and human-computer interaction, as it is typically up to the user to collect and provide inference-time support set examples. We address these questions through a series of experiments. We introduce two novel datasets, FSD-MIX-CLIPS and FSD-MIX-SED, whose programmatic generation allows us to explore these questions systematically. Our experiments lead to audio-specific insights on few-shot learning, some of which are at odds with recent findings in the image domain: there is no best one-size-fits-all model, method, or support set selection criterion. Rather, it depends on the expected application scenario. Our code and data are available at https://github.com/wangyu/rethink-audio-fsl.

Citations: 15

Spherical Harmonic Diagonal Unloading Beamforming with Ego-Noise Reduction for DOA Estimation from Autonomous Systems

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date : 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632772

D. Salvati, C. Drioli, G. Foresti

Citations: 1

Assessing Segmental Impact for Objective Speech Quality Evaluation

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date : 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632785

Zhixing Liu, Yannan Wang, Gaoxiong Yi, Tao Yu, Fei Chen

Abstract: Accurately predicting speech quality is important for the design of new speech coding and processing algorithms to improve speech communication. Existing speech quality metrics are computed with all speech segments, and do not consider the contributions of various speech segments for quality evaluation. The present work utilized a speech-level based segmentation method to separate a speech signal into high-, middle- and low-level regions, and computed the quality measures only with selected speech segments. Subjective speech quality rating data from 120 noise-masked/suppressed conditions (processed by 14 single-channel noise-suppression algorithms) were correlated with the objective speech quality indices. Results showed that compared with the conventional implementation with all speech segments, using middle-level speech segments to compute speech quality index could yield an improved correlation coefficient in predicting subjective quality ratings for most quality measures, particularly for the measure of output signal-to-noise ratio. The findings of the present work may provide a new scheme to improve the performance of objective speech quality assessment based on the segmental contributions of speech signals.
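The level-based segmentation mentioned in the abstract can be illustrated with a minimal Python sketch. It labels each frame by its RMS level relative to the overall RMS of the signal. The thresholds used here (0 dB, -10 dB, -30 dB), the non-overlapping framing, and the function names are assumptions chosen for illustration; the abstract does not state the values or framing the authors used.

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square value of a block of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def label_frames(
    signal: Sequence[float],
    frame_len: int,
    high_db: float = 0.0,     # assumed: at or above overall RMS -> "high"
    mid_db: float = -10.0,    # assumed: [-10, 0) dB -> "middle"
    floor_db: float = -30.0,  # assumed: [-30, -10) dB -> "low"
) -> List[str]:
    """Label non-overlapping frames by level relative to the overall RMS."""
    overall = rms(signal)
    labels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        level = rms(signal[start:start + frame_len])
        if level == 0.0 or overall == 0.0:
            labels.append("discard")  # silent frame, no defined dB level
            continue
        rel_db = 20.0 * math.log10(level / overall)
        if rel_db >= high_db:
            labels.append("high")
        elif rel_db >= mid_db:
            labels.append("middle")
        elif rel_db >= floor_db:
            labels.append("low")
        else:
            labels.append("discard")
    return labels


if __name__ == "__main__":
    toy = [1.0] * 4 + [0.5] * 4 + [0.1] * 4 + [0.0] * 4
    print(label_frames(toy, frame_len=4))
```

A quality measure restricted to one region would then be computed only over the frames carrying that label, for example only the "middle" frames.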

Citations: 0

Adaptive Generalized Cross-Entropy Loss for Sound Event Classification with Noisy Labels

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date : 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632679

Jun Deng, Chunhui Gao, Qian Feng, Xinzhou Xu, Zhaopeng Chen

Abstract: Considering the high cost of manually annotated large-scale datasets for superior sound event classifier performance, the data collection process has shifted to using the Internet, which facilitates easier user-contributed audio and metadata collection. However, label noise is inevitable. To address the problems caused by label noise, several types of noise-robust loss functions have been proposed recently as alternatives to the commonly used categorical cross-entropy (CCE) loss, one of which is the generalized cross-entropy (GCE) loss, which demonstrates state-of-the-art performance. However, GCE cannot realize sufficient noise robustness and satisfactory accuracy simultaneously. Thus, we propose adaptive GCE loss, which automatically adapts to noisy labels in every batch to achieve adequate noise robustness and sufficient accuracy. We conducted experiments and found that the classification accuracy of the proposed loss demonstrated 4.7% and 1.2% absolute improvement over the CCE and GCE baselines, respectively. We also demonstrate that clean data consumption in the proposed loss is dramatically reduced by more than 75% compared with CCE.
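For context, the GCE baseline named in the abstract is commonly written as L_q = (1 - p_y^q) / q with 0 < q <= 1, where p_y is the predicted probability of the labeled class. It approaches CCE as q tends to 0 and equals a scaled mean absolute error at q = 1. The sketch below shows only this standard fixed-q form. The adaptive per-batch scheme proposed in the paper is not described in the abstract and is not reproduced here; the function names and the default q = 0.7 are illustrative assumptions.

```python
import math


def cce_loss(p_true: float) -> float:
    """Categorical cross-entropy for one sample: -log(p_true)."""
    return -math.log(p_true)


def gce_loss(p_true: float, q: float = 0.7) -> float:
    """Standard fixed-q GCE loss for one sample: (1 - p_true**q) / q."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    return (1.0 - p_true ** q) / q


if __name__ == "__main__":
    # Compare how strongly each loss penalizes confident vs. poor predictions.
    for p in (0.9, 0.5, 0.1):
        print(
            f"p={p:.1f}  CCE={cce_loss(p):.3f}  "
            f"GCE(q=0.7)={gce_loss(p):.3f}  GCE(q=1)={gce_loss(p, 1.0):.3f}"
        )
```

Because the GCE loss is bounded above by 1/q, a sample with a wrong label and a low predicted probability contributes less to the total than it would under CCE, which is the usual explanation for its noise robustness.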

Citations: 2

Spatial Filter Bank in the Spherical Harmonic Domain: Reconstruction and Application

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) Pub Date : 2021-10-17 DOI: 10.1109/WASPAA52581.2021.9632709

C. Hold, Sebastian J. Schlecht, A. Politis, V. Pulkki

Citations: 5