2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) — Latest Publications

Weakly Supervised Source-Specific Sound Level Estimation in Noisy Soundscapes
Pub Date: 2021-05-06 · DOI: 10.1109/WASPAA52581.2021.9632767
A. Cramer, M. Cartwright, Fatemeh Pishdadian, J. Bello
{"title":"Weakly Supervised Source-Specific Sound Level Estimation in Noisy Soundscapes","authors":"A. Cramer, M. Cartwright, Fatemeh Pishdadian, J. Bello","doi":"10.1109/WASPAA52581.2021.9632767","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632767","url":null,"abstract":"While the estimation of what sound sources are, when they occur, and from where they originate has been well-studied, the estimation of how loud these sound sources are has been often overlooked. Current solutions to this task, which we refer to as source-specific sound level estimation (SSSLE), suffer from challenges due to the impracticality of acquiring realistic data and a lack of robustness to realistic recording conditions. Recently proposed weakly supervised source separation offer a means of leveraging clip-level source annotations to train source separation models, which we augment with modified loss functions to bridge the gap between source separation and SSSLE and to address the presence of background. We show that our approach improves SSSLE performance compared to baseline source separation models and provide an ablation analysis to explore our method's design choices, showing that SSSLE in practical recording and annotation scenarios is possible.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122220819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
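As a rough illustration of the clip-level weak supervision described above (a minimal sketch, not the authors' architecture: the toy network, tensor shapes, and max-pooling presence predictor are all illustrative assumptions), a mask-based separator can be trained against clip-level source tags alone, and per-source sound levels then read off the separated estimates:

```python
# Hedged sketch: weakly supervised separation trained with clip-level
# source labels, then used for source-specific level estimation (SSSLE).
import torch
import torch.nn as nn

N_SOURCES, N_MELS, N_FRAMES = 4, 64, 128   # assumed toy dimensions

class WeakSep(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, N_SOURCES, 3, padding=1), nn.Sigmoid(),
        )  # per-source time-frequency masks in [0, 1]

    def forward(self, spec):                    # spec: (B, 1, F, T)
        masks = self.net(spec)                  # (B, S, F, T)
        est = masks * spec                      # per-source magnitude estimates
        # clip-level presence via max-pooling over frequency and time
        presence = est.amax(dim=(2, 3))         # (B, S)
        return est, presence

model = WeakSep()
spec = torch.rand(8, 1, N_MELS, N_FRAMES)               # toy magnitude spectrograms
labels = torch.randint(0, 2, (8, N_SOURCES)).float()    # clip-level source tags
est, presence = model(spec)
loss = nn.functional.binary_cross_entropy(presence.clamp(0, 1), labels)
loss.backward()

# SSSLE readout: RMS of each separated source estimate, in dB
rms = est.pow(2).mean(dim=(2, 3)).sqrt()                # (B, S)
level_db = 20 * torch.log10(rms + 1e-8)
```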
Self-Supervised Learning from Automatically Separated Sound Scenes
Pub Date: 2021-05-05 · DOI: 10.1109/WASPAA52581.2021.9632739
Eduardo Fonseca, A. Jansen, D. Ellis, Scott Wisdom, M. Tagliasacchi, J. Hershey, Manoj Plakal, Shawn Hershey, R. C. Moore, Xavier Serra
{"title":"Self-Supervised Learning from Automatically Separated Sound Scenes","authors":"Eduardo Fonseca, A. Jansen, D. Ellis, Scott Wisdom, M. Tagliasacchi, J. Hershey, Manoj Plakal, Shawn Hershey, R. C. Moore, Xavier Serra","doi":"10.1109/WASPAA52581.2021.9632739","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632739","url":null,"abstract":"Real-world sound scenes consist of time-varying collections of sound sources, each generating characteristic sound events that are mixed together in audio recordings. The association of these constituent sound events with their mixture and each other is semantically constrained: the sound scene contains the union of source classes and not all classes naturally co-occur. With this motivation, this paper explores the use of unsupervised automatic sound separation to decompose unlabeled sound scenes into multiple semantically-linked views for use in self-supervised contrastive learning. We find that learning to associate input mixtures with their automatically separated outputs yields stronger representations than past approaches that use the mixtures alone. Further, we discover that optimal source separation is not required for successful contrastive learning by demonstrating that a range of separation system convergence states all lead to useful and often complementary example transformations. Our best system incorporates these unsupervised separation models into a single augmentation front-end and jointly optimizes similarity maximization and coincidence prediction objectives across the views. The result is an unsupervised audio representation that rivals state-of-the-art alternatives on the established shallow AudioSet classification benchmark.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125450943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
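The core pairing is easy to state in code. Below is a hedged InfoNCE-style sketch that treats the embedding of a mixture and the embedding of one of its automatically separated channels as a positive pair; the encoder outputs and temperature are stand-ins, and the paper's full objective additionally includes coincidence prediction:

```python
# Sketch of similarity maximization between a mixture and its
# automatically separated view, under assumed embedding dimensions.
import torch
import torch.nn.functional as F

def contrastive_loss(z_mix, z_sep, temperature=0.1):
    """z_mix: (B, D) mixture embeddings; z_sep: (B, D) embeddings of one
    separated channel of the same clips. Pair i is a positive; all other
    pairs in the batch serve as negatives (InfoNCE)."""
    z_mix = F.normalize(z_mix, dim=1)
    z_sep = F.normalize(z_sep, dim=1)
    logits = z_mix @ z_sep.t() / temperature   # (B, B) cosine similarities
    targets = torch.arange(z_mix.size(0))      # matching index is the positive
    return F.cross_entropy(logits, targets)

B, D = 32, 128                                 # assumed batch and embedding size
loss = contrastive_loss(torch.randn(B, D), torch.randn(B, D))
```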
MIMII DUE: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection with Domain Shifts Due to Changes in Operational and Environmental Conditions
Pub Date: 2021-05-05 · DOI: 10.5281/ZENODO.4740355
Ryo Tanabe, Harsh Purohit, Kota Dohi, Takashi Endo, Yuki Nikaido, Toshiki Nakamura, Y. Kawaguchi
{"title":"MIMII Due: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection with Domain Shifts Due to Changes in Operational and Environmental Conditions","authors":"Ryo Tanabe, Harsh Purohit, Kota Dohi, Takashi Endo, Yuki Nikaido, Toshiki Nakamura, Y. Kawaguchi","doi":"10.5281/ZENODO.4740355","DOIUrl":"https://doi.org/10.5281/ZENODO.4740355","url":null,"abstract":"In this paper, we introduce MIMII DUE, a new dataset for malfunctioning industrial machine investigation and inspection with domain shifts due to changes in operational and environmental conditions. Conventional methods for anomalous sound detection face practical challenges because the distribution of features changes between the training and operational phases (called domain shift) due to various real-world factors. To check the robustness against domain shifts, we need a dataset that actually includes domain shifts, but such a dataset does not exist so far. The new dataset we created consists of the normal and abnormal operating sounds of five different types of industrial machines under two different operational/environmental conditions (source domain and target domain) independent of normal/abnormal, with domain shifts occurring between the two domains. Experimental results showed significant performance differences between the source and target domains, indicating that the dataset contains the domain shifts. These findings demonstrate that the dataset will be helpful for checking the robustness against domain shifts.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"212 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115589563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 37
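To illustrate the robustness check this dataset enables (not the paper's tooling; the anomaly scores below are simulated stand-ins), one can score an anomaly detector separately on source-domain and target-domain clips and compare the resulting AUCs:

```python
# Toy demonstration of how a domain shift shows up as an AUC gap
# between source and target domains. Scores are synthetic placeholders
# for a real detector's outputs on MIMII DUE clips.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def domain_auc(shift):
    """Higher scores should indicate abnormal clips; the target domain
    adds a distribution shift that degrades separability."""
    normal = rng.normal(0.0 + shift, 1.0 + shift, 100)
    abnormal = rng.normal(2.0, 1.0 + shift, 100)
    y = np.r_[np.zeros(100), np.ones(100)]
    s = np.r_[normal, abnormal]
    return roc_auc_score(y, s)

print("source-domain AUC:", domain_auc(shift=0.0))
print("target-domain AUC:", domain_auc(shift=1.0))   # noticeably lower
```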
VQCPC-GAN: Variable-Length Adversarial Audio Synthesis Using Vector-Quantized Contrastive Predictive Coding
Pub Date: 2021-05-04 · DOI: 10.1109/WASPAA52581.2021.9632757
J. Nistal, Cyran Aouameur, S. Lattner, G. Richard
{"title":"VQCPC-GAN: Variable-Length Adversarial Audio Synthesis Using Vector-Quantized Contrastive Predictive Coding","authors":"J. Nistal, Cyran Aouameur, S. Lattner, G. Richard","doi":"10.1109/WASPAA52581.2021.9632757","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632757","url":null,"abstract":"Influenced by the field of Computer Vision, Generative Adversarial Networks (GANs) are often adopted for the audio domain using fixed-size two-dimensional spectrogram representations as the “image data”. However, in the (musical) audio domain, it is often desired to generate output of variable duration. This paper presents VQCPC-GAN, an adversarial framework for synthesizing variable-length audio by exploiting Vector-Quantized Contrastive Predictive Coding (VQCPC). A sequence of VQCPC tokens extracted from real audio data serves as conditional input to a GAN architecture, providing step-wise time-dependent features of the generated content. The input noise $z$ (characteristic in adversarial architectures) remains fixed over time, ensuring temporal consistency of global features. We evaluate the proposed model by comparing a diverse set of metrics against various strong baselines. Results show that, even though the baselines score best, VQCPC-GAN achieves comparable performance even when generating variable-length audio. Numerous sound examples are provided in the accompanying website,11sonycslparis.github.io/vqcpc-gan.io and we release the code for reproducibility.22github.com/SonyCSLParis/vqcpc-gan","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"10 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132809474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
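A hedged sketch of the conditioning scheme follows: the generator consumes a sequence of discrete VQCPC tokens as a step-wise condition while a single noise vector $z$ is repeated across time, so the output length simply follows the token sequence. All dimensions and the recurrent stack are illustrative assumptions, not the paper's network (see the released code for the real one):

```python
# Sketch: token-conditioned generator with time-fixed noise z.
import torch
import torch.nn as nn

N_TOKENS, TOK_DIM, Z_DIM, N_MELS = 512, 64, 128, 80   # assumed sizes

class CondGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_TOKENS, TOK_DIM)
        self.rnn = nn.GRU(TOK_DIM + Z_DIM, 256, batch_first=True)
        self.out = nn.Linear(256, N_MELS)

    def forward(self, tokens, z):            # tokens: (B, T) ints, z: (B, Z_DIM)
        e = self.embed(tokens)                # (B, T, TOK_DIM) step-wise condition
        z_rep = z.unsqueeze(1).expand(-1, e.size(1), -1)  # same z at every step
        h, _ = self.rnn(torch.cat([e, z_rep], dim=-1))
        return self.out(h)                    # (B, T, N_MELS), length set by tokens

gen = CondGenerator()
tokens = torch.randint(0, N_TOKENS, (4, 75))  # any sequence length T works
spec = gen(tokens, torch.randn(4, Z_DIM))     # variable-length spectrogram output
```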
Identifying Actions for Sound Event Classification
Pub Date: 2021-04-26 · DOI: 10.1109/WASPAA52581.2021.9632782
Benjamin Elizalde, Radu Revutchi, Samarjit Das, B. Raj, Ian Lane, L. Heller
{"title":"Identifying Actions for Sound Event Classification","authors":"Benjamin Elizalde, Radu Revutchi, Samarjit Das, B. Raj, Ian Lane, L. Heller","doi":"10.1109/WASPAA52581.2021.9632782","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632782","url":null,"abstract":"In Psychology, actions are paramount for humans to identify sound events. In Machine Learning (ML), action recognition achieves high accuracy; however, it has not been asked whether identifying actions can benefit Sound Event Classification (SEC), as opposed to mapping the audio directly to a sound event. Therefore, we propose a new Psychology-inspired approach for SEC that includes identification of actions via human listeners. To achieve this goal, we used crowdsourcing to have listeners identify 20 actions that in isolation or in combination may have produced any of the 50 sound events in the well-studied dataset ESC-50. The resulting annotations for each audio recording relate actions to a database of sound events for the first time. The annotations were used to create semantic representations called Action Vectors (AVs). We evaluated SEC by comparing the AVs with two types of audio features - log-mel spectrograms and state-of-the-art audio embeddings. Because audio features and AVs capture different abstractions of the acoustic content, we combined them and achieved one of the highest reported accuracies (88%).","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114405038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
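The combination step reads as straightforward late fusion: concatenate the 20-dimensional Action Vector with an audio embedding and classify into the 50 ESC-50 classes. The sketch below uses assumed feature sizes and a toy classifier, not the authors' exact setup:

```python
# Hedged fusion sketch: Action Vectors + audio embeddings -> ESC-50 classes.
import torch
import torch.nn as nn

N_ACTIONS, EMB_DIM, N_CLASSES = 20, 512, 50   # AV size is from the paper;
                                              # EMB_DIM is an assumption

classifier = nn.Sequential(
    nn.Linear(N_ACTIONS + EMB_DIM, 256), nn.ReLU(),
    nn.Linear(256, N_CLASSES),
)

av = torch.rand(16, N_ACTIONS)    # per-clip crowdsourced action annotations
emb = torch.randn(16, EMB_DIM)    # e.g. a pretrained audio embedding
logits = classifier(torch.cat([av, emb], dim=1))
```

The intuition stated in the abstract motivates this design: the two inputs capture different abstractions, so concatenation lets the classifier exploit both.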
Low-Complexity Steered Response Power Mapping Based on Nyquist-Shannon Sampling
Pub Date: 2020-12-17 · DOI: 10.1109/WASPAA52581.2021.9632774
Thomas Dietzen, E. D. Sena, T. Waterschoot
{"title":"Low-Complexity Steered Response Power Mapping Based on Nyquist-Shannon Sampling","authors":"Thomas Dietzen, E. D. Sena, T. Waterschoot","doi":"10.1109/WASPAA52581.2021.9632774","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632774","url":null,"abstract":"The steered response power (SRP) approach to acoustic source localization computes a map of the acoustic scene from the frequency-weighted output power of a beamformer steered towards a set of candidate locations. Equivalently, SRP may be expressed in terms of time-domain generalized cross-correlations (GCCs) at lags equal to the candidate locations' time-differences of arrival (TDOAs). Due to the dense grid of candidate locations, each of which requires inverse Fourier transform (IFT) evaluations, conventional SRP exhibits a high computational complexity. In this paper, we propose a low-complexity SRP approach based on Nyquist-Shannon sampling. Noting that on the one hand the range of possible TDOAs is physically bounded, while on the other hand the GCCs are bandlimited, we critically sample the GCCs around their TDOA interval and approximate the SRP map by interpolation. In usual setups, the number of sample points can be orders of magnitude less than the number of candidate locations and frequency bins, yielding a significant reduction of IFT computations at a limited interpolation cost. Simulations comparing the proposed approximation with conventional SRP indicate low approximation errors and equal localization performance. MATLAB and Python implementations are available online.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124785846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
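The authors released MATLAB and Python implementations; the sketch below only illustrates the key idea under simplifying assumptions for a single microphone pair: the GCC-PHAT spectrum is evaluated (by a direct inverse-DTFT sum) at Nyquist-rate lags restricted to the physically feasible TDOA interval, then sinc-interpolated onto a dense candidate grid:

```python
# Hedged single-pair sketch of critically sampled GCC + interpolation.
import numpy as np

fs, c, d = 16000, 343.0, 0.2        # sample rate, speed of sound, mic spacing [m]
tau_max = d / c                     # physically feasible |TDOA| bound [s]

def gcc_phat_spectrum(x1, x2, n_fft=2048):
    """Frequency-domain GCC with PHAT weighting."""
    X1, X2 = np.fft.rfft(x1, n_fft), np.fft.rfft(x2, n_fft)
    cross = np.conj(X1) * X2
    return cross / (np.abs(cross) + 1e-12)

def gcc_at_lags(cross, taus, n_fft, fs):
    """Evaluate the IFT of the GCC spectrum only at the requested lags
    (direct sum instead of a full dense IFFT)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return np.real(np.exp(2j * np.pi * np.outer(taus, freqs)) @ cross)

rng = np.random.default_rng(1)
x1 = rng.standard_normal(2048)
x2 = np.roll(x1, 4)                            # true TDOA of 4 samples

cross = gcc_phat_spectrum(x1, x2)
# Nyquist-rate (critical) sampling on the bounded TDOA interval:
n_samp = int(np.ceil(2 * tau_max * fs)) + 1    # ~20 points instead of a dense grid
tau_grid = np.linspace(-tau_max, tau_max, n_samp)
coarse = gcc_at_lags(cross, tau_grid, 2048, fs)

# sinc interpolation onto a dense candidate grid
dense = np.linspace(-tau_max, tau_max, 20 * n_samp)
T = tau_grid[1] - tau_grid[0]
fine = np.sinc((dense[:, None] - tau_grid[None, :]) / T) @ coarse
print("estimated TDOA:", dense[np.argmax(fine)], "s; true:", 4 / fs, "s")
```

With 20 coarse lags interpolated to 400 candidates, the IFT cost drops by roughly the ratio of grid sizes, which is the complexity saving the abstract describes.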
FasterIVA: Update Rules for Independent Vector Analysis Based on Negentropy and the Majorize-Minimize Principle
Pub Date: 2020-03-20 · DOI: 10.1109/WASPAA52581.2021.9632790
Andreas Brendel, Walter Kellermann
{"title":"Fasteriva: Update Rules for Independent Vector Analysis Based on Negentropy and the Majorize-Minimize Principle","authors":"Andreas Brendel, Walter Kellermann","doi":"10.1109/WASPAA52581.2021.9632790","DOIUrl":"https://doi.org/10.1109/WASPAA52581.2021.9632790","url":null,"abstract":"Algorithms for Blind Source Separation (BSS) of acoustic signals require efficient and fast converging optimization strategies to adapt to nonstationary signal statistics and time-varying acoustic scenarios. In this paper, we derive fast converging update rules from a negentropy perspective, which are based on the Majorize-Minimize (MM) principle and eigenvalue decomposition. The presented update rules are shown to outperform competing state-of-the-art methods in terms of convergence speed at a comparable runtime due to the restriction to unitary demixing matrices. This is demonstrated by experiments with recorded real-world data.","PeriodicalId":429900,"journal":{"name":"2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128596819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
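As a rough illustration only (the exact FasterIVA rules differ; this is a generic MM/EVD-flavored sweep under assumed source-model weighting), an eigenvalue-decomposition-based IVA update might look like the following: each demixing vector is set to the minor eigenvector of a source-weighted covariance, and each per-bin demixing matrix is re-unitarized, reflecting the restriction to unitary demixing matrices mentioned above:

```python
# Hedged sketch of an MM/EVD-style IVA iteration (not the paper's rules).
import numpy as np

rng = np.random.default_rng(0)
F_bins, K, N = 4, 2, 1000                     # freq bins, sources/mics, frames
X = rng.standard_normal((F_bins, K, N)) + 1j * rng.standard_normal((F_bins, K, N))
W = np.tile(np.eye(K, dtype=complex), (F_bins, 1, 1))   # demixing matrices

for _ in range(10):
    Y = np.einsum('fkm,fmn->fkn', W, X)       # current source estimates
    r = np.sqrt(np.sum(np.abs(Y) ** 2, axis=0)) + 1e-8  # (K, N) broadband norms
    for f in range(F_bins):
        for k in range(K):
            # source-weighted covariance; the 1/r weight is the usual
            # super-Gaussian (spherical Laplacian) majorizer choice
            V = (X[f] / r[k]) @ X[f].conj().T / N
            evals, evecs = np.linalg.eigh(V)
            W[f, k] = evecs[:, 0].conj()      # minor eigenvector minimizes w^H V w
        # symmetric orthogonalization keeps W[f] unitary
        U, _, Vh = np.linalg.svd(W[f])
        W[f] = U @ Vh
```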