Automated Audio Data Augmentation Network Using Bi-Level Optimization for Sound Event Localization and Detection

IF 3.2 | Zone 2 (Engineering & Technology) | Q2 (Engineering, Electrical & Electronic)

IEEE Signal Processing Letters | Pub Date: 2024-10-07 | DOI: 10.1109/LSP.2024.3475350

Wenjie Zhang; Peng Yu; Jun Yin; Xiaoheng Jiang; Mingliang Xu

IEEE Signal Processing Letters, vol. 31, pp. 2770-2774. Full text: https://ieeexplore.ieee.org/document/10706700/

Abstract

In sound event localization and detection (SELD), traditional methods often treat localization and detection algorithms separately from data augmentation. During model training, the data augmentation strategy is typically fixed rather than learned. Existing audio data augmentation strategies struggle to find augmentation parameters that can be applied effectively to SELD systems. To address this challenge, we introduce a network-based strategy, termed the Automated Audio Data Augmentation (AADA) network. It uses bi-level optimization to integrate audio data augmentation techniques with SELD tasks. In the AADA network, the lower-level SELD task serves as a constraint for the higher-level data augmentation process. The augmentation parameters are adaptively optimized using intermediate feature information transferred from the SELD tasks, yielding parameters tuned to these tasks. On the Sony-TAU Realistic Spatial Soundscapes 2023 dataset, our approach achieves a SELD score of 0.4801, significantly outperforming all traditional data augmentation strategies for SELD.
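The abstract does not give the optimization details, so the following is only a minimal sketch of the general bi-level idea, not the AADA network itself. It assumes a toy scalar model `w`, a single hypothetical augmentation parameter `g` (an input gain), squared-error losses, and a one-step unrolled hypergradient. All function names, data, and hyperparameters are illustrative assumptions.

```python
# Toy bi-level optimization: learn an augmentation gain g (upper level)
# while a scalar model w is trained on augmented data (lower level).
# Everything here is a hypothetical illustration, not the AADA network.

def mean(values):
    return sum(values) / len(values)

def train_grad(w, g, xs, ys):
    """Gradient w.r.t. w of the training loss mean((w * g * x - y) ** 2)."""
    return mean([2.0 * (w * g * x - y) * g * x for x, y in zip(xs, ys)])

def inner_step(w, g, xs, ys, lr):
    """Lower level: one gradient step on data augmented with gain g."""
    return w - lr * train_grad(w, g, xs, ys)

def val_loss(w, xs, ys):
    """Upper-level objective: squared error on un-augmented validation data."""
    return mean([(w * x - y) ** 2 for x, y in zip(xs, ys)])

def outer_loss(w, g, train, val, lr):
    """Validation loss after one lower-level step taken with gain g."""
    return val_loss(inner_step(w, g, train[0], train[1], lr), val[0], val[1])

def hypergradient(w, g, train, val, lr):
    """d outer_loss / d g, via the chain rule through the single inner step."""
    xs, ys = train
    xv, yv = val
    w_new = inner_step(w, g, xs, ys, lr)
    # Derivative of the validation loss w.r.t. the updated weight.
    dval_dw = mean([2.0 * (w_new * x - y) * x for x, y in zip(xv, yv)])
    # Derivative of train_grad w.r.t. g; w_new depends on g only through it.
    dgrad_dg = mean([2.0 * (2.0 * w * g * x * x - x * y) for x, y in zip(xs, ys)])
    return dval_dw * (-lr * dgrad_dg)

def bilevel_fit(train, val, w=0.0, g=1.0, lr=0.05, meta_lr=0.05, steps=200):
    """Alternate an upper-level update of g with a lower-level update of w."""
    for _ in range(steps):
        g -= meta_lr * hypergradient(w, g, train, val, lr)
        w = inner_step(w, g, train[0], train[1], lr)
    return w, g

# Assumed toy data: training targets follow y = 2x, validation follows y = 3x.
TRAIN = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
VAL = ([1.0, 2.0], [3.0, 6.0])
```

Calling `bilevel_fit(TRAIN, VAL)` returns the trained weight and the learned gain. The training and validation targets are deliberately mismatched, so the validation loss can only improve if the upper level changes how the training data are augmented. That is the sense in which the lower-level task constrains the augmentation parameters in this sketch.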

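Source journal: IEEE Signal Processing Letters (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 7.40
Self-citation rate: 12.80%
Articles published: 339
Review time: 2.8 months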

Journal description: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented, within one year of their appearance, at signal processing conferences such as ICASSP, GlobalSIP and ICIP, and at several workshops organized by the Signal Processing Society.