Automated Audio Data Augmentation Network Using Bi-Level Optimization for Sound Event Localization and Detection

Impact factor: 3.2 | CAS Zone 2 (Engineering & Technology) | JCR Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC)
Wenjie Zhang;Peng Yu;Jun Yin;Xiaoheng Jiang;Mingliang Xu
Journal: IEEE Signal Processing Letters, vol. 31, pp. 2770-2774
Published: 2024-10-07 (Journal Article)
DOI: 10.1109/LSP.2024.3475350
URL: https://ieeexplore.ieee.org/document/10706700/
Citations: 0

Abstract

In sound event localization and detection (SELD), traditional methods often treat the localization and detection algorithms separately from data augmentation. During model training, the data augmentation strategy is typically fixed rather than learned. Existing audio data augmentation strategies struggle to find optimal augmentation parameters that can be applied effectively to SELD systems. To address this challenge, we introduce a network-based strategy, termed the Automated Audio Data Augmentation (AADA) network, which employs bi-level optimization to integrate audio data augmentation with the SELD task. In the AADA network, the lower-level SELD task serves as a constraint on the upper-level data augmentation process. The augmentation parameters are optimized adaptively using intermediate feature information transferred from the SELD task, yielding parameters suited to that task. Evaluated on the Sony-TAU Realistic Spatial Soundscapes 2023 dataset, our approach achieves a SELD score of 0.4801 (lower is better), significantly surpassing all traditional data augmentation strategies for SELD.
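The bi-level structure described in the abstract can be illustrated with a toy problem. This is a minimal sketch, not the AADA network. It swaps the SELD model for a one-weight linear regressor and the augmentation pipeline for a single learnable input gain. It also swaps the transfer of intermediate feature information for a finite-difference hypergradient taken through the whole inner training run. The lower level trains the model on gain-augmented training inputs, and the upper level adjusts the gain to minimize the validation loss of that trained model. All names and values (`train_model`, `val_loss`, `optimize_gain`, the simulated recording gain of 0.5) are hypothetical.

```python
"""Toy bi-level optimization of one augmentation parameter (hypothetical sketch)."""

# Hypothetical toy setup: training clips were recorded at half the level of
# the validation clips, and a learnable gain is the only augmentation parameter.
TRUE_SLOPE = 2.0       # ground-truth mapping y = TRUE_SLOPE * x
RECORDING_GAIN = 0.5   # level mismatch that affects the training set only
XS = [-1.0, -0.5, 0.5, 1.0]


def train_model(gain, steps=200, lr=0.5):
    """Lower level: fit y ~ w * x on gain-augmented training inputs by gradient descent."""
    w = 0.0
    n = len(XS)
    for _ in range(steps):
        grad = 0.0
        for x in XS:
            x_aug = gain * RECORDING_GAIN * x  # augmentation applied to the training input
            grad += 2.0 * (w * x_aug - TRUE_SLOPE * x) * x_aug / n  # d(MSE)/dw
        w -= lr * grad
    return w


def val_loss(gain):
    """Upper-level objective: validation MSE of the model trained with this gain."""
    w = train_model(gain)
    return sum((w * x - TRUE_SLOPE * x) ** 2 for x in XS) / len(XS)


def optimize_gain(gain=1.0, outer_steps=100, outer_lr=0.05, eps=1e-4):
    """Upper level: gradient descent on the validation loss w.r.t. the gain.

    The hypergradient is a central finite difference through the full inner
    training run, which is only affordable for a toy problem like this one.
    """
    for _ in range(outer_steps):
        hypergrad = (val_loss(gain + eps) - val_loss(gain - eps)) / (2.0 * eps)
        gain -= outer_lr * hypergrad
    return gain


if __name__ == "__main__":
    best_gain = optimize_gain()
    print(f"learned gain: {best_gain:.4f}, validation loss: {val_loss(best_gain):.2e}")
```

Starting from a gain of 1.0, `optimize_gain()` should move the gain toward 2.0, the value that cancels the simulated 0.5 recording level, at which point the validation loss is close to zero. Finite differences keep the example dependency-free, but they cost two full inner training runs per outer step. Practical bi-level augmentation methods commonly use automatic differentiation through an unrolled or approximated inner step instead.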
Source journal: IEEE Signal Processing Letters
Category: Engineering & Technology, Engineering: Electrical & Electronic
CiteScore: 7.40
Self-citation rate: 12.80%
Articles published: 339
Review time: 2.8 months
Journal description: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.