L3DAS23: Learning 3D Audio Sources for Audio-Visual Extended Reality

Impact Factor: 2.9 · JCR Q2 (Engineering, Electrical & Electronic)
Authors: Riccardo F. Gramaccioni, Christian Marinoni, Changan Chen, Aurelio Uncini, Danilo Comminiello
{"title":"L3DAS23: Learning 3D Audio Sources for Audio-Visual Extended Reality","authors":"Riccardo F. Gramaccioni;Christian Marinoni;Changan Chen;Aurelio Uncini;Danilo Comminiello","doi":"10.1109/OJSP.2024.3376297","DOIUrl":null,"url":null,"abstract":"The primary goal of the L3DAS (Learning 3D Audio Sources) project is to stimulate and support collaborative research studies concerning machine learning techniques applied to 3D audio signal processing. To this end, the L3DAS23 Challenge, presented at IEEE ICASSP 2023, focuses on two spatial audio tasks of paramount interest for practical uses: 3D speech enhancement (3DSE) and 3D sound event localization and detection (3DSELD). Both tasks are evaluated within augmented reality applications. The aim of this paper is to describe the main results obtained from this challenge. We provide the L3DAS23 dataset, which comprises a collection of first-order Ambisonics recordings in reverberant simulated environments. Indeed, we maintain some general characteristics of the previous L3DAS challenges, featuring a pair of first-order Ambisonics microphones to capture the audio signals and involving multiple-source and multiple-perspective Ambisonics recordings. However, in this new edition, we introduce audio-visual scenarios by including images that depict the frontal view of the environments as captured from the perspective of the microphones. This addition aims to enrich the challenge experience, giving participants tools for exploring a combination of audio and images for solving the 3DSE and 3DSELD tasks. In addition to a brand-new dataset, we provide updated baseline models designed to take advantage of audio-image pairs. To ensure accessibility and reproducibility, we also supply supporting API for an effortless replication of our results. Lastly, we present the results achieved by the participants of the L3DAS23 Challenge.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"632-640"},"PeriodicalIF":2.9000,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10468560","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE open journal of signal processing","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10468560/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

Abstract

The primary goal of the L3DAS (Learning 3D Audio Sources) project is to stimulate and support collaborative research on machine learning techniques applied to 3D audio signal processing. To this end, the L3DAS23 Challenge, presented at IEEE ICASSP 2023, focuses on two spatial audio tasks of paramount interest for practical uses: 3D speech enhancement (3DSE) and 3D sound event localization and detection (3DSELD). Both tasks are evaluated within augmented reality applications. The aim of this paper is to describe the main results obtained from this challenge. We provide the L3DAS23 dataset, which comprises a collection of first-order Ambisonics recordings in reverberant simulated environments. We maintain some general characteristics of the previous L3DAS challenges, featuring a pair of first-order Ambisonics microphones to capture the audio signals and involving multiple-source and multiple-perspective Ambisonics recordings. However, in this new edition, we introduce audio-visual scenarios by including images that depict the frontal view of the environments as captured from the perspective of the microphones. This addition aims to enrich the challenge experience, giving participants tools to explore a combination of audio and images for solving the 3DSE and 3DSELD tasks. In addition to the brand-new dataset, we provide updated baseline models designed to take advantage of audio-image pairs. To ensure accessibility and reproducibility, we also supply a supporting API for effortless replication of our results. Lastly, we present the results achieved by the participants of the L3DAS23 Challenge.
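For readers unfamiliar with the recording format, the sketch below illustrates what a first-order Ambisonics (B-format) signal is: a sound source represented by four channels whose gains depend on the source direction. This is a minimal illustration only, assuming the common ACN channel order (W, Y, Z, X) with SN3D normalization; the L3DAS23 recordings themselves come from simulated reverberant rooms, and the challenge's actual encoding convention and API are documented with the dataset, so treat this as a conceptual example rather than the challenge pipeline.

```python
import numpy as np

def foa_encode(mono, azimuth_deg, elevation_deg):
    """Encode a mono signal into first-order Ambisonics B-format.

    Assumes ACN channel order (W, Y, Z, X) with SN3D normalization,
    a widely used FOA convention; this is an illustration of the
    four-channel format, not the L3DAS23 recording process.
    """
    az = np.deg2rad(azimuth_deg)
    el = np.deg2rad(elevation_deg)
    w = mono                                  # omnidirectional component
    y = mono * np.sin(az) * np.cos(el)        # left-right dipole
    z = mono * np.sin(el)                     # up-down dipole
    x = mono * np.cos(az) * np.cos(el)        # front-back dipole
    return np.stack([w, y, z, x])             # shape: (4, num_samples)

# Example: a 1 kHz tone arriving from 45 degrees left, 10 degrees up
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
foa = foa_encode(tone, azimuth_deg=45, elevation_deg=10)
print(foa.shape)  # (4, 16000)
```

A dry encoding like this only places a source on the sphere; reverberant datasets such as L3DAS23 instead convolve clean signals with room responses, which spreads the source's energy across all four channels over time.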
Source journal: IEEE Open Journal of Signal Processing · CiteScore 5.30 · Self-citation rate 0.00% · Average review time: 22 weeks