Binaural Signal Representations for Joint Sound Event Detection and Acoustic Scene Classification

D. Krause, A. Mesaros
{"title":"Binaural Signal Representations for Joint Sound Event Detection and Acoustic Scene Classification","authors":"D. Krause, A. Mesaros","doi":"10.48550/arXiv.2209.05900","DOIUrl":null,"url":null,"abstract":"Sound event detection (SED) and Acoustic scene classification (ASC) are two widely researched audio tasks that constitute an important part of research on acoustic scene analysis. Considering shared information between sound events and acoustic scenes, performing both tasks jointly is a natural part of a complex machine listening system. In this paper, we investigate the usefulness of several spatial audio features in training a joint deep neural network (DNN) model performing SED and ASC. Experiments are performed for two different datasets containing binaural recordings and synchronous sound event and acoustic scene labels to analyse the differences between performing SED and ASC separately or jointly. The presented results show that the use of specific binaural features, mainly the Generalized Cross Correlation with Phase Transform (GCC-phat) and sines and cosines of phase differences, result in a better performing model in both separate and joint tasks as compared with baseline methods based on logmel energies only.","PeriodicalId":231263,"journal":{"name":"2022 30th European Signal Processing Conference (EUSIPCO)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 30th European Signal Processing Conference (EUSIPCO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2209.05900","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Sound event detection (SED) and Acoustic scene classification (ASC) are two widely researched audio tasks that constitute an important part of research on acoustic scene analysis. Considering shared information between sound events and acoustic scenes, performing both tasks jointly is a natural part of a complex machine listening system. In this paper, we investigate the usefulness of several spatial audio features in training a joint deep neural network (DNN) model performing SED and ASC. Experiments are performed for two different datasets containing binaural recordings and synchronous sound event and acoustic scene labels to analyse the differences between performing SED and ASC separately or jointly. The presented results show that the use of specific binaural features, mainly the Generalized Cross Correlation with Phase Transform (GCC-phat) and sines and cosines of phase differences, result in a better performing model in both separate and joint tasks as compared with baseline methods based on logmel energies only.
联合声事件检测和声场景分类的双耳信号表示
声事件检测(SED)和声场景分类(ASC)是两项被广泛研究的音频任务,是声场景分析研究的重要组成部分。考虑到声音事件和声音场景之间的共享信息,联合执行这两项任务是复杂机器聆听系统的自然组成部分。在本文中,我们研究了几种空间音频特征在训练执行SED和ASC的联合深度神经网络(DNN)模型中的有用性。实验采用两种不同的数据集,包括双耳录音和同步声音事件和声学场景标签,以分析单独或联合执行SED和ASC的差异。结果表明,与仅基于logmel能量的基线方法相比,使用特定的双耳特征,主要是相位变换的广义互相关(GCC-phat)和相位差的正弦和余弦,可以在单独和联合任务中获得更好的模型。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信