Binaural Signal Representations for Joint Sound Event Detection and Acoustic Scene Classification

2022 30th European Signal Processing Conference (EUSIPCO) Pub Date: 2022-08-29 DOI: 10.48550/arXiv.2209.05900

D. Krause, A. Mesaros

Citations: 1

Abstract

Sound event detection (SED) and acoustic scene classification (ASC) are two widely researched audio tasks that constitute an important part of research on acoustic scene analysis. Because sound events and acoustic scenes share information, performing both tasks jointly is a natural part of a complex machine listening system. In this paper, we investigate the usefulness of several spatial audio features in training a joint deep neural network (DNN) model performing SED and ASC. Experiments are performed on two different datasets containing binaural recordings with synchronous sound event and acoustic scene labels, to analyse the differences between performing SED and ASC separately or jointly. The presented results show that the use of specific binaural features, mainly the Generalized Cross Correlation with Phase Transform (GCC-phat) and sines and cosines of phase differences, results in a better-performing model in both separate and joint tasks, compared with baseline methods based on logmel energies only.
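To make the two binaural features named in the abstract more concrete, the following is a minimal NumPy sketch of how GCC-PHAT and the sines and cosines of inter-channel phase differences might be computed from a two-channel recording. It is not the authors' implementation. The function names (`stft`, `binaural_features`), the FFT size, hop size, and number of retained lags are all assumptions chosen for illustration, and any mel-band mapping or normalisation the paper may apply is omitted.

```python
import numpy as np


def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def binaural_features(left, right, n_fft=512, hop=256, n_lags=64, eps=1e-8):
    """Return (gcc_phat, sin_ipd, cos_ipd) for a two-channel signal.

    All parameter values are illustrative assumptions, not the paper's setup.
    """
    spec_left = stft(left, n_fft, hop)
    spec_right = stft(right, n_fft, hop)

    # Cross-spectrum between the two channels.
    cross = spec_left * np.conj(spec_right)

    # PHAT weighting: discard the magnitude and keep only the phase.
    phat = cross / (np.abs(cross) + eps)

    # Back to the lag domain, then keep n_lags values centred on zero lag.
    gcc = np.fft.irfft(phat, n=n_fft, axis=1)
    half = n_lags // 2
    gcc = np.concatenate([gcc[:, -half:], gcc[:, :half]], axis=1)

    # Inter-channel phase difference, encoded as sine and cosine so the
    # network does not see the wrap-around discontinuity at +/- pi.
    phase_diff = np.angle(cross)
    return gcc, np.sin(phase_diff), np.cos(phase_diff)
```

In this sketch, each frame yields `n_lags` GCC-PHAT values plus one sine and one cosine value per frequency bin. In a setup like the one the abstract describes, such features would be combined with logmel energies as network input, though how they are combined is not specified in the abstract.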



