Hypernetworks for Sound Event Detection: A Proof-of-Concept

2022 30th European Signal Processing Conference (EUSIPCO). Pub Date: 2022-08-29. DOI: 10.23919/eusipco55093.2022.9909716

Shubhr Singh, Huy Phan, Emmanouil Benetos

Citations: 0

Abstract

Hypernetworks for Sound event Detection: a Proof-of-Concept

Polyphonic sound event detection (SED) involves the prediction of sound events present in an audio recording along with their onset and offset times. Recently, deep neural networks, specifically convolutional recurrent neural networks (CRNNs), have achieved impressive results on this task. The convolutional part of the architecture extracts translation-invariant features from the input, and the recurrent part learns the underlying temporal relationships between audio frames. Recent studies showed that the weight-sharing paradigm of recurrent networks might be a hindering factor for certain kinds of time-series data, specifically where there is a temporal conditional shift, i.e. the conditional distribution of a label changes across the temporal scale. This raises a relevant question: is there a similar phenomenon in polyphonic sound events due to the dynamic polyphony level across the temporal axis? In this work, we explore this question and ask whether relaxed weight sharing improves the performance of a CRNN for polyphonic SED. We propose to use hypernetworks to relax weight sharing in the recurrent part and show that the CRNN's performance is improved by ≈ 3% across two datasets, thus paving the way for further exploration of the existence of temporal conditional shift in polyphonic SED.
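To make the idea of relaxed weight sharing concrete, the following is a minimal sketch, not the authors' architecture (the abstract does not specify it). It assumes one common hypernetwork-style scheme in which a small auxiliary network looks at the current frame and previous hidden state and emits a per-unit scale for the shared recurrent weights, so the effective recurrent matrix can differ from frame to frame. The class name `HyperRNNCell`, the layer sizes, and the scaling form are all illustrative assumptions.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.5):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class HyperRNNCell:
    """Vanilla RNN cell whose recurrent weights are rescaled at every step
    by a small hypernetwork (hypothetical illustration only)."""

    def __init__(self, input_size, hidden_size, hyper_size, seed=0):
        rng = random.Random(seed)
        self.w_x = random_matrix(hidden_size, input_size, rng)   # shared input weights
        self.w_h = random_matrix(hidden_size, hidden_size, rng)  # shared recurrent weights
        # Hypernetwork: embeds (x_t, h_{t-1}) and emits one scale per hidden unit.
        self.hyper_in = random_matrix(hyper_size, input_size + hidden_size, rng)
        self.hyper_out = random_matrix(hidden_size, hyper_size, rng)

    def scales(self, x, h_prev):
        z = [math.tanh(a) for a in matvec(self.hyper_in, x + h_prev)]
        # The "1 +" offset keeps the scales centred on the shared-weight case.
        return [1.0 + s for s in matvec(self.hyper_out, z)]

    def step(self, x, h_prev, use_hyper=True):
        d = self.scales(x, h_prev) if use_hyper else [1.0] * len(h_prev)
        rec = matvec(self.w_h, h_prev)
        inp = matvec(self.w_x, x)
        # Row i of the effective recurrent matrix at this step is d[i] * w_h[i].
        return [math.tanh(d_i * r_i + x_i) for d_i, r_i, x_i in zip(d, rec, inp)]


def run(cell, frames, use_hyper=True):
    """Run the cell over a sequence of feature frames; return all hidden states."""
    h = [0.0] * len(cell.w_h)
    states = []
    for x in frames:
        h = cell.step(x, h, use_hyper)
        states.append(h)
    return states


if __name__ == "__main__":
    cell = HyperRNNCell(input_size=3, hidden_size=4, hyper_size=2)
    frames = [[0.1, 0.2, 0.3], [0.4, 0.1, 0.0], [0.9, 0.8, 0.7]]
    for t, h in enumerate(run(cell, frames)):
        print(t, [round(v, 3) for v in h])
```

Calling `run(cell, frames, use_hyper=False)` gives the ordinary shared-weight RNN for comparison. In a CRNN, the frames would be the per-frame feature vectors produced by the convolutional front end, and all weights would be trained jointly rather than drawn at random.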
