Hypernetworks for Sound Event Detection: A Proof-of-Concept

Shubhr Singh, Huy Phan, Emmanouil Benetos
2022 30th European Signal Processing Conference (EUSIPCO), published 2022-08-29. DOI: 10.23919/eusipco55093.2022.9909716 (https://doi.org/10.23919/eusipco55093.2022.9909716)
Citations: 0

Abstract

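Polyphonic sound event detection (SED) involves predicting the sound events present in an audio recording along with their onset and offset times. Recently, deep neural networks, specifically convolutional recurrent neural networks (CRNNs), have achieved impressive results on this task. The convolutional part of the architecture extracts translation-invariant features from the input, and the recurrent part learns the underlying temporal relationships between audio frames. Recent studies have shown that the weight-sharing paradigm of recurrent networks might be a hindering factor for certain kinds of time-series data, specifically where there is a temporal conditional shift, i.e., where the conditional distribution of a label changes across the temporal scale. This raises a relevant question: does a similar phenomenon occur in polyphonic sound events, given that the polyphony level changes along the time axis? In this work, we explore this question and investigate whether relaxed weight sharing improves the performance of a CRNN for polyphonic SED. We propose using hypernetworks to relax weight sharing in the recurrent part and show that the CRNN's performance improves by approximately 3% across two datasets, thus paving the way for further exploration of the existence of temporal conditional shift in polyphonic SED.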
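The abstract does not say how the hypernetwork is attached to the recurrent layer. As a rough illustration of what relaxing weight sharing can mean, here is a minimal pure-Python sketch under our own assumptions, not the authors' implementation: a toy tanh RNN (no convolutional front end, no training) in which a small hypothetical hypernetwork, conditioned on the previous hidden state, rescales the input weights at every frame, so each time step effectively uses its own matrix W_t = diag(d_t) W instead of one shared W. The names `HyperRNN`, `scales`, and `relax_sharing`, the sigmoid-based scaling, and the choice of conditioning signal are all illustrative assumptions.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng, scale=0.5):
    """Small random matrix drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class HyperRNN:
    """Toy tanh RNN whose input weights may change at every frame.

    Hypothetical design: a one-layer hypernetwork maps the previous hidden
    state h_{t-1} to one scale per hidden unit, d_t, and the effective input
    weights become W_t = diag(d_t) @ W. With relax_sharing=False every scale
    is 1.0, which recovers an ordinary RNN with fully shared weights.
    """

    def __init__(self, n_in, n_hidden, relax_sharing=True, seed=0):
        rng = random.Random(seed)
        self.W = rand_matrix(n_hidden, n_in, rng)      # base input weights
        self.U = rand_matrix(n_hidden, n_hidden, rng)  # recurrent weights
        self.H = rand_matrix(n_hidden, n_hidden, rng)  # hypernetwork weights
        self.b = [0.0] * n_hidden
        self.n_hidden = n_hidden
        self.relax_sharing = relax_sharing

    def scales(self, h):
        """Hypernetwork output d_t = 2 * sigmoid(H h), in (0, 2); 1.0 when h = 0."""
        if not self.relax_sharing:
            return [1.0] * self.n_hidden
        return [2.0 / (1.0 + math.exp(-z)) for z in matvec(self.H, h)]

    def step(self, x, h):
        """One frame: h_t = tanh(diag(d_t) W x_t + U h_{t-1} + b)."""
        d = self.scales(h)
        wx = matvec(self.W, x)
        uh = matvec(self.U, h)
        return [math.tanh(di * a + c + bi)
                for di, a, c, bi in zip(d, wx, uh, self.b)]

    def forward(self, frames):
        """Run over a sequence of feature frames and return every hidden state."""
        h = [0.0] * self.n_hidden
        states = []
        for x in frames:
            h = self.step(x, h)
            states.append(h)
        return states


# Usage: compare the shared-weight baseline with the relaxed variant.
frames = [[math.sin(0.3 * t + k) for k in range(3)] for t in range(6)]
shared = HyperRNN(3, 4, relax_sharing=False).forward(frames)
relaxed = HyperRNN(3, 4, relax_sharing=True).forward(frames)
```

Because the scales equal 1.0 when the hidden state is zero, both variants process the first frame identically; from the second frame on, the relaxed model applies frame-dependent weights, which is the property the abstract's question about temporal conditional shift turns on. Multiplicative per-unit scaling is chosen here only because it keeps the number of hypernetwork outputs small (one per hidden unit rather than one per weight).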