Zhor Diffallah, H. Ykhlef, Hafida Bouarfa, Nardjesse Diffallah
{"title":"Consistency Regularization-Based Polyphonic Audio Event Detection with Minimal Supervision","authors":"Zhor Diffallah, H. Ykhlef, Hafida Bouarfa, Nardjesse Diffallah","doi":"10.1109/STA56120.2022.10019247","DOIUrl":null,"url":null,"abstract":"Audio event detection refers to the task of specifying the nature of events happening in an audio stream, as well as locating these occurrences in time. Due to its wide applicability in a myriad of domains, this task has been gradually attracting interest over time. The development of the audio event detection task is largely dominated by modern deep learning techniques. Deep network architectures need a substantial amount of labeled audio clips that contain the start and end time of each event. However, collecting and annotating exhaustive datasets of audio recordings with the necessary information is both a costly and a laborious endeavour. To mend this, weakly-labeled semi-supervised learning methods have been adopted in an attempt to mitigate the labeling issue. In this work, we investigate the impact of incorporating weak labels and unlabeled clips into the training chain of audio event detectors. We have conducted our experiments on the Domestic Environment Sound Event Detection corpus (DESED); a large-scale heterogeneous dataset composed of several types of recordings and annotations. we have focused our study on methods based on consistency regularization; specifically: Mean Teacher and Interpolation Consistency Training. Our experimental results reveal that; with the proper parameterization, incorporating weakly-labeled and unlabeled data is beneficial for detecting polyphonic sound events.","PeriodicalId":430966,"journal":{"name":"2022 IEEE 21st international Ccnference on Sciences and Techniques of Automatic Control and Computer Engineering (STA)","volume":"105 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 21st international Ccnference on Sciences and Techniques of Automatic Control and Computer Engineering (STA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/STA56120.2022.10019247","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Audio event detection refers to the task of specifying the nature of events occurring in an audio stream, as well as locating these occurrences in time. Due to its wide applicability across a myriad of domains, this task has been gradually attracting interest over time. The development of audio event detection is largely dominated by modern deep learning techniques. Deep network architectures need a substantial amount of labeled audio clips annotated with the start and end time of each event. However, collecting and annotating exhaustive datasets of audio recordings with the necessary information is both a costly and laborious endeavour. To address this, weakly-labeled semi-supervised learning methods have been adopted in an attempt to mitigate the labeling issue. In this work, we investigate the impact of incorporating weak labels and unlabeled clips into the training chain of audio event detectors. We have conducted our experiments on the Domestic Environment Sound Event Detection (DESED) corpus, a large-scale heterogeneous dataset composed of several types of recordings and annotations. We have focused our study on methods based on consistency regularization, specifically Mean Teacher and Interpolation Consistency Training. Our experimental results reveal that, with proper parameterization, incorporating weakly-labeled and unlabeled data is beneficial for detecting polyphonic sound events.
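Both consistency-regularization methods named in the abstract pair a student network with an exponential-moving-average (EMA) teacher and penalize disagreement between their predictions on weakly-labeled and unlabeled clips. The paper's own implementation is not reproduced on this page; the following is a minimal, hypothetical PyTorch sketch of a Mean Teacher training step plus an Interpolation Consistency Training (ICT) term, assuming a CRNN-style model that returns (frame-level, clip-level) event probabilities, as is common in DESED baselines. All function and variable names are illustrative, not taken from the paper.

```python
# Hypothetical sketch (not the authors' code): Mean Teacher and ICT
# consistency terms for semi-supervised polyphonic sound event detection.
# Assumes `model(x)` returns (frame_probs, clip_probs) sigmoid outputs.
import copy
import torch
import torch.nn.functional as F

def make_teacher(student: torch.nn.Module) -> torch.nn.Module:
    # The teacher is a frozen copy of the student, updated only via EMA.
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

@torch.no_grad()
def ema_update(teacher, student, alpha=0.999):
    # teacher <- alpha * teacher + (1 - alpha) * student, parameter-wise.
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1.0 - alpha)

def ict_consistency(student, teacher, x_unlab, beta=0.5):
    # ICT: the student's prediction on a mixup of two unlabeled clips
    # should match the same mixup of the teacher's predictions.
    lam = torch.distributions.Beta(beta, beta).sample().item()
    perm = torch.randperm(x_unlab.size(0))
    x_mix = lam * x_unlab + (1.0 - lam) * x_unlab[perm]
    with torch.no_grad():
        _, clip_t = teacher(x_unlab)
        target = lam * clip_t + (1.0 - lam) * clip_t[perm]
    _, clip_s = student(x_mix)
    return F.mse_loss(clip_s, target)

def mean_teacher_step(student, teacher, optimizer,
                      x_strong, y_strong,  # frame-level (strong) labels
                      x_weak, y_weak,      # clip-level (weak) labels
                      x_unlab,             # unlabeled clips
                      cons_weight=1.0):
    # Supervised losses on the strongly and weakly labeled subsets.
    frame_s, _ = student(x_strong)
    _, clip_w = student(x_weak)
    sup_loss = (F.binary_cross_entropy(frame_s, y_strong)
                + F.binary_cross_entropy(clip_w, y_weak))

    # Mean Teacher consistency: the student on a noisy view of the
    # unlabeled clips should agree with the teacher on the clean originals.
    frame_u, clip_u = student(x_unlab + 0.05 * torch.randn_like(x_unlab))
    with torch.no_grad():
        frame_t, clip_t = teacher(x_unlab)
    cons_loss = F.mse_loss(frame_u, frame_t) + F.mse_loss(clip_u, clip_t)

    loss = sup_loss + cons_weight * cons_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(teacher, student)  # teacher tracks the student after each step
    return loss.item()
```

In published Mean Teacher setups, the consistency weight is typically ramped up from zero over the first training epochs so that early, unreliable teacher predictions do not dominate; the "proper parameterization" the abstract refers to plausibly covers choices such as this weight and the EMA decay, though the exact schedule used in the paper is not stated on this page.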