Self-Consistency Training with Hierarchical Temporal Aggregation for Sound Event Detection

Yunlong Li, Xiujuan Zhu, Mingyu Wang, Ying Hu
DOI: 10.23919/APSIPAASC55919.2022.9980285
Published in: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Publication date: 2022-11-07
URL: https://doi.org/10.23919/APSIPAASC55919.2022.9980285
Citations: 0

Abstract

In this paper, we propose a sound event detection (SED) method, named SCT-HTA, based on a self-consistency training (SCT) strategy and a hierarchical temporal aggregation (HTA) module. The method adopts the Mean Teacher (MT) semi-supervised learning framework with a dual-branch convolutional recurrent neural network (CRNN) consisting of a main branch and an auxiliary branch. The SCT strategy applies a self-consistency regularization in addition to the MT loss to keep the outputs of the auxiliary and main branches consistent. Furthermore, the HTA module is designed to aggregate information at different temporal resolutions. We also explore three aggregators for the HTA module and four combinations of pooling methods for the localization modules of the two branches. Experimental results demonstrate that the proposed SCT-HTA method outperforms the four compared methods. The results also show that the max pooling aggregator is better at highlighting the locations of sound events, and that the “linear softmax + attention” pooling combination achieves the best performance.
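The abstract names its building blocks without giving formulas. The NumPy sketch below illustrates common textbook forms of them under stated assumptions, not the authors' implementation: linear softmax pooling (clip probability = Σ_t y_t² / Σ_t y_t), attention pooling with a softmax over time, the standard Mean Teacher exponential-moving-average (EMA) weight update, and a self-consistency term between the two branches assumed here to be a mean squared error (the abstract does not specify the distance). The `max_pool_aggregate` function is purely hypothetical: it only illustrates the general idea of aggregating features at several temporal resolutions with a max-pooling aggregator and should not be read as the paper's HTA module. All function names and the `scales` and `alpha` parameters are illustrative assumptions.

```python
import numpy as np


def linear_softmax_pooling(frame_probs: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Clip-level probabilities from frame-level ones via linear softmax.

    frame_probs: shape (T, C), values in [0, 1].
    Returns shape (C,): sum_t y_t^2 / sum_t y_t, so confident frames dominate.
    """
    return (frame_probs ** 2).sum(axis=0) / (frame_probs.sum(axis=0) + eps)


def attention_pooling(frame_probs: np.ndarray, att_logits: np.ndarray) -> np.ndarray:
    """Clip-level probabilities as an attention-weighted average over frames.

    att_logits: shape (T, C); normalized with a softmax over the time axis.
    """
    shifted = att_logits - att_logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * frame_probs).sum(axis=0)


def max_pool_aggregate(features: np.ndarray, scales=(1, 2, 4)) -> np.ndarray:
    """HYPOTHETICAL multi-resolution aggregation with a max-pooling aggregator.

    features: shape (T, D). For each scale s, frames are max-pooled in
    non-overlapping windows of s frames, repeated back to length T, and the
    per-scale results are averaged. T is assumed divisible by every scale.
    """
    T, D = features.shape
    outputs = []
    for s in scales:
        pooled = features.reshape(T // s, s, D).max(axis=1)  # (T // s, D)
        outputs.append(np.repeat(pooled, s, axis=0))          # back to (T, D)
    return np.mean(outputs, axis=0)


def self_consistency_loss(main_probs: np.ndarray, aux_probs: np.ndarray) -> float:
    """ASSUMED form: mean squared error between main- and auxiliary-branch outputs."""
    return float(np.mean((main_probs - aux_probs) ** 2))


def ema_update(teacher: dict, student: dict, alpha: float = 0.999) -> dict:
    """Mean Teacher update: teacher weights track an EMA of the student weights."""
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k] for k in teacher}
```

Linear softmax pooling lets confidently active frames dominate the clip-level decision, while attention pooling relies on learned per-frame weights; the abstract reports that pairing these two across the branches' localization modules performed best.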