Sound event localization and detection using element-wise attention gate and asymmetric convolutional recurrent neural networks

IF 1.4 · Zone 4, Computer Science · JCR Q4, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Lean Yan, Min Guo, Zhiqiang Li
DOI: 10.3233/aic-220125 (https://doi.org/10.3233/aic-220125)
Journal: AI Communications
Volume: 27 1
Pages: 147-157
Publication date: 2023-04-20
Publication type: Journal Article
Open access: no
Citations: 0

Abstract

Sound event localization and detection (SELD) faces two problems: the standard square convolution kernel has insufficient representation ability, and a recurrent neural network usually ignores the differing importance of the elements within an input vector. This paper proposes an element-wise attention gate-asymmetric convolutional recurrent neural network (EleAttG-ACRNN) to improve SELD performance. First, a convolutional neural network with context gating and asymmetric squeeze excitation residual blocks is constructed: asymmetric convolution enhances the capability of the square convolution kernel, squeeze excitation improves the modeling of interdependence between channels, and context gating weights the important features and suppresses the irrelevant ones. Next, to improve the expressiveness of the model, the element-wise attention gate is integrated into the bidirectional gated recurrent network, which highlights the importance of different elements within an input vector and further learns the temporal context information. Evaluation on the TAU Spatial Sound Events 2019-Ambisonic dataset shows the effectiveness of the proposed method: compared with a CRNN method, it improves SELD performance by up to 0.05 in error rate, 1.7% in F-score, 0.7° in DOA error, and 4.5% in frame recall.
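The two gating mechanisms named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used formulations, context gating as y = σ(Wx + b) ⊙ x and the element-wise attention gate as a_t = σ(W_x x_t + W_h h_{t-1} + b) followed by x̃_t = a_t ⊙ x_t. The abstract does not give the equations, layer sizes, or weights, so every name, dimension, and the random initialization below is hypothetical and is not taken from the paper.

```python
import math
import random


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, vector, bias):
    """Return weights @ vector + bias for a list-of-rows weight matrix."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def context_gate(x, w, b):
    """Context gating (assumed form): y = sigmoid(W x + b) * x, element-wise."""
    gate = [sigmoid(z) for z in affine(w, x, b)]
    return [g * xi for g, xi in zip(gate, x)]


def ele_att_gate(x_t, h_prev, w_x, w_h, b):
    """Element-wise attention gate (assumed form) for one RNN input vector.

    a_t = sigmoid(W_x x_t + W_h h_prev + b); the gated input is a_t * x_t,
    which would then be fed to the recurrent unit in place of x_t.
    """
    zx = affine(w_x, x_t, b)
    zh = affine(w_h, h_prev, [0.0] * len(b))
    a_t = [sigmoid(p + q) for p, q in zip(zx, zh)]
    return [a * xi for a, xi in zip(a_t, x_t)], a_t


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random placeholder weights for the demo."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    x_t = [0.8, -1.2, 0.3]   # one input frame with 3 features (made up)
    h_prev = [0.1, -0.4]     # previous hidden state with 2 units (made up)
    gated, a_t = ele_att_gate(
        x_t, h_prev,
        random_matrix(3, 3, rng), random_matrix(3, 2, rng), [0.0] * 3)
    print("attention weights:", a_t)
    print("gated input:", gated)
    print("context-gated:",
          context_gate(x_t, random_matrix(3, 3, rng), [0.0] * 3))
```

In both cases the gate has the same dimension as the vector it scales, so each element receives its own weight in (0, 1) rather than one weight per time step or per channel.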
Source journal
AI Communications (Engineering and Technology, Computer Science: Artificial Intelligence)
CiteScore: 2.30
Self-citation rate: 12.50%
Publication volume: 34
Review time: 4.5 months
About the journal: AI Communications is a journal on artificial intelligence (AI) which has a close relationship to EurAI (European Association for Artificial Intelligence, formerly ECCAI). It covers the whole AI community: scientific institutions as well as commercial and industrial companies. AI Communications aims to enhance contacts and information exchange between AI researchers and developers, and to provide supranational information to those concerned with AI and advanced information processing. AI Communications publishes refereed articles concerning scientific and technical AI procedures, provided they are of sufficient interest to a large readership of both scientific and practical background. In addition, it contains high-level background material, both at the technical level and at the level of opinions, policies and news.