Sound event localization and detection using element-wise attention gate and asymmetric convolutional recurrent neural networks

IF 1.4, Zone 4 (Computer Science), JCR Q4 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

AI Communications Pub Date : 2023-04-20 DOI:10.3233/aic-220125

Lean Yan, Min Guo, Zhiqiang Li


Citations: 0

Abstract

In sound event localization and detection (SELD), standard square convolution kernels have limited representational capacity, and recurrent neural networks usually ignore the differing importance of individual elements within an input vector. This paper proposes an element-wise attention gate-asymmetric convolutional recurrent neural network (EleAttG-ACRNN) to improve SELD performance. First, a convolutional neural network with context gating and an asymmetric squeeze-excitation residual structure is constructed: asymmetric convolution strengthens the square convolution kernel, squeeze-excitation models the interdependence between channels, and context gating weights important features while suppressing irrelevant ones. Next, to improve the expressiveness of the model, the element-wise attention gate is integrated into the bidirectional gated recurrent network, which highlights the importance of different elements within an input vector and further learns temporal context information. Evaluation on the TAU Spatial Sound Events 2019-Ambisonic dataset shows the effectiveness of the proposed method: compared to a CRNN method, it improves SELD performance by up to 0.05 in error rate, 1.7% in F-score, 0.7° in DOA error, and 4.5% in frame recall.
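The three gating mechanisms named in the abstract share one pattern: compute a sigmoid gate in (0, 1) and multiply it element-wise into the features. The sketch below illustrates that pattern using the formulations commonly given in the literature for context gating, squeeze-excitation, and the element-wise attention gate. The abstract does not give the paper's equations, so the function names, weight shapes, and the exact form of each gate are assumptions for illustration, not the authors' implementation.

```python
import math
import random


def sigmoid(v):
    """Element-wise logistic function on a list of floats."""
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(parts) for parts in zip(*vectors)]


def hadamard(a, b):
    """Element-wise product of two vectors."""
    return [p * q for p, q in zip(a, b)]


def context_gate(x, W, b):
    """Context gating (assumed form): y = sigmoid(W x + b) * x."""
    return hadamard(sigmoid(add(matvec(W, x), b)), x)


def squeeze_excite(channels, W1, W2):
    """Squeeze-excitation (assumed form) over C flattened feature maps.

    Squeeze: global average of each channel.
    Excite:  s = sigmoid(W2 relu(W1 z)), one scale per channel.
    """
    z = [sum(c) / len(c) for c in channels]
    hidden = [max(0.0, h) for h in matvec(W1, z)]
    s = sigmoid(matvec(W2, hidden))
    return [[s_c * v for v in c] for s_c, c in zip(s, channels)]


def ele_att_gate(x_t, h_prev, W_xa, W_ha, b_a):
    """Element-wise attention gate (assumed form).

    a_t = sigmoid(W_xa x_t + W_ha h_prev + b_a); the recurrent unit
    then consumes the modulated input a_t * x_t instead of x_t.
    """
    a_t = sigmoid(add(matvec(W_xa, x_t), matvec(W_ha, h_prev), b_a))
    return hadamard(a_t, x_t), a_t


if __name__ == "__main__":
    random.seed(0)
    dim, hid = 4, 3  # hypothetical sizes

    def rand_matrix(rows, cols):
        return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    x_t = [random.uniform(-1, 1) for _ in range(dim)]
    h_prev = [0.0] * hid
    gated, a_t = ele_att_gate(
        x_t, h_prev, rand_matrix(dim, dim), rand_matrix(dim, hid), [0.0] * dim
    )
    print("attention weights:", a_t)
    print("modulated input:  ", gated)
```

Because every gate lies strictly between 0 and 1, each mechanism can only attenuate a feature, never amplify or flip its sign. In a recurrent model the gate would be recomputed at every time step from the current input and the previous hidden state.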


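Source journal

AI Communications (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore

2.30

Self-citation rate

12.50%

Articles published

Review time

4.5 months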

Journal description: AI Communications is a journal on artificial intelligence (AI) that has a close relationship with EurAI (European Association for Artificial Intelligence, formerly ECCAI). It covers the whole AI community: scientific institutions as well as commercial and industrial companies. AI Communications aims to enhance contacts and information exchange between AI researchers and developers, and to provide supranational information to those concerned with AI and advanced information processing. AI Communications publishes refereed articles concerning scientific and technical AI procedures, provided they are of sufficient interest to a large readership of both scientific and practical background. In addition, it contains high-level background material, both at the technical level and at the level of opinions, policies and news.