Sound Event Detection in Domestic Environment Using Frequency-Dynamic Convolution and Local Attention

IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS
Grigorios-Aris Cheimariotis, Nikolaos Mitianoudis
{"title":"Sound Event Detection in Domestic Environment Using Frequency-Dynamic Convolution and Local Attention","authors":"Grigorios-Aris Cheimariotis, Nikolaos Mitianoudis","doi":"10.3390/info14100534","DOIUrl":null,"url":null,"abstract":"This work describes a methodology for sound event detection in domestic environments. Efficient solutions in this task can support the autonomous living of the elderly. The methodology deals with the “Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE)” 2023, and more specifically with Task 4a “Sound event detection of domestic activities”. This task involves the detection of 10 common events in domestic environments in 10 s sound clips. The events may have arbitrary duration in the 10 s clip. The main components of the methodology are data augmentation on mel-spectrograms that represent the sound clips, feature extraction by passing spectrograms through a frequency-dynamic convolution network with an extra attention module in sequence with each convolution, concatenation of these features with BEATs embeddings, and use of BiGRU for sequence modeling. Also, a mean teacher model is employed for leveraging unlabeled data. This research focuses on the effect of data augmentation techniques, of the feature extraction models, and on self-supervised learning. The main contribution is the proposed feature extraction model, which uses weighted attention on frequency in each convolution, combined in sequence with a local attention module adopted by computer vision. The proposed system features promising and robust performance.","PeriodicalId":38479,"journal":{"name":"Information (Switzerland)","volume":null,"pages":null},"PeriodicalIF":2.4000,"publicationDate":"2023-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information (Switzerland)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/info14100534","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

This work describes a methodology for sound event detection in domestic environments. Efficient solutions in this task can support the autonomous living of the elderly. The methodology deals with the “Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE)” 2023, and more specifically with Task 4a “Sound event detection of domestic activities”. This task involves the detection of 10 common events in domestic environments in 10 s sound clips. The events may have arbitrary duration in the 10 s clip. The main components of the methodology are data augmentation on mel-spectrograms that represent the sound clips, feature extraction by passing spectrograms through a frequency-dynamic convolution network with an extra attention module in sequence with each convolution, concatenation of these features with BEATs embeddings, and use of BiGRU for sequence modeling. Also, a mean teacher model is employed for leveraging unlabeled data. This research focuses on the effect of data augmentation techniques, of the feature extraction models, and on self-supervised learning. The main contribution is the proposed feature extraction model, which uses weighted attention on frequency in each convolution, combined in sequence with a local attention module adopted by computer vision. The proposed system features promising and robust performance.
基于频率动态卷积和局部注意的环境声事件检测
这项工作描述了一种在家庭环境中检测声音事件的方法。这项任务的有效解决方案可以支持老年人的自主生活。该方法处理2023年的“声学场景和事件的检测和分类挑战(DCASE)”,更具体地说,是任务4a“家庭活动的声音事件检测”。这项任务包括在10秒的声音片段中检测10个家庭环境中常见的事件。事件可以在10秒剪辑中具有任意的持续时间。该方法的主要组成部分是对代表声音片段的mel-谱图进行数据增强,通过频率动态卷积网络传递谱图(每个卷积都有一个额外的注意模块)来提取特征,将这些特征与BEATs嵌入连接起来,并使用BiGRU进行序列建模。此外,平均教师模型被用于利用未标记的数据。本研究的重点是数据增强技术、特征提取模型和自监督学习的效果。本文的主要贡献是提出的特征提取模型,该模型在每个卷积中对频率进行加权关注,并依次与计算机视觉采用的局部关注模块相结合。该系统具有良好的鲁棒性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Information (Switzerland)
Information (Switzerland) Computer Science-Information Systems
CiteScore
6.90
自引率
0.00%
发文量
515
审稿时长
11 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信