ASC Model Based on Feature Stratification and Multichannel ECAPA-TDNN

Ai Xin, Zhang Haitao, Zhao Shuai
2022 International Symposium on Advances in Informatics, Electronics and Education (ISAIEE), published 2022-12-01. DOI: 10.1109/ISAIEE57420.2022.00118 (https://doi.org/10.1109/ISAIEE57420.2022.00118)

Abstract

In acoustic scene classification (ASC), the input audio signal is a superposition of multiple acoustic events, which leads to low recognition rates in complex environments and makes the model prone to overfitting. To address these problems, an ASC model based on feature stratification and a multichannel ECAPA-TDNN is proposed. First, an extended harmonic-percussive source separation (HPSS) technique divides the log-Mel spectrogram into three components (harmonic, percussive, and residual), each containing a specific type of feature data, so that the superimposed audio signals are separated. Second, the ECAPA-TDNN network structure, which has performed well in the field of acoustic recognition, is adopted and combined with group convolution to form a multichannel ECAPA-TDNN, into which the feature components are fed for the ASC task. The results show that the ASC model based on feature stratification not only reduces the overfitting caused by audio overlap but also improves the model's recognition ability in complex environments; moreover, ECAPA-TDNN attends to acoustic features more continuously and improves classification performance while keeping the number of parameters at the original order of magnitude.
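
The three-way split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the common median-filtering formulation of HPSS with a separation margin, binary masks, and a non-negative spectrogram stored as a list of rows (frequency bins) by columns (frames). The names `median_filter_1d` and `hpss_three_way` and the parameters `width` and `margin` are hypothetical choices made for illustration.

```python
from statistics import median


def median_filter_1d(values, width):
    """Median-filter a list with an odd window, replicating the edge values."""
    half = width // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [median(padded[i:i + width]) for i in range(len(values))]


def hpss_three_way(spec, width=3, margin=2.0):
    """Split spec (rows = frequency bins, columns = frames) into
    harmonic, percussive and residual parts using binary masks.

    Assumption: margin >= 1, so a cell cannot be both harmonic and percussive.
    """
    n_freq, n_time = len(spec), len(spec[0])

    # Harmonic estimate: smooth every frequency bin along the time axis.
    harm_est = [median_filter_1d(row, width) for row in spec]

    # Percussive estimate: smooth every frame along the frequency axis.
    frames = [
        median_filter_1d([spec[f][t] for f in range(n_freq)], width)
        for t in range(n_time)
    ]
    perc_est = [[frames[t][f] for t in range(n_time)] for f in range(n_freq)]

    harmonic = [[0.0] * n_time for _ in range(n_freq)]
    percussive = [[0.0] * n_time for _ in range(n_freq)]
    residual = [[0.0] * n_time for _ in range(n_freq)]

    for f in range(n_freq):
        for t in range(n_time):
            h, p, x = harm_est[f][t], perc_est[f][t], spec[f][t]
            if h > margin * p:
                harmonic[f][t] = x      # clearly horizontal (tonal) energy
            elif p > margin * h:
                percussive[f][t] = x    # clearly vertical (transient) energy
            else:
                residual[f][t] = x      # neither estimate dominates
    return harmonic, percussive, residual


if __name__ == "__main__":
    # Toy input: one sustained tone (row 3) crossed by one click (column 2).
    toy = [[0.0] * 7 for _ in range(7)]
    for i in range(7):
        toy[3][i] = 1.0
        toy[i][2] = 1.0
    harmonic, percussive, residual = hpss_three_way(toy)
    for name, part in (("harmonic", harmonic), ("percussive", percussive),
                       ("residual", residual)):
        print(name, sum(map(sum, part)))
```

Because every cell is assigned to exactly one mask, the three parts sum back to the input, so they could be stacked as three input channels for a grouped network. In practice a library routine such as librosa's `librosa.decompose.hpss` with a `margin` greater than 1 would likely replace the plain-Python filtering; whether the paper uses that routine is not stated in the abstract.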