ASC Model Based on Feature Stratification and Multichannel ECAPA-TDNN
Authors: Ai Xin, Zhang Haitao, Zhao Shuai
Venue: 2022 International Symposium on Advances in Informatics, Electronics and Education (ISAIEE)
Publication date: 2022-12-01
DOI: https://doi.org/10.1109/ISAIEE57420.2022.00118
Citations: 0
Abstract
In acoustic scene classification (ASC), the input audio signal consists of multiple acoustic events superimposed on one another, which leads to low recognition rates in complex environments and makes the model prone to overfitting. To address these problems, an ASC model based on feature stratification and a multichannel ECAPA-TDNN is proposed. First, an extended harmonic-percussive source separation (HPSS) technique divides the log-Mel spectrogram into three components (harmonic, percussive, and residual), each containing a specific type of feature data, so that the superimposed audio signals are separated. Second, the ECAPA-TDNN network structure, which has performed well in the field of acoustic recognition, is adopted and combined with group convolution to form a multichannel ECAPA-TDNN, into which the feature components are fed for the ASC task. The results show that the ASC model based on feature stratification not only reduces the overfitting caused by audio overlap but also improves the model's ability to recognize complex environments; moreover, ECAPA-TDNN attends to acoustic features more continuously and improves classification performance while keeping the number of parameters at the original order of magnitude.
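The feature-stratification step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the common median-filtering form of HPSS with a separation factor (here called `beta`), applied to a non-negative Mel-band energy array before any logarithm is taken. The function names, kernel width, and `beta` value are hypothetical choices made for illustration only.

```python
import numpy as np


def median_filter_along(spec, width, axis):
    """Median-filter a 2-D array along one axis (odd width, reflect padding)."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    pad = width // 2
    pad_widths = [(0, 0), (0, 0)]
    pad_widths[axis] = (pad, pad)
    padded = np.pad(spec, pad_widths, mode="reflect")
    # Windows of length `width` along `axis`; the window axis is appended last.
    windows = np.lib.stride_tricks.sliding_window_view(padded, width, axis=axis)
    return np.median(windows, axis=-1)


def hpss_three_way(spec, kernel=17, beta=2.0, eps=1e-10):
    """Split a non-negative (freq, time) array into harmonic, percussive, residual.

    Assumed scheme (not taken from the paper): binary masks from median-filtered
    estimates, with separation factor beta >= 1 so the two masks never overlap.
    """
    harm_est = median_filter_along(spec, kernel, axis=1)  # smooth over time
    perc_est = median_filter_along(spec, kernel, axis=0)  # smooth over frequency
    mask_h = harm_est / (perc_est + eps) > beta
    mask_p = perc_est / (harm_est + eps) >= beta
    mask_r = ~(mask_h | mask_p)  # everything claimed by neither mask
    return spec * mask_h, spec * mask_p, spec * mask_r


# Synthetic example: a sustained tone (horizontal ridge) plus a click (vertical ridge).
rng = np.random.default_rng(0)
spec = 0.01 * rng.random((64, 100))
spec[20, :] += 1.0
spec[:, 50] += 1.0

harmonic, percussive, residual = hpss_three_way(spec, kernel=9)

# Stack the three components as channels, e.g. as input to a multichannel network.
features = np.stack([harmonic, percussive, residual], axis=0)
print(features.shape)
```

Because the three masks partition the time-frequency plane, the components sum back to the input exactly. In a multichannel model, a grouped convolution with three groups is one way to give each component its own filter path, in line with the group convolution mentioned in the abstract; the exact layer configuration is not stated there.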