Multi-Scene Robust Speaker Verification System Built on Improved ECAPA-TDNN

Xi Xuan, Rong Jin, Tingyu Xuan, Guolei Du, Kaisheng Xuan
Published in: 2022 IEEE 6th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), 2022-10-03
DOI: 10.1109/IAEAC54830.2022.9929964 (https://doi.org/10.1109/IAEAC54830.2022.9929964)

Abstract

To address cross-domain mismatch, short utterances, and noise interference in industrial speaker recognition, this paper proposes DD-ECAPA-TDNN, an improved ECAPA-TDNN architecture for a multi-scene robust speaker verification system. The design of DD-ECAPA-TDNN is inspired by ECAPA-TDNN, a model that has recently become popular in automatic speaker verification (ASV) systems. First, we use FBanks to extract acoustic features, followed by the DD-SE-Res2Net Block proposed in this paper to capture local features efficiently. Next, the output feature maps of all DD-SE-Res2Net Blocks are aggregated at multiple scales, and finally the ASP pooling operation is performed. The experiments were based on the VoxCeleb1-dev dataset, and SC-AAMSoftmax was used to train a speaker identification model for 1211 speakers. This DD-ECAPA-TDNN model was then used as a speaker embedding extractor to construct an ASV system. We used the VoxMovies and VoxCeleb1-O evaluation sets to simulate three scenarios (cross-domain, short speech, and noise interference) and to evaluate the DD-ECAPA-TDNN system under each of them. The system achieves an EER of 2.51% on VoxCeleb1-O. The DD-ECAPA-TDNN system significantly outperforms the ECAPA-TDNN system in recognition performance across these scenarios. Finally, our ablation experiments show that the DD-SE-Res2Net Block has a positive impact on the performance of the ASV system, and that DD-ECAPA-TDNN can extract robust and accurate speaker embeddings with good scene generalization.
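
The abstract reports performance as an equal error rate (EER) but does not say how trials are scored. As a rough illustration only, the sketch below assumes that each verification trial is scored by the cosine similarity of two speaker embeddings, and that the EER is read off at the threshold where the false-acceptance and false-rejection rates are closest. The function names `cosine_score` and `equal_error_rate` and the toy scores are hypothetical and are not taken from the paper.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(scores, labels):
    """Approximate EER for a list of trial scores.

    labels: 1 = same-speaker (target) trial, 0 = different-speaker trial.
    Every observed score is tried as a threshold (accept if score >= threshold).
    The result is the mean of FAR and FRR at the threshold where they are closest.
    This brute-force sweep is quadratic in the number of trials; it is meant
    for illustration, not for large evaluation lists.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")

    best_gap, eer = float("inf"), 1.0
    for threshold in sorted(set(scores)):
        false_reject = sum(
            1 for s, l in zip(scores, labels) if l == 1 and s < threshold
        )
        false_accept = sum(
            1 for s, l in zip(scores, labels) if l == 0 and s >= threshold
        )
        frr = false_reject / n_target      # false-rejection rate
        far = false_accept / n_nontarget   # false-acceptance rate
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy trial list (made-up scores, not results from the paper).
toy_scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(f"EER = {equal_error_rate(toy_scores, toy_labels):.2%}")  # EER = 33.33%
```

In this toy list, one target trial scores below one non-target trial, so no threshold separates the two groups, and the error rates balance at one third each. An EER of 2.51% would correspond to the same balance point at a much lower error rate.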