External-Attentive Statistics Pooling for Text-Independent Speaker Verification

Lidong Pan, Chunhao He, Tieyuan Chang
DOI: 10.1109/CCAI57533.2023.10201326
URL: https://doi.org/10.1109/CCAI57533.2023.10201326
Published in: 2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence (CCAI)
Publication date: 2023-05-26
Citations: 0

Abstract

Speaker verification is an important biometric identification technique. In neural network-based speaker feature extraction models, the pooling layer plays an important role: it aggregates frame-level features into utterance-level features. Different pooling methods aggregate frame-level features differently, which in turn affects the representational ability of the final speaker features. In existing work, some pooling methods with attention mechanisms have shown stronger feature aggregation capability than traditional pooling methods. In this paper, we combine low-complexity External Attention with statistics pooling to design External-Attentive Statistics Pooling, and, considering the biological properties of human hearing, propose Multi-Group External-Attentive Statistics Pooling. Both methods are applied to text-independent speaker verification and tested on the VoxCeleb1 test set, VoxCeleb1-H, and VoxCeleb1-E. The results show that the proposed methods achieve more effective feature aggregation without significantly increasing the number of model parameters.
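The abstract does not give the exact formulation, so the following is only a minimal NumPy sketch of the general idea: plain statistics pooling (mean and standard deviation over time), and one plausible way to weight frames using an External Attention module with two small learnable memories. The function names, the memory arrays `mem_k` and `mem_v`, and the way the attention output is turned into per-frame weights are all assumptions made for illustration, not the authors' design. The multi-group variant is not shown.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def statistics_pooling(frames):
    """Plain statistics pooling: (T, D) frame features -> (2D,) vector."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

def external_attentive_statistics_pooling(frames, mem_k, mem_v, eps=1e-8):
    """Hypothetical sketch. frames: (T, D); mem_k, mem_v: (S, D) memories."""
    # External attention: compare frames with a small external memory
    # instead of with each other, so the cost is linear in T.
    scores = frames @ mem_k.T                                # (T, S)
    attn = softmax(scores, axis=0)                           # normalize over frames
    attn = attn / (attn.sum(axis=1, keepdims=True) + eps)    # L1-normalize over slots
    logits = attn @ mem_v                                    # (T, D)

    # Assumption: use the attention output as per-channel frame weights.
    weights = softmax(logits, axis=0)                        # each column sums to 1

    # Weighted mean and standard deviation over time.
    mean = (weights * frames).sum(axis=0)
    var = (weights * frames ** 2).sum(axis=0) - mean ** 2
    std = np.sqrt(np.clip(var, eps, None))
    return np.concatenate([mean, std])
```

In this sketch the only added parameters are the two (S, D) memories, which is consistent in spirit with the abstract's claim of a small parameter overhead. With all-zero memories the weights become uniform and the output reduces to plain statistics pooling.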