External-Attentive Statistics Pooling for Text-Independent Speaker Verification

2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence (CCAI)
Pub Date: 2023-05-26
DOI: 10.1109/CCAI57533.2023.10201326

Lidong Pan, Chunhao He, Tieyuan Chang


Citations: 0

Abstract

Speaker verification is an important biometric identification technique. In neural network-based speaker feature extraction models, the pooling layer plays a key role: it aggregates frame-level features into utterance-level features, and the choice of pooling method affects this aggregation and, in turn, the representational ability of the final speaker features. In existing work, some pooling methods with attention mechanisms have shown stronger feature aggregation capability than traditional pooling methods. In this paper, we combine low-complexity External Attention with statistics pooling to design External-Attentive Statistics Pooling and, considering the biological properties of human hearing, propose Multi-Group External-Attentive Statistics Pooling. Both methods are applied to text-independent speaker verification and evaluated on the VoxCeleb1 test set, VoxCeleb1-H, and VoxCeleb1-E. The results show that the proposed methods achieve more effective feature aggregation without significantly increasing the number of model parameters.
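The aggregation step the abstract describes, turning a variable number of frame-level features into one fixed-size utterance-level vector, can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract gives no equations, so the function names, the memory shapes `mem_k` and `mem_v`, the double normalisation, and the way scores become per-channel frame weights are all assumptions made for illustration. The random arrays stand in for parameters that would be learned during training.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def statistics_pooling(frames):
    """Plain statistics pooling: per-channel mean and std over time.

    frames: array of shape (T, D), T frames with D channels each.
    returns: array of shape (2 * D,), independent of T.
    """
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])


def external_attention_weights(frames, mem_k, mem_v):
    """Hypothetical frame weights computed from two small external memories.

    mem_k and mem_v have shape (S, D) and stand in for learnable parameters.
    Frames attend to S memory slots rather than to each other, so the cost
    grows linearly with the number of frames T.
    """
    attn = softmax(frames @ mem_k.T, axis=0)                 # (T, S), softmax over frames
    attn = attn / (attn.sum(axis=1, keepdims=True) + 1e-9)   # (T, S), L1-normalised over slots
    scores = attn @ mem_v                                    # (T, D)
    return softmax(scores, axis=0)                           # each channel sums to 1 over T


def attentive_statistics_pooling(frames, weights):
    """Weighted mean and std over time.

    weights: array of shape (T, D) whose columns each sum to 1.
    """
    mean = (weights * frames).sum(axis=0)
    var = (weights * frames ** 2).sum(axis=0) - mean ** 2
    std = np.sqrt(np.clip(var, 1e-9, None))  # clip guards against tiny negative values
    return np.concatenate([mean, std])


# Usage with random stand-ins for frame features and memory parameters.
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 64))   # 200 frames, 64 channels
mem_k = rng.normal(size=(8, 64))      # 8 memory slots (assumed size)
mem_v = rng.normal(size=(8, 64))
weights = external_attention_weights(frames, mem_k, mem_v)
embedding = attentive_statistics_pooling(frames, weights)
print(embedding.shape)  # (128,)
```

With uniform weights of 1/T, the weighted statistics reduce to plain statistics pooling, which is a convenient sanity check. The two memories add only 2 × S × D parameters in this sketch, which is in the spirit of the abstract's claim that the parameter count does not grow significantly.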
