CNN with Phonetic Attention for Text-Independent Speaker Verification

Tianyan Zhou, Yong Zhao, Jinyu Li, Y. Gong, Jian Wu
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2019
DOI: 10.1109/ASRU46091.2019.9003826
Citations: 49

Abstract

Text-independent speaker verification imposes no constraints on the spoken content and usually needs long observations to make reliable predictions. In this paper, we propose two speaker embedding approaches that integrate phonetic information into an attention-based residual convolutional neural network (CNN). Phonetic features are extracted from the bottleneck layer of a pretrained acoustic model. In implicit phonetic attention (IPA), the phonetic features are projected by a transformation network into multi-channel feature maps and combined with the raw acoustic features as the input to the CNN. In explicit phonetic attention (EPA), the phonetic features are fed through a separate 1-dim CNN directly into the attentive pooling layer to generate the attention weights. By incorporating spoken content into the attention mechanism, the system can not only distill the speaker-discriminant frames but also actively normalize phonetic variations. Multi-head attention and discriminative objectives are further studied to improve the system. Experiments on the VoxCeleb corpus show that our proposed system outperforms the state of the art by around 43% relative.
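The EPA idea described above — attention weights derived from the phonetic stream alone, then applied to pool the acoustic frames into an utterance-level embedding — can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: a single linear projection stands in for the 1-dim CNN, all dimensions are made up, and the attentive statistics pooling (weighted mean plus weighted standard deviation) is one common choice of pooling layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- not taken from the paper.
T, D_ACOUSTIC, D_PHONETIC = 200, 64, 32  # frames, acoustic dim, phonetic bottleneck dim

# Stand-ins for frame-level CNN outputs and for phonetic bottleneck
# features from a pretrained acoustic model.
acoustic = rng.standard_normal((T, D_ACOUSTIC))
phonetic = rng.standard_normal((T, D_PHONETIC))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def explicit_phonetic_attention(acoustic, phonetic, w):
    """EPA-style pooling sketch: the per-frame attention score is computed
    from the phonetic features only (a linear layer stands in for the
    1-dim CNN), then the acoustic frames are pooled with those weights."""
    scores = phonetic @ w                    # (T,): one score per frame
    alpha = softmax(scores)                  # attention weights over frames
    mean = alpha @ acoustic                  # weighted mean, (D_ACOUSTIC,)
    var = alpha @ (acoustic - mean) ** 2     # weighted variance per dim
    return np.concatenate([mean, np.sqrt(var + 1e-8)])  # utterance embedding

w = rng.standard_normal(D_PHONETIC)          # toy parameters of the score network
embedding = explicit_phonetic_attention(acoustic, phonetic, w)
print(embedding.shape)  # 2 * D_ACOUSTIC: weighted mean and std concatenated
```

In a trained system the projection `w` (or the 1-dim CNN it stands in for) is learned jointly with the rest of the network, so frames whose phonetic content is speaker-discriminant receive higher weight in the pooled embedding.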