Improves Neural Acoustic Word Embeddings Query by Example Spoken Term Detection with Wav2vec Pretraining and Circle Loss

Zhaoqi Li, Long Wu, Ta Li, Yonghong Yan
{"title":"Improves Neural Acoustic Word Embeddings Query by Example Spoken Term Detection with Wav2vec Pretraining and Circle Loss","authors":"Zhaoqi Li, Long Wu, Ta Li, Yonghong Yan","doi":"10.1109/ISCSLP49672.2021.9362065","DOIUrl":null,"url":null,"abstract":"Query by example spoken term detection (QbE-STD) is a popular keyword detection method in the absence of speech resources. It can build a keyword query system with decent performance when there are few labeled speeches and a lack of pronunciation dictionaries. In recent years, neural acoustic word embeddings (NAWEs) has become a commonly used QbE-STD method. To make the embedded features extracted by the neural network contain more accurate context information, we use wav2vec pre-training to improve the performance of the network. Compared with the Mel-frequency cepstral coefficients(MFCC) system, the average precision (AP) is relatively improved by 11.1%. We also find that the AP of the wav2vec and MFCC splicing system is better, demonstrating that wav2vec cannot contain all spectrum information. To accelerate the convergence speed of the splicing system, we use circle loss to replace the triplet loss, making the convergence about 40% epochs earlier on average. The circle loss also relatively increases AP by more than 4.9%. The AP of our best-performing system is 7.7% better than the wav2vec baseline system and 19.7% better than the MFCC baseline system.","PeriodicalId":279828,"journal":{"name":"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISCSLP49672.2021.9362065","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Query by example spoken term detection (QbE-STD) is a popular keyword detection method in the absence of speech resources. It can build a keyword query system with decent performance when there are few labeled speeches and a lack of pronunciation dictionaries. In recent years, neural acoustic word embeddings (NAWEs) has become a commonly used QbE-STD method. To make the embedded features extracted by the neural network contain more accurate context information, we use wav2vec pre-training to improve the performance of the network. Compared with the Mel-frequency cepstral coefficients(MFCC) system, the average precision (AP) is relatively improved by 11.1%. We also find that the AP of the wav2vec and MFCC splicing system is better, demonstrating that wav2vec cannot contain all spectrum information. To accelerate the convergence speed of the splicing system, we use circle loss to replace the triplet loss, making the convergence about 40% epochs earlier on average. The circle loss also relatively increases AP by more than 4.9%. The AP of our best-performing system is 7.7% better than the wav2vec baseline system and 19.7% better than the MFCC baseline system.
基于Wav2vec预训练和圆损失的语音词检测改进神经声学词嵌入查询
实例查询语音词检测(Query by example spoken term detection, QbE-STD)是在缺乏语音资源的情况下流行的关键字检测方法。在语音标注少、语音词典缺乏的情况下,可以构建一个性能良好的关键词查询系统。近年来,神经声学词嵌入(NAWEs)已成为一种常用的QbE-STD方法。为了使神经网络提取的嵌入特征包含更准确的上下文信息,我们使用了wav2vec预训练来提高网络的性能。与mel -倒谱系数(MFCC)系统相比,平均精度(AP)相对提高了11.1%。我们还发现,wav2vec和MFCC拼接系统的AP更好,说明wav2vec不能包含所有的频谱信息。为了加快拼接系统的收敛速度,我们使用了圆损耗来代替三重态损耗,使拼接系统的收敛速度平均提前了40%左右。圈损也使AP相对增加4.9%以上。我们的最佳性能系统的AP比wav2vec基准系统高7.7%,比MFCC基准系统高19.7%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信