ChildAugment: Data Augmentation Methods for Zero-Resource Children's Speaker Verification

Vishwanath Pratap Singh, Md. Sahidullah, T. Kinnunen
Journal: The Journal of the Acoustical Society of America, volume 10 4, pages 2221-2232
Publication type: Journal Article
Published: 2024-02-23
DOI: 10.48550/arXiv.2402.15214 (https://doi.org/10.48550/arXiv.2402.15214)

Abstract

Modern automatic speaker verification (ASV) systems trained exclusively on adult data lose substantial accuracy when applied to children's speech. The scarcity of children's speech corpora hinders fine-tuning ASV systems for children's speech. Hence, there is a timely need to explore more effective ways of reusing adults' speech data. One promising approach is to align vocal-tract parameters between adults and children through children-specific data augmentation, referred to here as ChildAugment. Specifically, we modify the formant frequencies and formant bandwidths of adult speech to emulate children's speech. The modified spectra are used to train an emphasized channel attention, propagation, and aggregation in time-delay neural network (ECAPA-TDNN) recognizer for children. We compare ChildAugment against various state-of-the-art data augmentation techniques for children's ASV. We also extensively compare different scoring methods, including cosine scoring, probabilistic linear discriminant analysis (PLDA), and neural PLDA, and propose a low-complexity weighted cosine score for extremely low-resource children's ASV. Our findings on the CSLU kids corpus indicate that ChildAugment holds promise as a simple, acoustics-motivated approach for improving state-of-the-art deep-learning-based ASV for children. We achieve up to 12.45% (boys) and 11.96% (girls) relative improvement over the baseline. For reproducibility, we provide the evaluation protocols and code.
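The abstract states that formant frequencies and bandwidths of adult speech are modified, but it does not say how. The following is a minimal sketch of one common way to do this, assuming a linear-prediction (LPC) model of a speech frame: each complex pole pair at angle θ and radius r corresponds to a formant at F = θ·fs/(2π) with bandwidth B = −ln(r)·fs/π, so scaling θ by `alpha` scales the formant frequency and raising r to the power `beta` scales the bandwidth. The function name `warp_lpc_poles`, the parameters `alpha` and `beta`, and their default values are illustrative assumptions, not the authors' implementation or settings.

```python
import numpy as np


def warp_lpc_poles(a, alpha=1.2, beta=1.0):
    """Scale formant frequencies and bandwidths of an all-pole filter.

    a     : LPC polynomial coefficients [a0, a1, ..., ap] (hypothetical input,
            e.g. estimated per frame by any LPC routine).
    alpha : factor applied to each pole angle (formant frequency scaling).
    beta  : factor applied to each formant bandwidth (radius r becomes r**beta).
    Returns the coefficients of the modified polynomial.
    """
    a = np.asarray(a, dtype=float)
    new_roots = []
    for z in np.roots(a):
        if z.imag > 1e-12:
            # Upper-half-plane pole: one member of a formant's conjugate pair.
            angle = np.angle(z)
            radius = np.abs(z)
            # Keep the warped angle below pi (the Nyquist frequency).
            new_angle = min(alpha * angle, np.pi - 1e-3)
            # Keep the pole strictly inside the unit circle for stability.
            new_radius = min(radius ** beta, 0.999)
            w = new_radius * np.exp(1j * new_angle)
            new_roots.extend([w, np.conj(w)])
        elif abs(z.imag) <= 1e-12:
            # Real poles carry no formant; leave them unchanged.
            new_roots.append(z.real)
        # Lower-half-plane poles are regenerated as conjugates above.
    return a[0] * np.real(np.poly(new_roots))


# Toy example: a single resonance at angle 0.3 rad with radius 0.95.
r, theta = 0.95, 0.3
a_adult = np.real(np.poly([r * np.exp(1j * theta), r * np.exp(-1j * theta)]))
a_child = warp_lpc_poles(a_adult, alpha=1.2, beta=1.0)
```

In a full augmentation pipeline, the modified coefficients would be used to resynthesize or re-shape the spectrum of each frame before feature extraction. That step is omitted here because the abstract does not describe it.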