基于鉴别DPGMM后图的低资源ASR研究

Bin Wu, S. Sakti, Satoshi Nakamura
{"title":"基于鉴别DPGMM后图的低资源ASR研究","authors":"Bin Wu, S. Sakti, Satoshi Nakamura","doi":"10.1109/SLT48900.2021.9383597","DOIUrl":null,"url":null,"abstract":"The first step in building an ASR system is to extract proper speech features. The ideal speech features for ASR must also have high discriminabilities between linguistic units and be robust to such non-linguistic factors as gender, age, emotions, or noise. The discriminabilities of various features have been compared in several Zerospeech challenges to discover linguistic units without any transcriptions, in which the posteriorgrams of DPGMM clustering show strong discriminability and get several top results of ABX discrimination scores between phonemes. This paper appends DPGMM posteriorgrams to increase the discriminability of acoustic features to enhance ASR systems. To the best of our knowledge, DPGMM features, which are usually applied to such tasks as spoken term detection and zero resources tasks, have not been applied to large vocabulary continuous speech recognition (LVCSR) before. DPGMM clustering can dynamically change the number of Gaussians until each one fits one segmental pattern of the whole speech corpus with the highest probability such that the linguistic units of different segmental patterns are clearly discriminated. Our experimental results on the WSJ corpora show our proposal stably improves ASR systems and provides even more improvement for smaller datasets with fewer resources.","PeriodicalId":243211,"journal":{"name":"2021 IEEE Spoken Language Technology Workshop (SLT)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Incorporating Discriminative DPGMM Posteriorgrams for Low-Resource ASR\",\"authors\":\"Bin Wu, S. Sakti, Satoshi Nakamura\",\"doi\":\"10.1109/SLT48900.2021.9383597\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The first step in building an ASR system is to extract proper speech features. The ideal speech features for ASR must also have high discriminabilities between linguistic units and be robust to such non-linguistic factors as gender, age, emotions, or noise. The discriminabilities of various features have been compared in several Zerospeech challenges to discover linguistic units without any transcriptions, in which the posteriorgrams of DPGMM clustering show strong discriminability and get several top results of ABX discrimination scores between phonemes. This paper appends DPGMM posteriorgrams to increase the discriminability of acoustic features to enhance ASR systems. To the best of our knowledge, DPGMM features, which are usually applied to such tasks as spoken term detection and zero resources tasks, have not been applied to large vocabulary continuous speech recognition (LVCSR) before. DPGMM clustering can dynamically change the number of Gaussians until each one fits one segmental pattern of the whole speech corpus with the highest probability such that the linguistic units of different segmental patterns are clearly discriminated. Our experimental results on the WSJ corpora show our proposal stably improves ASR systems and provides even more improvement for smaller datasets with fewer resources.\",\"PeriodicalId\":243211,\"journal\":{\"name\":\"2021 IEEE Spoken Language Technology Workshop (SLT)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-01-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE Spoken Language Technology Workshop (SLT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SLT48900.2021.9383597\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT48900.2021.9383597","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

建立ASR系统的第一步是提取适当的语音特征。ASR的理想语音特征还必须在语言单位之间具有高度的区分能力,并且对性别、年龄、情绪或噪声等非语言因素具有鲁棒性。在多个零语音挑战中比较了各种特征的可判别性,以发现没有任何转录的语言单位,其中DPGMM聚类的后图具有较强的可判别性,并获得了多个音素间ABX判别得分最高的结果。本文通过附加DPGMM后图来提高声学特征的可分辨性,从而增强ASR系统。据我们所知,DPGMM特征通常应用于口语术语检测和零资源任务等任务,但以前还没有应用于大词汇量连续语音识别(LVCSR)。DPGMM聚类可以动态改变高斯数,直到每个高斯数以最高概率拟合整个语音语料库的一个片段模式,从而明确区分不同片段模式的语言单位。我们在WSJ语料库上的实验结果表明,我们的提议稳定地改进了ASR系统,并且在资源更少的小数据集上提供了更多的改进。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Incorporating Discriminative DPGMM Posteriorgrams for Low-Resource ASR
The first step in building an ASR system is to extract proper speech features. The ideal speech features for ASR must also have high discriminabilities between linguistic units and be robust to such non-linguistic factors as gender, age, emotions, or noise. The discriminabilities of various features have been compared in several Zerospeech challenges to discover linguistic units without any transcriptions, in which the posteriorgrams of DPGMM clustering show strong discriminability and get several top results of ABX discrimination scores between phonemes. This paper appends DPGMM posteriorgrams to increase the discriminability of acoustic features to enhance ASR systems. To the best of our knowledge, DPGMM features, which are usually applied to such tasks as spoken term detection and zero resources tasks, have not been applied to large vocabulary continuous speech recognition (LVCSR) before. DPGMM clustering can dynamically change the number of Gaussians until each one fits one segmental pattern of the whole speech corpus with the highest probability such that the linguistic units of different segmental patterns are clearly discriminated. Our experimental results on the WSJ corpora show our proposal stably improves ASR systems and provides even more improvement for smaller datasets with fewer resources.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信