A Hybrid TDNN-HMM Automatic Speech Recognizer for Filipino Children's Speech

John Andrew Y. Ing, Ronald M. Pascual, Francis D. Dimzon
Published in: 2022 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET)
Publication date: 2022-09-13
DOI: 10.1109/IICAIET55139.2022.9936815
Citations: 2

Abstract

Studies published in recent years have shown that developing an automatic speech recognition (ASR) system for Filipino-speaking children is feasible. However, most of these studies are based solely on the Hidden Markov Model (HMM) with a Gaussian Mixture Model (GMM). In this paper, we present the development of a hybrid ASR system that uses both an HMM and a Time Delay Neural Network (TDNN). The Filipino Children's Speech Corpus (FCSC), which consists entirely of read speech, was used to train and test all the models. We performed several sets of experiments covering various phoneme sets, various numbers of HMM states, and various enhanced models that employed vocal tract length normalization (VTLN), linear discriminant analysis (LDA), and speaker adaptive training (SAT). Our experiments show that a basic TDNN-HMM model consistently outperformed an HMM-GMM model regardless of the number of HMM states. We also show that VTLN slightly improves the performance of the model. The best-performing model is the 4-state TDNN-HMM hybrid, which obtained the lowest word error rate (WER), 0.97%.
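For readers unfamiliar with the metric reported above, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, assuming whitespace tokenization; the function name and the example sentences are hypothetical and are not taken from the FCSC or from the authors' scoring setup, which the abstract does not describe.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace; no text normalization
    (case folding, punctuation removal) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match if cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of four reference words is missing
# from the recognizer output, so the WER is 1/4.
print(word_error_rate("ang bata ay masaya", "ang bata masaya"))
```

Under this definition, a WER of 0.97% corresponds to roughly one word error per hundred reference words. Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.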