A Hybrid TDNN-HMM Automatic Speech Recognizer for Filipino Children's Speech

2022 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET) Pub Date : 2022-09-13 DOI:10.1109/IICAIET55139.2022.9936815

John Andrew Y. Ing, Ronald M. Pascual, Francis D. Dimzon


Citations: 2

Abstract



Recent studies have shown that developing an automatic speech recognition (ASR) system for Filipino-speaking children is feasible. However, most of these studies rely solely on Hidden Markov Models (HMMs) with Gaussian Mixture Models (GMMs). In this paper, we present the development of a hybrid ASR system that combines an HMM with a Time Delay Neural Network (TDNN). The Filipino Children's Speech Corpus (FCSC), which consists entirely of read speech, was used to train and test all models. We ran several sets of experiments covering different phoneme sets, different numbers of HMM states, and enhanced models that employ vocal tract length normalization (VTLN), linear discriminant analysis (LDA), and speaker adaptive training (SAT). Our experiments show that a basic TDNN-HMM model consistently outperforms an HMM-GMM model regardless of the number of HMM states. We also show that VTLN slightly improves model performance. The best-performing model is the 4-state TDNN-HMM hybrid, which achieved the lowest word error rate (WER), 0.97%.
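For readers unfamiliar with the metric, the sketch below shows how a word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. This is a minimal illustration, not the authors' scoring tool, which the abstract does not describe; the function name `word_error_rate` and the example sentences are hypothetical and are not taken from the FCSC.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one of four reference words is dropped by the recognizer.
example_wer = word_error_rate("ang bata ay nagbabasa", "ang bata nagbabasa")
print(f"WER = {example_wer:.2%}")
```

Under this definition, a WER of 0.97% corresponds to roughly one word error per hundred reference words.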

