John Andrew Y. Ing, Ronald M. Pascual, Francis D. Dimzon
Published in: 2022 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET), 2022-09-13
DOI: https://doi.org/10.1109/IICAIET55139.2022.9936815
A Hybrid TDNN-HMM Automatic Speech Recognizer for Filipino Children's Speech
Recent studies have shown that developing an automatic speech recognition (ASR) system for Filipino-speaking children is feasible. However, most of these studies rely solely on Hidden Markov Models (HMM) with Gaussian Mixture Models (GMM). In this paper, we present a hybrid ASR system that combines an HMM with a Time Delay Neural Network (TDNN). The Filipino Children's Speech Corpus (FCSC), which consists entirely of read speech, was used to train and test all models. We ran several sets of experiments covering different phoneme sets, different numbers of HMM states, and enhanced models that employ vocal tract length normalization (VTLN), linear discriminant analysis (LDA), and speaker adaptive training (SAT). Our experiments show that a basic TDNN-HMM model consistently outperforms an HMM-GMM model regardless of the number of HMM states. We also show that VTLN slightly improves model performance. The best-performing model is the 4-state TDNN-HMM hybrid, which achieved the lowest word error rate (WER), 0.97%.
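For readers unfamiliar with the metric, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition only; it is not the authors' scoring tool, the function name `word_error_rate` is hypothetical, and the example sentences are made up rather than taken from the FCSC.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative (made-up) transcripts: one of four reference words is dropped.
print(word_error_rate("ang bata ay masaya", "ang bata masaya"))  # 0.25
```

Under this definition, a WER of 0.97% corresponds to roughly one word error per hundred reference words.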