Comparison of TDNN and Factorized TDNN Approaches for Indonesian Speech Recognition
Gunarso, A. Buono, Mushthofa, M. T. Uliniansyah
2023 International Seminar on Intelligent Technology and Its Applications (ISITIA), published 2023-07-26
DOI: 10.1109/ISITIA59021.2023.10221093 (https://doi.org/10.1109/ISITIA59021.2023.10221093)
Abstract
Deep neural networks (DNNs) have outperformed the GMM-HMM technique in speech recognition and have been widely applied to many of the world's languages. One DNN architecture traditionally used in speech recognition is the time-delay neural network (TDNN), which has undergone several modifications, such as the factorized TDNN (TDNN-F). In this paper, we compare the performance of the standard TDNN with that of TDNN-F for Indonesian speech recognition, with the aim of identifying which architecture is better suited to the task. Our experiments used the KDW-BPPT-50K-ASR1 speech corpus, developed by BPPT in 2013. Using the nnet3 recipe with the chain model in Kaldi, we tested several variations of the TDNN architecture to build Indonesian acoustic models, and then compared them with the acoustic model produced by TDNN-F. The experimental results show that TDNN-F performs very well compared to the vanilla TDNN. They also indicate that altering the vanilla TDNN's architecture, such as the number of layers and the configuration of each layer, does not yield a substantial performance improvement.
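To make the difference between the two architectures concrete, the following is a minimal sketch of how factorization changes the weight count of a single layer. It assumes the usual description of TDNN-F, in which a layer's weight matrix is replaced by the product of two smaller matrices joined by a narrow bottleneck (one of which is kept semi-orthogonal during training; that constraint is not modeled here). The function names and the dimensions (1024 hidden units, a 128-unit bottleneck, 3 spliced frames) are illustrative assumptions, not values reported in the paper, and the splicing is simplified relative to Kaldi's actual TDNN-F layer.

```python
def tdnn_layer_weights(input_dim: int, output_dim: int, context: int) -> int:
    """Weights in one dense TDNN layer acting on `context` spliced input frames."""
    return output_dim * (input_dim * context)


def tdnnf_layer_weights(input_dim: int, output_dim: int,
                        bottleneck_dim: int, context: int) -> int:
    """Weights when the same matrix is factorized through a bottleneck.

    Simplified view: M (output x spliced input) is approximated by A @ B,
    with B mapping the spliced input to the bottleneck and A mapping the
    bottleneck to the output.
    """
    b_weights = bottleneck_dim * (input_dim * context)  # spliced input -> bottleneck
    a_weights = output_dim * bottleneck_dim             # bottleneck -> output
    return b_weights + a_weights


# Illustrative dimensions only (assumptions, not taken from the paper).
HIDDEN_DIM = 1024
BOTTLENECK_DIM = 128
CONTEXT = 3

dense = tdnn_layer_weights(HIDDEN_DIM, HIDDEN_DIM, CONTEXT)
factorized = tdnnf_layer_weights(HIDDEN_DIM, HIDDEN_DIM, BOTTLENECK_DIM, CONTEXT)

print(f"dense TDNN layer weights:   {dense}")
print(f"factorized layer weights:   {factorized}")
print(f"reduction factor:           {dense / factorized:.1f}x")
```

Under these assumed dimensions the factorized layer needs about one sixth of the weights of the dense layer. This count says nothing about recognition accuracy, which is what the paper's experiments measure.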