Real-time translation of discrete Sinhala speech to Unicode text
M. K. H. Gunasekara, R. Meegama
2015 Fifteenth International Conference on Advances in ICT for Emerging Regions (ICTer), August 2015
DOI: https://doi.org/10.1109/ICTER.2015.7377680
Citations: 5
Abstract
This paper presents a methodology to translate discrete Sinhala speech to Sinhala Unicode text in real time. Hidden Markov Models (HMMs) and the associated Hidden Markov Model Toolkit (HTK) are used to build the speech recognizer. Real-time decoding is performed by the Julius decoder, and a three-state Bakis HMM topology is used to build the acoustic model. Normalized Mel-frequency cepstral coefficients (MFCCs), including the zeroth coefficient, are used as the feature vector for recognition. Although speech from only a single speaker is used for training, an average accuracy of 85% is obtained for both speaker-dependent and speaker-independent speech recognition. Performance evaluation shows that the proposed system can convert discrete Sinhala speech to Sinhala Unicode text in both quiet and noisy environments.
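As an illustration of the three-state Bakis topology mentioned in the abstract, the following is a minimal sketch of a left-to-right transition matrix in which each emitting state may only stay where it is or move forward. The function name `bakis_transitions`, the self-loop probability of 0.6, the `max_jump` parameter, and the extra exit column are assumptions made for this example; the abstract does not give the paper's actual transition probabilities or state whether skip transitions are allowed.

```python
def bakis_transitions(n_states=3, self_loop=0.6, max_jump=1):
    """Build a left-to-right (Bakis) transition matrix.

    Returns n_states rows and n_states + 1 columns; the last column is a
    hypothetical exit state. All probabilities are illustrative values,
    not figures taken from the paper.
    """
    if not 0.0 < self_loop < 1.0:
        raise ValueError("self_loop must be strictly between 0 and 1")
    if max_jump < 1:
        raise ValueError("max_jump must be at least 1")

    exit_col = n_states
    matrix = []
    for i in range(n_states):
        row = [0.0] * (n_states + 1)
        # Probability of remaining in the current state.
        row[i] = self_loop
        # Forward targets only: a Bakis model never moves backward.
        targets = range(i + 1, min(i + max_jump, exit_col) + 1)
        share = (1.0 - self_loop) / len(targets)
        for j in targets:
            row[j] = share
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    for row in bakis_transitions():
        print(["%.2f" % p for p in row])
```

Setting `max_jump=2` would additionally allow each state to skip over its immediate successor, which is a common variant of the same topology.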