{"title":"“Spanish Políglota”: an automatic Speech Recognition system based on HMM","authors":"Jonathan A. Zea, Josafá Aguiar","doi":"10.1109/ICI2ST51859.2021.00011","DOIUrl":null,"url":null,"abstract":"The goal of this ASR system is to be able to recognize audio queries that request static translation of a given Spanish word into a specified language. We call this ASR system as the Spanish Políglota. The pronunciation dictionary for the language model is obtained by applying grapheme to phoneme conversion. It was developed via Festival Speech Synthesis Scheme scripts and the SPPAS Spanish lexicon. The possible audio queries are restricted by a BNF grammar we designed for this project. A triphone acoustic model was generated from a set of 1621 words audio recordings. This acoustic model is based on a N-gram model that estimates its probabilities based on the maximum likelihood estimation MLE. We evaluated the prediction of individual words, as well as of synthetic phrases. We generated 1577 synthetic phrases concatenating the words of our audio set. The performance was also measured over a new set of audio recordings from a different speaker. Evaluation of isolated word recognition achieved 77.91% of correct predictions. Nevertheless, the performance dropped when evaluating the synthetic phrases as well as the second speaker’s speech. We consider it is an initial step towards the development of a fully functional automatic speech recognition system.","PeriodicalId":148844,"journal":{"name":"2021 Second International Conference on Information Systems and Software Technologies (ICI2ST)","volume":"84 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 Second International Conference on Information Systems and Software Technologies (ICI2ST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICI2ST51859.2021.00011","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
The goal of this ASR system is to be able to recognize audio queries that request static translation of a given Spanish word into a specified language. We call this ASR system as the Spanish Políglota. The pronunciation dictionary for the language model is obtained by applying grapheme to phoneme conversion. It was developed via Festival Speech Synthesis Scheme scripts and the SPPAS Spanish lexicon. The possible audio queries are restricted by a BNF grammar we designed for this project. A triphone acoustic model was generated from a set of 1621 words audio recordings. This acoustic model is based on a N-gram model that estimates its probabilities based on the maximum likelihood estimation MLE. We evaluated the prediction of individual words, as well as of synthetic phrases. We generated 1577 synthetic phrases concatenating the words of our audio set. The performance was also measured over a new set of audio recordings from a different speaker. Evaluation of isolated word recognition achieved 77.91% of correct predictions. Nevertheless, the performance dropped when evaluating the synthetic phrases as well as the second speaker’s speech. We consider it is an initial step towards the development of a fully functional automatic speech recognition system.