“Spanish Políglota”: an automatic Speech Recognition system based on HMM

2021 Second International Conference on Information Systems and Software Technologies (ICI2ST). Publication date: 2021-03-01. DOI: 10.1109/ICI2ST51859.2021.00011

Jonathan A. Zea, Josafá Aguiar


Abstract

The goal of this ASR system is to recognize audio queries that request the static translation of a given Spanish word into a specified language. We call this ASR system the Spanish Políglota. The pronunciation dictionary for the language model is obtained by grapheme-to-phoneme conversion. It was built with Scheme scripts for the Festival Speech Synthesis System and the SPPAS Spanish lexicon. The set of possible audio queries is restricted by a BNF grammar designed for this project. A triphone acoustic model was generated from a set of 1621 recorded words. This acoustic model is based on an N-gram model whose probabilities are estimated by maximum likelihood estimation (MLE). We evaluated the prediction of individual words as well as of synthetic phrases; the 1577 synthetic phrases were generated by concatenating the words of our audio set. Performance was also measured on a new set of audio recordings from a different speaker. Isolated word recognition achieved 77.91% correct predictions. However, performance dropped on the synthetic phrases and on the second speaker's speech. We consider this an initial step toward the development of a fully functional automatic speech recognition system.
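Maximum likelihood estimation of an N-gram reduces to relative frequencies; for a bigram, P(w_i | w_{i-1}) = c(w_{i-1}, w_i) / c(w_{i-1}). The following is a minimal Python sketch of that estimator, not the authors' implementation: the function names, the `<s>`/`</s>` sentence markers, and the Spanish training queries are all hypothetical. Because pure MLE applies no smoothing, any bigram absent from the training data receives probability zero.

```python
from collections import Counter


def train_bigram_mle(sentences):
    """Estimate P(w_i | w_{i-1}) = c(w_{i-1}, w_i) / c(w_{i-1}) by maximum likelihood."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        # Every token except the final </s> serves as a history for the next token.
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return {pair: count / history_counts[pair[0]] for pair, count in bigram_counts.items()}


def sentence_prob(model, sentence):
    """Multiply bigram probabilities; unseen bigrams get 0 because no smoothing is used."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for pair in zip(tokens[:-1], tokens[1:]):
        prob *= model.get(pair, 0.0)
    return prob


# Hypothetical training queries (not taken from the paper).
corpus = [
    "traduce perro al inglés",
    "traduce gato al francés",
    "traduce perro al francés",
]
model = train_bigram_mle(corpus)
print(model[("traduce", "perro")])
print(sentence_prob(model, "traduce perro al inglés"))
```

Here P(perro | traduce) = 2/3, since two of the three queries continue "traduce" with "perro", and the whole query "traduce perro al inglés" scores 1 · 2/3 · 1 · 1/3 · 1 = 2/9.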
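A triphone acoustic model replaces each phone with a context-dependent unit that also records its left and right neighbours. The abstract does not name the toolkit or label notation, so this sketch assumes HTK-style `left-centre+right` labels with word-internal contexts. `to_triphones` is a hypothetical helper, and the phone transcription of "gato" is only illustrative.

```python
def to_triphones(phones):
    """Expand a word's phone list into HTK-style labels: left-centre+right.

    Contexts are word-internal, so the first and last phones become biphones
    (no left or no right context), and a single-phone word stays a monophone.
    """
    labels = []
    for i, centre in enumerate(phones):
        label = centre
        if i > 0:
            label = f"{phones[i - 1]}-{label}"
        if i < len(phones) - 1:
            label = f"{label}+{phones[i + 1]}"
        labels.append(label)
    return labels


# Illustrative phone transcription of "gato" (hypothetical phone set).
print(to_triphones(["g", "a", "t", "o"]))  # ['g+a', 'g-a+t', 'a-t+o', 't-o']
```

Because a small recording set leaves many triphones with few training examples, toolkits such as HTK commonly tie HMM states across similar contexts rather than training every triphone independently.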
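A BNF grammar limits which word sequences the recognizer may output. The abstract does not reproduce the project's rules, so the grammar below is a hypothetical stand-in (verb, Spanish word, the preposition "al", target language) encoded as a Python dict. The sketch enumerates every sentence the grammar derives and checks whether a transcript belongs to that set. It handles only non-recursive rules, which suffices for a fixed query template.

```python
from itertools import product

# Hypothetical grammar: the paper's actual BNF rules are not given in the abstract.
# Keys are nonterminals; each value lists alternative right-hand sides.
GRAMMAR = {
    "<query>": [["<verb>", "<word>", "al", "<language>"]],
    "<verb>": [["traduce"], ["traducir"]],
    "<word>": [["perro"], ["gato"], ["casa"]],
    "<language>": [["inglés"], ["francés"], ["portugués"]],
}


def expand(symbol, grammar):
    """Return every terminal string (a tuple of words) derivable from `symbol`."""
    if symbol not in grammar:  # terminal symbol: derives only itself
        return [(symbol,)]
    results = []
    for alternative in grammar[symbol]:
        # Expand each symbol of the alternative, then join every combination.
        parts = [expand(s, grammar) for s in alternative]
        for combo in product(*parts):
            results.append(tuple(word for piece in combo for word in piece))
    return results


def is_valid_query(text, grammar=GRAMMAR, start="<query>"):
    """Accept a transcript only if the grammar can derive it exactly."""
    return tuple(text.split()) in set(expand(start, grammar))


print(len(expand("<query>", GRAMMAR)))
print(is_valid_query("traduce gato al francés"))
```

With 2 verbs, 3 words and 3 languages, this toy grammar derives 18 queries. A real decoder would typically compile the grammar into a word network instead of enumerating strings.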
