Afan Oromo Speech-Based Computer Command and Control: An Evaluation with Selected Commands

Impact factor 2.3 | JCR Q3 | Computer Science, Artificial Intelligence

Advances in Human-Computer Interaction | Published 2023-10-16 | DOI: 10.1155/2023/9959015

Kebede Teshite, Getachew Mamo, Kris Calpotura

Citations: 0

Abstract

Speech-based computer command and control uses natural speech to let computers understand human language and execute tasks through commands. However, no speech-based command-and-control system for Microsoft Word has previously been studied or developed for Afan Oromo. The primary aim of this research is to investigate and develop such a system for Afan Oromo using a selected set of MS Word command-and-control words. To this end, the selected MS Word command words were translated from English into Afan Oromo, and a small-vocabulary, isolated-word, speaker-independent, HMM-based speech recognizer was developed with the HTK toolkit. Audio recordings were collected from 38 speakers (16 female and 22 male) aged 18 to 40 years, selected based on their availability. Word-level recognition used MFCC (Mel-frequency cepstral coefficient) features and related data processing, which are widely used and effective approaches in speech recognition. Of the 64 MS command words, 54 (84.37%) were used for training and 10 (15.63%) for testing. Both live and nonlive evaluation techniques were used to assess the recognizer. The live recognizer, which accounts for variations in the environment, outperformed the nonlive recognizer, an effect attributed to the influence of neighboring phones. The monophone tied-state, triphone, and triphone-based recognizers achieved 78.12%, 86.87%, and 88.99%, respectively; thus, the triphone-based recognizer performed best among the nonlive recognizers. Owing to limited resources, the study was restricted to speech-based computer commands drawn from a selected set of MS Word commands, which play a crucial role in text processing. In addition, no object-as-a-service components were available for evaluating the speech-based interface in a real environment. The experimental findings demonstrate that, given adequate language resources, an Afan Oromo speech-based interface for computer command and control could be developed.
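The abstract describes a small-vocabulary, isolated-word, HMM-based recognizer built with HTK. As a rough illustration of the decision rule such a recognizer typically applies (not the authors' HTK configuration), the sketch below scores a sequence of feature frames against one left-to-right HMM per command word using the forward algorithm and returns the best-scoring word. Everything concrete here is an assumption: the diagonal-Gaussian emissions, the three-state topology, the self-loop probability, the toy two-dimensional frames standing in for MFCC vectors, and the labels `cmd_a`/`cmd_b`. It also uses whole-word models for simplicity, whereas the monophone and triphone recognizers in the abstract model subword phone units. In a real system the parameters would be trained from the recorded speech (in HTK, for example, with HERest) rather than set by hand.

```python
import math
from dataclasses import dataclass

NEG_INF = -math.inf


@dataclass
class WordHMM:
    """Left-to-right HMM for one command word (toy parameters, not trained).

    Each emitting state has a diagonal-covariance Gaussian over feature
    vectors that stand in for MFCC frames (assumption for illustration).
    """
    name: str
    means: list[list[float]]      # means[state][dim]
    variances: list[list[float]]  # variances[state][dim]
    log_trans: list[list[float]]  # log_trans[i][j] = log P(next state j | state i)


def left_to_right(n_states: int, stay: float = 0.6) -> list[list[float]]:
    """Log transition matrix where each state may only repeat or advance by one."""
    trans = [[NEG_INF] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        trans[i][i] = math.log(stay)
        trans[i][i + 1] = math.log(1.0 - stay)
    trans[-1][-1] = 0.0  # final state can only loop on itself
    return trans


def log_gaussian(x, mean, var) -> float:
    """log N(x; mean, diag(var))."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values) -> float:
    """Numerically stable log(sum(exp(v)))."""
    peak = max(values)
    if peak == NEG_INF:
        return NEG_INF
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def log_likelihood(model: WordHMM, frames) -> float:
    """Forward algorithm in the log domain.

    The path must start in the first state and end in the last state,
    the usual entry/exit convention for left-to-right word models.
    """
    n = len(model.means)
    alpha = [NEG_INF] * n
    alpha[0] = log_gaussian(frames[0], model.means[0], model.variances[0])
    for x in frames[1:]:
        alpha = [
            logsumexp([alpha[i] + model.log_trans[i][j] for i in range(n)])
            + log_gaussian(x, model.means[j], model.variances[j])
            for j in range(n)
        ]
    return alpha[-1]


def recognize(models, frames) -> str:
    """Isolated-word decision rule: pick the word model with the best score."""
    return max(models, key=lambda m: log_likelihood(m, frames)).name


# Hypothetical two-word vocabulary with 2-D "feature" frames.
unit_var = [[1.0, 1.0]] * 3
cmd_a = WordHMM("cmd_a", [[0, 0], [5, 5], [10, 10]], unit_var, left_to_right(3))
cmd_b = WordHMM("cmd_b", [[10, 10], [5, 5], [0, 0]], unit_var, left_to_right(3))

rising = [[0, 0], [0, 0], [5, 5], [5, 5], [10, 10], [10, 10]]
print(recognize([cmd_a, cmd_b], rising))        # cmd_a
print(recognize([cmd_a, cmd_b], rising[::-1]))  # cmd_b
```

Working in the log domain avoids numerical underflow when many small probabilities are multiplied over long utterances. HTK's decoder (HVite) uses Viterbi decoding rather than the full forward sum; replacing `logsumexp` with `max` inside `log_likelihood` turns this sketch into the corresponding best-path (Viterbi) score.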
