Assessing Costa Rican children speech recognition by humans and machines

IF 0.1 Q4 MULTIDISCIPLINARY SCIENCES

Tecnologia en Marcha, Pub Date: 2022-11-16, DOI: 10.18845/tm.v35i8.6453

Maribel Morales-Rodríguez, Marvin Coto-Jiménez


Citations: 0

Abstract

In recent years, a growing number of studies on human-computer interaction have been conducted, driven by the pervasive speech interfaces implemented in systems such as cell phones and personal and home automation assistants. These studies include automatic speech recognition (ASR) and speech synthesis, and they consider a widening variety of signal conditions, such as noise and reverberation, as well as accents and age-related effects. One of the key challenges is the development of ASR for children’s speech. Because current systems depend on language and accent, improving them requires research on speech recognition technologies suited to children. In this paper, we assess commercial ASR systems for the recognition of Costa Rican children’s speech, for users aged between three and fourteen years. To establish a comparison and numeric validation of the ASR systems in recognizing children’s isolated words, we conducted a large subjective listening test that quantifies the differences and the challenges that remain for state-of-the-art ASR systems. The results show clear numeric differences between ASR systems and human perception, especially for younger children. Additionally, we provide suggestions for future research directions in the field.

