Dataset and Evaluation of Automatic Speech Recognition for Multi-lingual Intent Recognition on Social Robots

IEEE/ACM International Conference on Human-Robot Interaction. Pub Date: 2024-03-11. DOI: 10.1145/3610977.3637473

Antonio Andriella, Raquel Ros, Yoav Ellinson, Sharon Gannot, S. Lemaignan

Citations: 0

Abstract

While Automatic Speech Recognition (ASR) systems excel in controlled environments, robot-specific setups pose challenges because of their particular microphone requirements and additional noise sources. In this paper, we create a dataset of common robot instructions in 5 European languages, and we systematically evaluate current state-of-the-art ASR systems (Vosk, OpenWhisper, Google Speech and NVidia Riva). Besides standard metrics, we also look at two critical downstream tasks for human-robot verbal interaction: intent recognition rate and entity extraction, using the open-source Rasa framework. Overall, we found that open-source solutions such as Vosk perform competitively with closed-source solutions while running on the edge, on a low compute budget (CPU only).
