Evaluating Voice Command Pipelines for Drone Control: From STT and LLM to Direct Classification and Siamese Networks

arXiv - CS - Sound. Pub Date: 2024-07-10. DOI: arxiv-2407.08658

Lucca Emmanuel Pineli Simões, Lucas Brandão Rodrigues, Rafaela Mota Silva, Gustavo Rodrigues da Silva

Citations: 0

Abstract

This paper presents the development and comparative evaluation of three voice command pipelines for controlling a Tello drone using speech recognition and deep learning techniques. The aim is to enhance human-machine interaction by enabling intuitive voice control of drone actions. The three pipelines are: (1) a traditional approach in which Speech-to-Text (STT) is followed by a Large Language Model (LLM), (2) a model that maps voice directly to drone functions, and (3) a system based on a Siamese neural network. Each pipeline was evaluated on inference time, accuracy, efficiency, and flexibility. Detailed methodologies, dataset preparation, and evaluation metrics are provided, together with a comprehensive analysis of each pipeline's strengths and applicability across different scenarios.
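The abstract does not say how the Siamese system in pipeline (3) turns an utterance into a drone action. One common way to use a Siamese-style encoder is to embed the incoming audio and pick the stored reference command whose embedding is most similar. The following is a minimal sketch of that matching step only, not the authors' implementation: the encoder is omitted, and the function names, the toy 3-dimensional embeddings, and the 0.8 threshold are all hypothetical values chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_command(query_embedding, reference_embeddings, threshold=0.8):
    """Return (command, score) for the most similar reference embedding.

    If the best score is below `threshold` (an assumed value), the command
    is None so that an unrecognized utterance does not trigger any action.
    """
    best_command, best_score = None, -1.0
    for command, reference in reference_embeddings.items():
        score = cosine_similarity(query_embedding, reference)
        if score > best_score:
            best_command, best_score = command, score
    if best_score < threshold:
        return None, best_score
    return best_command, best_score


# Toy reference embeddings (hypothetical). In a real system these would be
# produced by the trained encoder from recorded examples of each command.
REFERENCES = {
    "takeoff": [0.9, 0.1, 0.0],
    "land": [0.1, 0.9, 0.0],
    "flip": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    print(match_command([0.8, 0.2, 0.1], REFERENCES))  # close to "takeoff"
    print(match_command([0.5, 0.5, 0.5], REFERENCES))  # ambiguous, rejected
```

A design like this would let new commands be added by storing another reference embedding rather than retraining a classifier, which is one plausible reading of the "flexibility" criterion in the abstract.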
