Evaluating Voice Command Pipelines for Drone Control: From STT and LLM to Direct Classification and Siamese Networks
Lucca Emmanuel Pineli Simões, Lucas Brandão Rodrigues, Rafaela Mota Silva, Gustavo Rodrigues da Silva
arXiv - CS - Sound, 2024-07-10. DOI: arxiv-2407.08658
Abstract
This paper presents the development and comparative evaluation of three voice
command pipelines for controlling a Tello drone, using speech recognition and
deep learning techniques. The aim is to enhance human-machine interaction by
enabling intuitive voice control of drone actions. The three pipelines are:
(1) a traditional Speech-to-Text (STT) stage followed by a Large Language
Model (LLM), (2) a direct voice-to-function mapping model, and (3) a
Siamese neural network-based system. Each pipeline was evaluated on
inference time, accuracy, efficiency, and flexibility. Detailed methodologies,
dataset preparation, and evaluation metrics are provided, offering a
comprehensive analysis of each pipeline's strengths and applicability across
different scenarios.
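
As a rough illustration of how a Siamese-style pipeline could map an utterance to a drone action, the sketch below compares a query embedding against one stored reference embedding per command and picks the closest match. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the cosine-similarity metric, the 0.8 threshold, the command names, and the toy 3-dimensional vectors are all hypothetical, and a real system would obtain the embeddings from a trained audio encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def match_command(query_embedding, reference_embeddings, threshold=0.8):
    """Return (command, score) for the closest reference embedding.

    The command is None when no reference reaches the threshold, so an
    unrecognized utterance does not trigger a drone action.
    The threshold value is an assumption chosen for illustration.
    """
    best_name, best_score = None, -1.0
    for name, reference in reference_embeddings.items():
        score = cosine_similarity(query_embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Hypothetical reference embeddings, one per command (toy values).
REFERENCES = {
    "takeoff": [1.0, 0.0, 0.0],
    "land": [0.0, 1.0, 0.0],
    "flip": [0.0, 0.0, 1.0],
}

command, score = match_command([0.9, 0.1, 0.0], REFERENCES)
print(command, round(score, 3))
```

Because matching is done against stored references, a new command could in principle be supported by adding one more reference embedding rather than retraining a classifier, which is one plausible reading of the flexibility criterion named in the abstract.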