Lucca Emmanuel Pineli Simões, Lucas Brandão Rodrigues, Rafaela Mota Silva, Gustavo Rodrigues da Silva
arXiv - CS - Sound, published 2024-07-10, DOI: arxiv-2407.08658
Evaluating Voice Command Pipelines for Drone Control: From STT and LLM to Direct Classification and Siamese Networks
This paper presents the development and comparative evaluation of three voice
command pipelines for controlling a Tello drone, using speech recognition and
deep learning techniques. The aim is to enhance human-machine interaction by
enabling intuitive voice control of drone actions. The pipelines developed
are: (1) a traditional Speech-to-Text (STT) stage followed by a Large Language
Model (LLM), (2) a direct voice-to-function mapping model, and (3) a
Siamese neural network-based system. Each pipeline was evaluated based on
inference time, accuracy, efficiency, and flexibility. Detailed methodologies,
dataset preparation, and evaluation metrics are provided, offering a
comprehensive analysis of each pipeline's strengths and applicability across
different scenarios.
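
The abstract does not say how the Siamese system turns an utterance into a drone action. As a rough illustration of the general idea behind Siamese-style matching, and not the authors' implementation, the sketch below compares the embedding of an incoming utterance against one stored reference embedding per command and picks the closest one by cosine similarity. Everything here is an assumption made for illustration: the function names, the command labels, the similarity threshold, and the three-dimensional vectors, which stand in for the output of a trained audio encoder.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def match_command(query, references, threshold=0.8):
    """Return (command, score) for the closest reference, or (None, score) if below threshold."""
    best_name, best_score = None, -1.0
    for name, ref in references.items():
        score = cosine_similarity(query, ref)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        # Reject utterances that do not resemble any known command closely enough.
        return None, best_score
    return best_name, best_score

# Hypothetical reference embeddings, one per command (made-up values).
REFERENCES = {
    "takeoff": [0.9, 0.1, 0.0],
    "land": [0.1, 0.9, 0.0],
    "flip": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of a new utterance, closest to the "takeoff" reference.
command, score = match_command([0.8, 0.2, 0.1], REFERENCES)
print(command, round(score, 2))
```

One property of this kind of matching is that adding a command only requires storing a new reference embedding, with no retraining of a classifier head. Whether that is the flexibility the paper evaluates is not stated in the abstract.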