Speech Command Recognition Using Deep Learning

2021 Sixth International Conference on Advances in Biomedical Engineering (ICABME) | Pub Date: 2021-10-07 | DOI: 10.1109/ICABME53305.2021.9604862

M. Ayache, Hussien Kanaan, Kawthar Kassir, Yasser Kassir


Citations: 8

Abstract

Speech recognition software is a computer program trained to take human speech as input, interpret it, and transcribe it into text. The field has recently benefited from advances in deep learning and big data, as evidenced not only by the surge of academic papers published in the field but, more importantly, by the worldwide industry adoption of a variety of deep learning methods for designing and deploying speech recognition systems. The objective of this paper is to propose an accurate end-user software system that recognizes specific commands to control a robot performing specified tasks in a hospital. The model is based on deep learning because it is effective on very large datasets, such as the two versions of the Google TensorFlow and AIY datasets used in our model. A convolutional neural network is used because it extracts features directly from the dataset, replacing traditional feature-extraction methods and thereby saving training time and reducing the complexity of the system. In addition, NVIDIA CUDA is used to train the model on a GPU to decrease the training time. During training, experiments were carried out to examine the effect of several parameters on the results of the system and to confirm that the parameters chosen for our model perform best. The results indicate that the training, validation, and testing accuracies of the proposed approach were high, that the training duration was very low thanks to the CUDA Toolkit, and that the commands were successfully recognized by the model. These results outperform those of papers on similar work, which are presented in the following sections.
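The claim that a convolutional network extracts features directly from the data can be illustrated with a minimal sketch of one convolution, ReLU, and max-pooling stage. This is not the authors' architecture: the toy spectrogram, the fixed onset kernel, and every function name below are assumptions chosen for illustration, and in a trained network the kernel values would be learned rather than set by hand.

```python
# Illustrative sketch only: one convolution + ReLU + max-pool stage in pure Python.
# The input, kernel, and function names are hypothetical, not taken from the paper.

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    As in most deep learning frameworks, the kernel is not flipped
    (this is cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Zero out negative responses."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling; a trailing odd row or column is dropped."""
    return [
        [
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, len(fmap[0]) - 1, 2)
        ]
        for r in range(0, len(fmap) - 1, 2)
    ]


# Toy "spectrogram": rows are frequency bins, columns are time frames.
# Energy switches on at the fourth frame in every bin.
spectrogram = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]

# Hand-set kernel that responds to energy rising along the time axis.
onset_kernel = [[-1.0, 1.0],
                [-1.0, 1.0]]

feature_map = conv2d_valid(spectrogram, onset_kernel)   # 4 x 5 response map
features = max_pool2x2(relu(feature_map))               # 2 x 2 pooled features
print(features)
```

The pooled output is non-zero only around the frame where the energy rises, which is the sense in which a convolutional layer acts as a local feature detector. A real model would stack several such stages with many learned kernels and run them on a GPU through a deep learning framework.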

