M. Ayache, Hussien Kanaan, Kawthar Kassir, Yasser Kassir
Speech Command Recognition Using Deep Learning
2021 Sixth International Conference on Advances in Biomedical Engineering (ICABME), published 2021-10-07
DOI: 10.1109/ICABME53305.2021.9604862 (https://doi.org/10.1109/ICABME53305.2021.9604862)
Citations: 8
Abstract
Speech recognition software is a computer program trained to take human speech as input, interpret it, and transcribe it into text. Recently, the field has benefited from advances in deep learning and big data. These advances are evidenced not only by the surge of academic papers published in the field but, more importantly, by the worldwide industry adoption of deep learning methods in designing and deploying speech recognition systems. The objective of this paper is to propose an accurate end-user software system that recognizes specific commands used to control a robot performing specified tasks in a hospital. The model is based on deep learning because it is effective with large amounts of data, as is the case for the two versions of the Google TensorFlow and AIY datasets used in our model. A convolutional neural network is used because it can extract features directly from the dataset, replacing traditional feature-extraction methods and thereby saving training time and reducing system complexity. In addition, NVIDIA CUDA is used to train the model on a GPU to decrease training time. During training, experiments were carried out to assess the effect of several parameters on the system's results and to verify that the parameters chosen for our model are the best. The results indicate that the training, validation, and testing accuracies of the proposed approach were high, that the training duration was very low owing to the use of the CUDA Toolkit, and that the commands were successfully recognized by the model. These results outperform those of papers reporting similar work, which are presented in the coming sections.
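The idea that a convolutional network extracts features itself, rather than relying on hand-crafted ones, can be illustrated with a minimal sketch. This is not the authors' model: it is a pure-Python forward pass of one convolutional layer, a ReLU, global max pooling, and a linear scoring step. The function names, the kernels, the weights, and the command labels "go" and "stop" are all hypothetical, and nothing is trained here. A real system would learn the kernels from audio data with a deep learning framework running on a GPU.

```python
from typing import List, Sequence


def conv1d(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Valid-mode 1D cross-correlation, the operation a convolutional layer computes."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values: Sequence[float]) -> List[float]:
    """Zero out negative activations."""
    return [max(0.0, v) for v in values]


def extract_features(
    signal: Sequence[float], kernels: Sequence[Sequence[float]]
) -> List[float]:
    """One feature per kernel: the strongest ReLU response anywhere in the signal."""
    return [max(relu(conv1d(signal, kernel))) for kernel in kernels]


def classify(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    labels: Sequence[str],
) -> str:
    """Linear scoring layer: return the label whose weight row scores highest."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return labels[scores.index(max(scores))]


# Hypothetical toy input standing in for a short audio frame.
frame = [0, 0, 1, 1, 0, 0]
# Hypothetical fixed kernels; in a trained CNN these values are learned.
kernels = [[-1, 1], [1, 1]]
# Hypothetical weights and command labels, chosen only for illustration.
weights = [[1.0, 0.0], [0.0, 1.0]]
labels = ["go", "stop"]

features = extract_features(frame, kernels)
print(features, classify(features, weights, labels))
```

Each kernel responds to a different local pattern in the input, and the pooled responses serve as the feature vector passed to the classifier. In a trained network the kernel values are adjusted by gradient descent instead of being fixed by hand.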