Design and Development of Audio Processing and Speech Recognition Algorithm
Muhammad Aitessam Ahmed
2021 Seventh International Conference on Aerospace Science and Engineering (ICASE)
Published: 2021-12-14
DOI: 10.1109/ICASE54940.2021.9904277
https://doi.org/10.1109/ICASE54940.2021.9904277
Citations: 0
Abstract
Speech recognition is an emerging technology in the field of artificial intelligence, as humans find it easier to communicate and express their ideas through speech. Many state-of-the-art speech recognition systems have been designed in recent years following advances in GPUs; however, these systems do not perform well in real time on low-power processors. This paper therefore presents the development of a deep learning-based speech processing algorithm that was implemented on a quadcopter to simplify UAV control. After integration with other systems, the algorithm can also serve other applications, such as automated data entry in ATMs and vending machines, home and office automation, speech-controlled vehicle navigation, and wheelchair operation. Raw speech signals were first converted to 2D spectrograms and then passed to a convolutional neural network. A ResNet50 model pre-trained on ImageNet was fine-tuned on the audio dataset, an approach that required minimal feature and model design. After training on a cloud GPU in a Kaggle notebook, the model achieved state-of-the-art results, with 97.1% training accuracy and 96.45% validation accuracy. The model weights were then saved, the program was written in Python using the Keras library with a TensorFlow backend, and an optimized version of the algorithm was deployed on a Jetson Nano for real-time transmission to the quadcopter. Speech commands were sent to the quadcopter during real-time flights, and it maneuvered successfully in the guided direction.
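The spectrogram conversion step mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration and not the author's implementation: the frame length, hop size, Hann window, sample rate, and the synthetic test tone are all assumptions, and a plain DFT built on the Python standard library stands in for the optimized FFT routines a real pipeline would likely use.

```python
import cmath
import math


def hann_window(n):
    """Return an n-point periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def spectrogram(samples, frame_len=64, hop=32):
    """Log-magnitude spectrogram: rows are frequency bins, columns are frames.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = hann_window(frame_len)
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    # Precompute the DFT basis e^{-2*pi*j*k*n/N} for every kept bin k.
    basis = [
        [cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len)]
        for k in range(n_bins)
    ]
    columns = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        # Taper the frame to reduce spectral leakage.
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        column = []
        for k in range(n_bins):
            coeff = sum(x * b for x, b in zip(frame, basis[k]))
            # Magnitude in dB; the small offset avoids log10(0).
            column.append(20.0 * math.log10(abs(coeff) + 1e-10))
        columns.append(column)
    # Transpose so that each row holds one frequency bin over time.
    return [list(row) for row in zip(*columns)]


def make_tone(freq_hz, sample_rate=8000, n_samples=800):
    """Synthetic sine tone, a hypothetical stand-in for a recorded command."""
    return [
        math.sin(2.0 * math.pi * freq_hz * t / sample_rate)
        for t in range(n_samples)
    ]


if __name__ == "__main__":
    spec = spectrogram(make_tone(1000.0))
    print(len(spec), "frequency bins x", len(spec[0]), "frames")
```

With these assumed settings, an 800-sample tone yields a 33 × 24 array. An ImageNet-pre-trained ResNet50 expects three-channel image input, so a 2D array like this would presumably need to be resized and replicated or color-mapped to three channels before fine-tuning; the abstract does not specify how that step was done.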