Convolutional Neural Networks for Deep Spoken Keyword Spotting
Nayyer Aafaq, Mehran Saleem, Jahanzeb Tariq Khan, I. Abbasi
2023 3rd International Conference on Artificial Intelligence (ICAI), published 2023-02-22
DOI: 10.1109/ICAI58407.2023.10136648 (https://doi.org/10.1109/ICAI58407.2023.10136648)
Abstract
With the growth of biometric security applications, mobile and telephone communication monitoring, and digital assistants, the practical applications of Keyword Spotting (KWS) have increased manyfold. The use of Artificial Intelligence in KWS has greatly improved its accuracy. In this work, after analyzing various feature extraction and Deep Learning techniques, KWS is performed in both non-streaming and streaming modes. Speech features are extracted using Mel-spectrograms and Mel-frequency Cepstral Coefficients (MFCCs). Of the three broad categories of deep neural networks considered, a Convolutional Neural Network (CNN) model is implemented for KWS, as it outperforms Recurrent Neural Networks (RNNs) and Feedforward Neural Networks (FFNNs) owing to its lower complexity and low computational cost. These techniques were used with the Google Speech Commands Dataset, online as well as offline.
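
The two feature types named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline: the frame length (512 samples), hop (160 samples), 40 mel bands, 13 coefficients, and the function names are all assumptions chosen for illustration, and the DCT used here is the unnormalized type-II form. The 16 kHz sample rate matches the Google Speech Commands recordings.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel mapping.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sample_rate=16000, n_fft=512, hop=160, n_mels=40):
    # Slice the signal into overlapping frames, window each frame,
    # take the power spectrum, pool it into mel bands, then take the log.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel + 1e-10)  # small offset avoids log(0)

def mfcc(log_mel, n_mfcc=13):
    # Unnormalized DCT-II along the mel axis; keep the first n_mfcc coefficients.
    n_mels = log_mel.shape[1]
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return log_mel @ basis.T

# Usage: a synthetic one-second 1 kHz tone stands in for a spoken keyword.
tone = np.sin(2 * np.pi * 1000 * np.arange(16000) / 16000)
features = mfcc(log_mel_spectrogram(tone))
print(features.shape)
```

Either the log-mel matrix or the MFCC matrix (frames by coefficients) could then be treated as a single-channel image and passed to a CNN classifier.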