J. A. Qadir, Abdulbasit K. Al-Talabani, Hiwa A. Aziz
Isolated Spoken Word Recognition Using One-Dimensional Convolutional Neural Network
Int. J. Fuzzy Log. Intell. Syst., published 2020-12-31. DOI: 10.5391/ijfis.2020.20.4.272
Citations: 2
Abstract
Isolated spoken word recognition has many applications in human–computer interfaces. Feature extraction is a vital and challenging step in speech-based classification. In this work, we propose a one-dimensional convolutional neural network (CNN) that extracts learned features from the raw signal and classifies them with a multilayer perceptron. The proposed models are tested on a purpose-built dataset of 119 speakers uttering the Kurdish digits (0–9). Both the speaker-dependent model (average accuracy of 98.5%) and the speaker-independent model (average accuracy of 97.3%) achieve convincing results. Analysis of the results shows that 9 of the speakers exhibit biased characteristics, and their results are outliers relative to the remaining 110 speakers.
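The core idea of the abstract — a 1D convolution sliding learned filters over the raw waveform, followed by pooling to a fixed-size feature vector — can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the filter count, filter width, stride, and the use of random (untrained) filters on a synthetic signal are all assumptions made for demonstration.

```python
import numpy as np

def conv1d_relu(x, kernels, stride=1):
    """Valid-mode 1D convolution of waveform x with a bank of filters, then ReLU.

    x       : (T,) raw audio samples
    kernels : (K, W) filter bank (learned in a real CNN; random here)
    returns : (K, T_out) feature map, T_out = (T - W) // stride + 1
    """
    K, W = kernels.shape
    T_out = (len(x) - W) // stride + 1
    out = np.empty((K, T_out))
    for k in range(K):
        for t in range(T_out):
            out[k, t] = kernels[k] @ x[t * stride : t * stride + W]
    return np.maximum(out, 0.0)  # ReLU non-linearity

rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)        # synthetic stand-in for ~1 s of 16 kHz audio
filters = rng.standard_normal((8, 80))   # 8 filters of 80 samples (5 ms) -- assumed sizes
feat = conv1d_relu(wave, filters, stride=40)
pooled = feat.mean(axis=1)               # global average pooling -> fixed-size vector
print(feat.shape)                        # (8, 399)
print(pooled.shape)                      # (8,)
```

In the paper's pipeline, the pooled (or flattened) feature vector would then be fed to a multilayer perceptron with a 10-way softmax output, one class per Kurdish digit.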