Speaker Identification System Using CNN Approach
Neelam Nehra, P. Sangwan, Divya Kumar
2021 International Conference on Industrial Electronics Research and Applications (ICIERA), published 2021-12-22
DOI: 10.1109/ICIERA53202.2021.9726767
Citations: 1
Abstract
In this paper, a text-independent Speaker Identification (SI) system based on a convolutional neural network (CNN) is proposed, and the methodology is also tested in a noisy environment. A CNN is adopted in this work for its ability to learn discriminative features for the classification task. At the front end, spectrogram images are used for feature extraction; the CNN then performs classification, yielding promising results. The dataset used in this work comprises 5 speakers, each uttering 4 voice samples. The overall accuracy achieved by the proposed approach is 96.54%.
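The abstract describes a spectrogram-based front end feeding a CNN classifier. As a minimal sketch of the first stage only, the following NumPy code computes a magnitude spectrogram via a Hann-windowed short-time Fourier transform; the frame length, hop size, and the 440 Hz test tone are illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np

def spectrogram(signal, n_fft=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed STFT.

    Frames of length n_fft are taken every `hop` samples; a one-sided
    FFT gives n_fft // 2 + 1 frequency bins per frame.
    (Illustrative parameters, not those of the paper.)
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

# Hypothetical input: a 1-second, 8 kHz sine tone at 440 Hz
sr = 8000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t)
spec = spectrogram(sig)
print(spec.shape)  # (number of frames, n_fft // 2 + 1)
```

In a full system, such spectrograms would be rendered as fixed-size images and fed to a CNN; the paper does not detail its network architecture, so that stage is not reproduced here.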