Automatic Speaker Recognition using Deep Neural Network Classifiers
Abdikarim Ali Moumin, Smitha S Kumar
2021 2nd International Conference on Computation, Automation and Knowledge Management (ICCAKM), published 2021-01-19
DOI: https://doi.org/10.1109/iccakm50778.2021.9357699
Citation count: 1
Abstract
Advances in modern computing have driven breakthroughs in artificial intelligence (AI) and the Internet of Things (IoT). One major recent achievement is the ability of software to classify and recognize objects or sounds by learning from data. In this paper, we train a system to recognize people from their voice utterances using the TIMIT Acoustic-Phonetic Continuous Speech Corpus. A speaker's identity is enrolled by acquiring voice samples from that speaker. Relevant features are extracted, and a model is built from the extracted feature vectors. Pattern-matching classification is then applied to the model using artificial neural network techniques. The speaker verification system is built with Kaldi libraries to analyze acoustic features, while x-vector training is implemented in TensorFlow. To achieve better performance, we combine multiple layers of TDNN (Time Delay Neural Network) and LSTM (Long Short-Term Memory) deep neural networks.
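The enroll, model, and match steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The paper derives x-vectors from a TDNN and LSTM network, whereas this sketch assumes fixed-length feature vectors are already available and uses made-up toy values. Averaging the enrollment vectors, scoring by cosine similarity, the 0.8 threshold, and the names `enroll`, `identify`, and `verify` are all assumptions chosen for illustration. The abstract does not state which scoring method was used.

```python
import math


def mean_vector(vectors):
    """Average equal-length feature vectors into a single speaker model."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def enroll(samples_by_speaker):
    """Build one model per speaker from that speaker's enrollment vectors."""
    return {spk: mean_vector(vecs) for spk, vecs in samples_by_speaker.items()}


def identify(models, test_vector):
    """Return the enrolled speaker whose model is most similar to the test vector."""
    return max(models, key=lambda spk: cosine(models[spk], test_vector))


def verify(models, claimed, test_vector, threshold=0.8):
    """Accept the claimed identity if the similarity reaches the threshold.

    The threshold value is an assumption, not a figure from the paper.
    """
    return cosine(models[claimed], test_vector) >= threshold


# Toy 3-dimensional "features"; real embeddings would come from the trained network.
enrollment = {
    "spk_a": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "spk_b": [[0.0, 1.0, 0.2], [0.1, 0.9, 0.3]],
}
models = enroll(enrollment)
test = [0.95, 0.15, 0.05]

print(identify(models, test))
print(verify(models, "spk_a", test), verify(models, "spk_b", test))
```

Identification picks the closest enrolled speaker, while verification checks a single claimed identity against a threshold. In a real system that threshold would be tuned on held-out trials.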