Authors: Penghua Li, Minglong Chen, Fangchao Hu, Yang Xu
Venue: The 27th Chinese Control and Decision Conference (2015 CCDC)
Publication date: 2015-05-23
DOI: 10.1109/CCDC.2015.7162425 (https://doi.org/10.1109/CCDC.2015.7162425)
A spectrogram-based voiceprint recognition using deep neural network
This paper presents a speaker identification algorithm that uses a deep neural network (DNN) as the classifier to learn voiceprint features represented by spectrograms. The collected speech signals are pre-emphasized, windowed, and divided into chunks; the magnitude of the frequency spectrum of each chunk is then computed to form the spectrograms. The local binary patterns (LBP) operator is used to extract the texture features embedded in the spectrograms. These texture features, represented as LBP vectors, are fed to a DNN with four hidden layers, which learns the speech features. During learning, the extraction and reconstruction procedures are repeated in each hidden layer. Through these procedures, the speech features of each individual are rendered as a recognition figure, from which the recognition result is obtained. Numerical experiments indicate that the approach achieves an acceptable recognition rate with high accuracy.
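
The front end described in the abstract (pre-emphasis, windowing, chunking, magnitude spectrum, then LBP texture features) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the pre-emphasis coefficient 0.97, the Hamming window, the frame length and hop size, the basic 3x3 LBP neighbourhood, and the 256-bin histogram used as the "LBP vector" are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128, pre_emph=0.97):
    """Magnitude spectrogram (all parameter values here are assumed)."""
    signal = np.asarray(signal, dtype=np.float64)
    # Pre-emphasis: y[n] = x[n] - a * x[n-1], which boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # Divide into overlapping chunks (requires len(signal) >= frame_len).
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    # Apply a Hamming window to every chunk.
    frames = emphasized[idx] * np.hamming(frame_len)
    # Magnitude of the one-sided spectrum: shape (n_frames, frame_len // 2 + 1).
    return np.abs(np.fft.rfft(frames, axis=1))


def lbp_codes(image):
    """Basic 3x3 LBP code (0..255) for every interior pixel."""
    image = np.asarray(image, dtype=np.float64)
    h, w = image.shape
    center = image[1:-1, 1:-1]
    # Eight neighbours, clockwise starting from the top-left one.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = image[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as large as the centre.
        codes += (neighbour >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes


def lbp_histogram(image):
    """Normalized 256-bin histogram of LBP codes (one possible LBP vector)."""
    hist = np.bincount(lbp_codes(image).ravel(), minlength=256)
    return hist / hist.sum()


if __name__ == "__main__":
    sample_rate = 8000  # assumed sampling rate in Hz
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 1000 * t)  # synthetic stand-in for speech
    spec = spectrogram(tone)
    features = lbp_histogram(spec)
    print(spec.shape, features.shape)
```

In a full system, a vector such as `features` would be the input to the DNN classifier. The four-hidden-layer network and its layer-wise extraction and reconstruction training are not sketched here, because the abstract gives no architectural details beyond the number of hidden layers.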