Log Gabor Wavelet and Maximum a Posteriori Estimator in Speaker Identification
S. Senapati, S. Chakroborty, G. Saha
2006 Annual IEEE India Conference, published 2006-09-01
DOI: 10.1109/INDCON.2006.302757 (https://doi.org/10.1109/INDCON.2006.302757)
A speaker identification (SI) system needs an efficient feature extraction process and an appropriate speaker model built from the extracted features. This work introduces the fusion of the log Gabor wavelet (LGW) and the maximum a posteriori (MAP) estimator for a robust text-independent SI system. The focus of the paper is robustness to the degradations produced by transmission over a telephone channel. The complete experimental framework is evaluated on the 49-speaker conversational telephone King-92 SI speech database with two well-known speaker models, namely the Gaussian mixture model (GMM) and vector quantization (VQ). Comparisons are made with two established methods as well as with a normal feature extraction procedure to show the robustness of the new approach across different time segments. The GMM attains 98.8% identification accuracy using 30 seconds of wide-band speech utterances and 87.3% identification accuracy using 30 seconds of narrow-band speech utterances, and is shown to outperform the other methods.
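
The abstract does not give implementation details, so the following is only a minimal sketch of how closed-set identification with per-speaker GMMs is commonly scored: each enrolled speaker has a GMM, and the test utterance is assigned to the speaker whose model gives the highest average log-likelihood. Diagonal covariances, the function names `diag_gmm_loglik` and `identify`, and the toy two-dimensional data are all assumptions made for illustration. The sketch does not reproduce the paper's LGW features, its MAP estimation step, or the King-92 data.

```python
import numpy as np

def diag_gmm_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood of frames (T, D) under a
    diagonal-covariance GMM with M components.
    weights: (M,), means: (M, D), variances: (M, D)."""
    diff = frames[:, None, :] - means[None, :, :]                  # (T, M, D)
    # Log of the Gaussian normalisation constant for each component.
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_comp = log_norm[None, :] - 0.5 * np.sum(
        diff ** 2 / variances[None, :, :], axis=2)                 # (T, M)
    log_weighted = np.log(weights)[None, :] + log_comp
    # Numerically stable log-sum-exp over the mixture components.
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.mean())

def identify(frames, speaker_models):
    """Return the best-scoring speaker name and all per-speaker scores."""
    scores = {name: diag_gmm_loglik(frames, *params)
              for name, params in speaker_models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy models (not trained on real speech): two speakers,
# two mixture components each, two-dimensional features.
toy_models = {
    "spk_A": (np.array([0.5, 0.5]),
              np.array([[0.0, 0.0], [1.0, 1.0]]),
              np.ones((2, 2))),
    "spk_B": (np.array([0.5, 0.5]),
              np.array([[5.0, 5.0], [6.0, 6.0]]),
              np.ones((2, 2))),
}

# Synthetic test frames drawn near spk_B's components.
rng = np.random.default_rng(0)
test_frames = rng.normal(loc=5.5, scale=1.0, size=(200, 2))
best_speaker, all_scores = identify(test_frames, toy_models)
```

In a real system the frames would be feature vectors extracted from the utterance, and longer test segments supply more frames to average over, which is one reason identification accuracy is usually reported per segment duration.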