Log Gabor Wavelet and Maximum a Posteriori Estimator in Speaker Identification
S. Senapati, S. Chakroborty, G. Saha
2006 Annual IEEE India Conference, 2006-09-01
DOI: 10.1109/INDCON.2006.302757 (https://doi.org/10.1109/INDCON.2006.302757)
Citation count: 0
Abstract
A speaker identification (SI) system needs an efficient feature extraction process and an appropriate speaker model built from the extracted features. This work introduces the fusion of the log Gabor wavelet (LGW) and the maximum a posteriori (MAP) estimator for a robust text-independent SI system. The focus of the paper is robustness to the degradations produced by transmission over a telephone channel. The complete experimental framework is evaluated on the 49-speaker conversational telephone King-92 SI speech database with two well-known speaker models: the Gaussian mixture model (GMM) and vector quantization (VQ). Comparisons are made with two established methods, as well as with the normal feature extraction procedure, to show the robustness of the new approach across different time segments. The GMM attains 98.8% identification accuracy using 30 seconds of wide-band speech and 87.3% identification accuracy using 30 seconds of narrow-band speech, and is shown to outperform the other methods.
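As an illustration of the closed-set identification step that a VQ speaker model implies, the following minimal sketch trains one codebook per speaker and assigns a test utterance to the speaker whose codebook gives the lowest average distortion. It is not the authors' implementation: the feature frames are synthetic Gaussian vectors standing in for the LGW-based features, the codebooks are trained with plain k-means rather than any MAP-based procedure, and every name in it (`train_codebook`, `identify`, `synthetic_frames`, `spk_a`, `spk_b`) is hypothetical.

```python
import math
import random


def nearest_distance(frame, codebook):
    """Euclidean distance from one feature frame to its closest codeword."""
    return min(math.dist(frame, codeword) for codeword in codebook)


def train_codebook(frames, n_codewords=4, n_iter=10, seed=0):
    """Plain k-means (Lloyd) codebook; an assumed stand-in for VQ training."""
    rng = random.Random(seed)
    # Initialise codewords from randomly chosen training frames.
    codebook = [list(f) for f in rng.sample(frames, n_codewords)]
    for _ in range(n_iter):
        cells = [[] for _ in codebook]
        for frame in frames:
            k = min(range(n_codewords), key=lambda i: math.dist(frame, codebook[i]))
            cells[k].append(frame)
        for k, cell in enumerate(cells):
            if cell:  # keep the old codeword if its cell is empty
                codebook[k] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def identify(frames, codebooks):
    """Return the speaker whose codebook gives the lowest average distortion."""
    def avg_distortion(codebook):
        return sum(nearest_distance(f, codebook) for f in frames) / len(frames)
    return min(codebooks, key=lambda spk: avg_distortion(codebooks[spk]))


def synthetic_frames(rng, mean, n_frames, n_dims=12):
    """Synthetic feature frames (assumption: real features would come from speech)."""
    return [[rng.gauss(mean, 1.0) for _ in range(n_dims)] for _ in range(n_frames)]


rng = random.Random(1)
codebooks = {
    "spk_a": train_codebook(synthetic_frames(rng, 0.0, 200)),
    "spk_b": train_codebook(synthetic_frames(rng, 3.0, 200)),
}
# A test utterance drawn from the same distribution as spk_b's training data.
predicted = identify(synthetic_frames(rng, 3.0, 50), codebooks)
print(predicted)
```

A GMM-based system would follow the same decision rule with the average distortion replaced by the negative log-likelihood of the test frames under each speaker's mixture model.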