Unseen handset mismatch compensation based on feature/model-space a priori knowledge interpolation for robust speaker recognition
Jyh-Her Yang, Y. Liao
2004 International Symposium on Chinese Spoken Language Processing, published 2004-12-15
DOI: 10.1109/CHINSL.2004.1409587 (https://doi.org/10.1109/CHINSL.2004.1409587)
Mismatch caused by unseen handsets is the major source of performance degradation for speaker recognition in telecommunication environments. This paper proposes a method for estimating the characteristics of an unseen handset based on a priori knowledge interpolation (AKI). AKI can be applied in both the feature space and the model space, where it interpolates the feature and model transformation functions measured using stochastic matching (SM) and maximum likelihood linear regression (MLLR), respectively. Cross-validation experiments on the HTIMIT database showed that the average speaker recognition rate improved from 59.6% to 73.8% for seen handsets and from 57.8% to 66.8% for unseen handsets. AKI is therefore a promising method for robust speaker recognition.
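The abstract does not give the interpolation formula, so the following is only a minimal sketch of the general idea in the feature space, not the authors' algorithm. It assumes that each seen handset's transformation reduces to a single additive bias vector, and it swaps in an ordinary least-squares fit for whatever estimation criterion the paper actually uses. The function names `interpolate_bias` and `compensate`, and all the numbers, are hypothetical.

```python
import numpy as np


def interpolate_bias(known_biases, observed_offset):
    """Approximate an unseen handset's bias as a weighted sum of known biases.

    known_biases: (K, D) array, one a priori bias vector per seen handset
                  (assumed to have been measured beforehand).
    observed_offset: (D,) array, average offset between the test features
                     and a clean reference (assumed to be available).
    Returns the interpolation weights (K,) and the interpolated bias (D,).
    """
    B = np.asarray(known_biases, dtype=float)
    y = np.asarray(observed_offset, dtype=float)
    # Least-squares weights minimizing ||B.T @ w - y||^2.
    # This is a stand-in criterion chosen for illustration only.
    w, *_ = np.linalg.lstsq(B.T, y, rcond=None)
    return w, B.T @ w


def compensate(features, bias):
    """Remove the estimated handset bias from every feature frame."""
    return np.asarray(features, dtype=float) - bias


# Hypothetical example: two seen handsets, three-dimensional features.
known = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
offset = np.array([0.3, 0.7, 0.0])  # made-up offset for an unseen handset
weights, bias = interpolate_bias(known, offset)
frames = np.array([[1.3, 2.7, 0.5],
                   [0.3, 0.7, 0.0]])
clean = compensate(frames, bias)
```

The point of the sketch is that the unseen handset is never modeled from scratch: its bias is constrained to the span of the biases already measured for seen handsets, so only a few weights have to be estimated from the test data. A model-space variant would interpolate MLLR transformation matrices in the same way, under the same caveats.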