A chain of Gaussian Mixture Model for text-independent speaker recognition
Yanxiang Chen, Ming Liu
2009 Oriental COCOSDA International Conference on Speech Database and Assessments, 2009-10-02
DOI: 10.1109/ICSDA.2009.5278367
Text-independent speaker recognition is more flexible than text-dependent recognition. However, because of differences in phonetic content between training and test speech, text-independent methods usually perform worse than text-dependent ones. To combine the flexibility of the text-independent approach with the high performance of the text-dependent approach, we propose a new modeling technique, a chain of Gaussian Mixture Models, which encodes the temporal correlation of the training utterance in a chain structure. A special decoding network is then used to evaluate the test utterance and find the best phonetically matched segments between the test utterance and the training utterance. Experimental results indicate that the proposed method significantly improves system performance, especially for short test utterances.
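The abstract does not specify how the chain is built or how the decoding network is structured, so the following is only a minimal, hypothetical sketch of the general idea under stated assumptions. It assumes the training utterance is split into equal-length consecutive segments, each modeled by a small diagonal-covariance GMM fitted with EM, so that node order preserves the utterance's temporal order. It also assumes a decoding network in which each test frame may stay on a node, advance to the next node, or jump to any node at a fixed log-penalty, with the best path found by a Viterbi search. The names `build_chain`, `score_chain`, `jump_penalty`, `VAR_FLOOR` and all parameter values are illustrative and are not taken from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple

Frame = Sequence[float]
# Diagonal-covariance GMM: one (weight, means, variances) tuple per component.
GMM = List[Tuple[float, List[float], List[float]]]

VAR_FLOOR = 1e-3  # hypothetical variance floor keeping EM numerically stable


def log_gauss_diag(x: Frame, mean: List[float], var: List[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(vals: List[float]) -> float:
    mx = max(vals)
    return mx + math.log(sum(math.exp(v - mx) for v in vals))


def gmm_loglik(x: Frame, gmm: GMM) -> float:
    """Frame log-likelihood under a diagonal GMM."""
    return logsumexp([math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm])


def fit_gmm(frames: List[Frame], k: int, iters: int = 10) -> GMM:
    """Small EM fit; means start at evenly spaced frames (an assumption)."""
    k = min(k, len(frames))
    dim = len(frames[0])
    gmm: GMM = [(1.0 / k, list(frames[i * len(frames) // k]), [1.0] * dim)
                for i in range(k)]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each frame.
        resp = []
        for x in frames:
            logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
            tot = logsumexp(logs)
            resp.append([math.exp(l - tot) for l in logs])
        # M-step: re-estimate weights, means and floored variances.
        new: GMM = []
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-10
            mean = [sum(r[c] * x[d] for r, x in zip(resp, frames)) / nc
                    for d in range(dim)]
            var = [max(VAR_FLOOR, sum(r[c] * (x[d] - mean[d]) ** 2
                                      for r, x in zip(resp, frames)) / nc)
                   for d in range(dim)]
            new.append((nc / len(frames), mean, var))
        gmm = new
    return gmm


def build_chain(frames: List[Frame], n_nodes: int, k: int = 2) -> List[GMM]:
    """Fit one GMM per consecutive segment of the training utterance."""
    assert len(frames) >= n_nodes, "need at least one frame per node"
    bounds = [i * len(frames) // n_nodes for i in range(n_nodes + 1)]
    return [fit_gmm(frames[bounds[i]:bounds[i + 1]], k) for i in range(n_nodes)]


def score_chain(test: List[Frame], chain: List[GMM],
                jump_penalty: float = 5.0) -> float:
    """Viterbi over a hypothetical network: stay on node j, advance j -> j+1,
    or jump to any node at a fixed log-penalty. Returns the best-path
    log-likelihood averaged per test frame."""
    prev = [gmm_loglik(test[0], g) for g in chain]  # any node may start
    for x in test[1:]:
        best_jump = max(prev) - jump_penalty
        cur = []
        for j, g in enumerate(chain):
            cont = prev[j] if j == 0 else max(prev[j], prev[j - 1])
            cur.append(max(cont, best_jump) + gmm_loglik(x, g))
        prev = cur
    return max(prev) / len(test)


if __name__ == "__main__":
    rng = random.Random(0)

    def utterance(means, n=30):
        # Synthetic 2-D "feature" frames: n noisy frames per region mean.
        return [[m + rng.gauss(0, 0.5) for m in mu] for mu in means for _ in range(n)]

    spk_a = [(0.0, 0.0), (3.0, 3.0), (-3.0, 3.0)]
    spk_b = [(1.5, -3.0), (4.5, 0.0), (-1.5, 0.0)]
    chain_a = build_chain(utterance(spk_a), n_nodes=3)
    chain_b = build_chain(utterance(spk_b), n_nodes=3)
    # Speaker A's regions spoken in a different order than in training.
    test = utterance([spk_a[2], spk_a[0], spk_a[1]], n=20)
    print("speaker A:", score_chain(test, chain_a))
    print("speaker B:", score_chain(test, chain_b))
```

In this sketch the jump transition is what keeps scoring text-independent: test content that appears in a different order from the training utterance can still be matched segment by segment. The stay and advance transitions reward runs of test frames that follow the training utterance's temporal order.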