Neethu Johnson, Kuruvachan K. George, C. S. Kumar, P. Raj
{"title":"提高说话人识别系统的性能","authors":"Neethu Johnson, Kuruvachan K. George, C. S. Kumar, P. Raj","doi":"10.1109/COMPSC.2014.7032617","DOIUrl":null,"url":null,"abstract":"This paper studies the contribution of different phones in speech data towards improving the performance of text/language independent speaker recognition systems. This work is motivated by the fact that the removal of silence segments from the speech data improves the system performance significantly as it does not contain any speaker-specific information. It is also clear from the literature that not all the phones in the speech data contains equal amount of speaker-specific information in it and the performance of the speaker recognition systems depends on this information. In addition to the silence segments, our work empirically finds 18 other diluent phones that has minimum speaker discrimination capability. We propose to use a preprocessing stage that identifies all non-informative set of phones recursively and removes them along with silence segments. Results show that using phones removed preprocessed data in state-of-the-art i-vector system outperforms the baseline i-vector system. We report absolute improvements of 1%, 1%, 2%, 2% and 1% in EER for test set collected through channels of Digital Voice Recorder, Headset, Mobile Phone 1, Mobile Phone 2 and Tablet PC respectively on IITG-MV database.","PeriodicalId":388270,"journal":{"name":"2014 First International Conference on Computational Systems and Communications (ICCSC)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Towards improving the performance of speaker recognition systems\",\"authors\":\"Neethu Johnson, Kuruvachan K. George, C. S. Kumar, P. Raj\",\"doi\":\"10.1109/COMPSC.2014.7032617\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper studies the contribution of different phones in speech data towards improving the performance of text/language independent speaker recognition systems. This work is motivated by the fact that the removal of silence segments from the speech data improves the system performance significantly as it does not contain any speaker-specific information. It is also clear from the literature that not all the phones in the speech data contains equal amount of speaker-specific information in it and the performance of the speaker recognition systems depends on this information. In addition to the silence segments, our work empirically finds 18 other diluent phones that has minimum speaker discrimination capability. We propose to use a preprocessing stage that identifies all non-informative set of phones recursively and removes them along with silence segments. Results show that using phones removed preprocessed data in state-of-the-art i-vector system outperforms the baseline i-vector system. We report absolute improvements of 1%, 1%, 2%, 2% and 1% in EER for test set collected through channels of Digital Voice Recorder, Headset, Mobile Phone 1, Mobile Phone 2 and Tablet PC respectively on IITG-MV database.\",\"PeriodicalId\":388270,\"journal\":{\"name\":\"2014 First International Conference on Computational Systems and Communications (ICCSC)\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 First International Conference on Computational Systems and Communications (ICCSC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/COMPSC.2014.7032617\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 First International Conference on Computational Systems and Communications (ICCSC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/COMPSC.2014.7032617","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Towards improving the performance of speaker recognition systems
This paper studies the contribution of different phones in speech data towards improving the performance of text/language independent speaker recognition systems. This work is motivated by the fact that the removal of silence segments from the speech data improves the system performance significantly as it does not contain any speaker-specific information. It is also clear from the literature that not all the phones in the speech data contains equal amount of speaker-specific information in it and the performance of the speaker recognition systems depends on this information. In addition to the silence segments, our work empirically finds 18 other diluent phones that has minimum speaker discrimination capability. We propose to use a preprocessing stage that identifies all non-informative set of phones recursively and removes them along with silence segments. Results show that using phones removed preprocessed data in state-of-the-art i-vector system outperforms the baseline i-vector system. We report absolute improvements of 1%, 1%, 2%, 2% and 1% in EER for test set collected through channels of Digital Voice Recorder, Headset, Mobile Phone 1, Mobile Phone 2 and Tablet PC respectively on IITG-MV database.