基于语音谱图的模型自适应说话人识别

Proceedings of the IEEE SoutheastCon 2000. 'Preparing for The New Millennium' (Cat. No.00CH37105) Pub Date : 2000-04-07 DOI:10.1109/SECON.2000.845443

S. Gurbuz, J. Gowdy, Z. Tufekci

{"title":"基于语音谱图的模型自适应说话人识别","authors":"S. Gurbuz, J. Gowdy, Z. Tufekci","doi":"10.1109/SECON.2000.845443","DOIUrl":null,"url":null,"abstract":"Speech signal feature extraction is a challenging research area with great significance to the speaker identification and speech recognition communities. We propose a novel speech spectrogram based spectral modal adaptation algorithm. This system is based on dynamic thresholding of speech spectrograms for text-dependent speaker identification. For a given utterance from a target speaker we aim to find the target speaker among a number of speakers who exist in the system. Conceptually, this algorithm attempts to increase the spectral similarity for the target speaker while increasing the spectral dissimilarity for the non-target speaker who is a member of the enrolment set. Therefore, it removes aging and intersession-dependent spectral variation in the utterance while preserving the speaker inherent spectral features. The hidden Markov model (HMM) parameters representing each listed speaker in the system are adapted for each identification event. The results obtained using speech signals from both the Noisex database and from recordings in the laboratory environment seem promising and demonstrate the robustness of the algorithm for aging and session-dependent utterances. Additionally, we have evaluated the adapted and the non-adapted models with data recorded two months after the initial enrollment. The adaptation seems to improve the performance of the system for the aged data from 84% to 91%.","PeriodicalId":206022,"journal":{"name":"Proceedings of the IEEE SoutheastCon 2000. 'Preparing for The New Millennium' (Cat. No.00CH37105)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2000-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Speech spectrogram based model adaptation for speaker identification\",\"authors\":\"S. Gurbuz, J. Gowdy, Z. Tufekci\",\"doi\":\"10.1109/SECON.2000.845443\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Speech signal feature extraction is a challenging research area with great significance to the speaker identification and speech recognition communities. We propose a novel speech spectrogram based spectral modal adaptation algorithm. This system is based on dynamic thresholding of speech spectrograms for text-dependent speaker identification. For a given utterance from a target speaker we aim to find the target speaker among a number of speakers who exist in the system. Conceptually, this algorithm attempts to increase the spectral similarity for the target speaker while increasing the spectral dissimilarity for the non-target speaker who is a member of the enrolment set. Therefore, it removes aging and intersession-dependent spectral variation in the utterance while preserving the speaker inherent spectral features. The hidden Markov model (HMM) parameters representing each listed speaker in the system are adapted for each identification event. The results obtained using speech signals from both the Noisex database and from recordings in the laboratory environment seem promising and demonstrate the robustness of the algorithm for aging and session-dependent utterances. Additionally, we have evaluated the adapted and the non-adapted models with data recorded two months after the initial enrollment. The adaptation seems to improve the performance of the system for the aged data from 84% to 91%.\",\"PeriodicalId\":206022,\"journal\":{\"name\":\"Proceedings of the IEEE SoutheastCon 2000. 'Preparing for The New Millennium' (Cat. No.00CH37105)\",\"volume\":\"14 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2000-04-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the IEEE SoutheastCon 2000. 'Preparing for The New Millennium' (Cat. No.00CH37105)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SECON.2000.845443\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the IEEE SoutheastCon 2000. 'Preparing for The New Millennium' (Cat. No.00CH37105)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SECON.2000.845443","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 9

摘要

语音信号特征提取是一个具有挑战性的研究领域，对说话人识别和语音识别领域具有重要意义。提出了一种新的基于语音谱图的频谱模态自适应算法。该系统基于语音谱的动态阈值法进行文本相关说话人识别。对于来自目标说话人的给定话语，我们的目标是从系统中存在的许多说话人中找到目标说话人。从概念上讲，该算法试图增加目标说话人的频谱相似度，同时增加非目标说话人(作为注册集的成员)的频谱不相似度。因此，它在保留说话人固有的频谱特征的同时，消除了话语中老化和会话间相关的频谱变化。隐马尔可夫模型(HMM)参数表示系统中列出的每个说话人，并针对每个识别事件进行调整。使用Noisex数据库和实验室环境中录音的语音信号获得的结果似乎很有希望，并证明了该算法对老化和会话相关话语的鲁棒性。此外，我们用初始登记后两个月记录的数据评估了适应模型和非适应模型。该适应性似乎将系统对老化数据的性能从84%提高到91%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Speech spectrogram based model adaptation for speaker identification

Speech signal feature extraction is a challenging research area with great significance to the speaker identification and speech recognition communities. We propose a novel speech spectrogram based spectral modal adaptation algorithm. This system is based on dynamic thresholding of speech spectrograms for text-dependent speaker identification. For a given utterance from a target speaker we aim to find the target speaker among a number of speakers who exist in the system. Conceptually, this algorithm attempts to increase the spectral similarity for the target speaker while increasing the spectral dissimilarity for the non-target speaker who is a member of the enrolment set. Therefore, it removes aging and intersession-dependent spectral variation in the utterance while preserving the speaker inherent spectral features. The hidden Markov model (HMM) parameters representing each listed speaker in the system are adapted for each identification event. The results obtained using speech signals from both the Noisex database and from recordings in the laboratory environment seem promising and demonstrate the robustness of the algorithm for aging and session-dependent utterances. Additionally, we have evaluated the adapted and the non-adapted models with data recorded two months after the initial enrollment. The adaptation seems to improve the performance of the system for the aged data from 84% to 91%.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the IEEE SoutheastCon 2000. 'Preparing for The New Millennium' (Cat. No.00CH37105)

自引率

0.00%

发文量