Dynamic Database Creation for Speaker Recognition System

Advances in Mobile Multimedia, Pub Date: 2013-12-02, DOI: 10.1145/2536853.2536923

Bhushan D. Patil, Yogesh Manav, Pavan Sudheendra

Citations: 2

Abstract

Classical speaker identification algorithms give acceptable results when training is done offline on a good-quality database [5]. Although speaker recognition has been researched extensively, most work has focused on the offline training scenario. In some scenarios that require real-time speaker recognition, such as viewer-preference-based presentation or playback of media content, offline training is not possible because there is no prior information about the subjects or speakers present in the content. A run-time training approach is required to generate a dynamic feature database, which can then support features such as viewer-preference-based seek or zoom to a specific subject or speaker during media playback. In this paper we propose a speaker recognition system that uses a dynamically created database. We treat speaker recognition as a classification problem in which speakers are classified based on speech features. The proposed system uses MFCCs (Mel-frequency cepstral coefficients) as features and polynomial or GMM (Gaussian mixture model) classifiers. In our analysis, we demonstrate the pros and cons of the algorithms that employ dynamic database creation. Test results show that the proposed system achieves ~96% accuracy on content with 5 speakers.
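The idea of building the speaker database at run time can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions. Feature frames (for example MFCC vectors) are assumed to be extracted already. Each speaker is modeled by a single diagonal Gaussian updated online, which is a deliberate simplification of the polynomial/GMM classifiers named in the abstract. The class names, the `new_speaker_threshold` parameter, and the synthetic 2-D data are all hypothetical.

```python
import math
import random


class SpeakerModel:
    """One diagonal Gaussian per speaker, updated online (Welford's method).

    Assumption: a single Gaussian stands in for a full GMM to keep the sketch short.
    """

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations per dimension

    def update(self, frame):
        self.n += 1
        for i, x in enumerate(frame):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def variance(self, i, floor=1e-2):
        # Unit variance until two frames are seen; floored for numerical stability.
        if self.n < 2:
            return 1.0
        return max(self.m2[i] / (self.n - 1), floor)

    def avg_log_likelihood(self, frames):
        total = 0.0
        for frame in frames:
            for i, x in enumerate(frame):
                var = self.variance(i)
                total += -0.5 * (math.log(2 * math.pi * var)
                                 + (x - self.mean[i]) ** 2 / var)
        return total / len(frames)


class DynamicSpeakerDatabase:
    """Hypothetical run-time database: no speakers are known in advance."""

    def __init__(self, dim, new_speaker_threshold):
        self.dim = dim
        self.threshold = new_speaker_threshold  # assumed, tuned per feature type
        self.models = {}

    def process_segment(self, frames):
        """Label a speech segment, enrolling a new speaker if nothing fits."""
        best_id, best_score = None, -math.inf
        for speaker_id, model in self.models.items():
            score = model.avg_log_likelihood(frames)
            if score > best_score:
                best_id, best_score = speaker_id, score
        if best_id is None or best_score < self.threshold:
            best_id = f"speaker_{len(self.models) + 1}"
            self.models[best_id] = SpeakerModel(self.dim)
        for frame in frames:  # run-time training: refine the chosen model
            self.models[best_id].update(frame)
        return best_id


def synthetic_segment(rng, center, n_frames=50):
    """Stand-in for real MFCC frames: unit-variance noise around a center."""
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n_frames)]


rng = random.Random(0)
db = DynamicSpeakerDatabase(dim=2, new_speaker_threshold=-10.0)
labels = [db.process_segment(synthetic_segment(rng, center))
          for center in [(0.0, 0.0), (10.0, 10.0), (0.0, 0.0)]]
print(labels)
```

Each incoming segment is scored against every model enrolled so far. A segment that no model explains well creates a new entry, and the segment's frames then refine whichever model was selected. A practical system would need real feature extraction and a threshold tuned on actual speech.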

Source journal

Advances in Mobile Multimedia

Self-citation rate

0.00%

Number of publications