A Speaker Count System for Telephone Conversations

2006 International Symposium on Intelligent Signal Processing and Communications
Publication date: 2006-12-01
DOI: 10.1109/ISPACS.2006.364899

Psj319l Maz, Uchechukwu Ofoegbu, Ananth N. Iyer, Robert E. Yantorno, B. Y. Smolenski

Citations: 11

Abstract

In telephone conversations, only short consecutive utterances can be examined for each speaker, so discriminating between speakers is a challenging task, and it becomes harder still when nothing is known about the speakers a priori. This paper presents a technique for determining the number of speakers participating in a telephone conversation. The approach assumes no prior knowledge of any of the participating speakers. It is based on comparing short utterances within the conversation and deciding whether or not they belong to the same speaker. Applications of this research include three-way call detection and speaker tracking, and the method could be extended to speaker change-point detection and indexing. The proposed method involves an elimination process in which speech segments matching a chosen set of reference models are sequentially removed from the conversation. Models are formed from the mean vectors and covariance matrices of the linear predictive cepstral coefficients (LPCC) of voiced segments in the conversation. The paper investigates using the Mahalanobis distance, based on likelihood ratio testing, to decide whether two models belong to the same speaker or to different speakers. After each elimination step, the relative amount of residual speech is examined to determine whether an additional speaker is present. Experiments were performed on 4000 artificial conversations built from the HTIMIT database. The proposed system achieved an average speaker count accuracy of 78%.
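The abstract states that speaker models are built from the linear predictive cepstral coefficients (LPCC) of voiced segments, but it does not give the analysis settings. As a minimal sketch, assuming a Hamming-windowed frame, autocorrelation-method LPC solved with the Levinson-Durbin recursion, and the standard LPC-to-cepstrum recursion, one frame's LPCC vector could be computed as below. The defaults `order=12` and `n_ceps=12` are illustrative assumptions, not values from the paper, and voiced/silence detection is left out.

```python
import math
from typing import List, Sequence, Tuple


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Predictor coefficients a_1..a_p (s[n] ~ sum_k a_k s[n-k]) and residual energy.

    Assumes r[0] > 0, i.e. the frame is not silent.
    """
    a = [0.0] * (order + 1)  # a[0] is unused; a[k] holds a_k
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for stage i
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a: Sequence[float], n_ceps: int) -> List[float]:
    """Cepstrum c_1..c_N of the all-pole model 1 / (1 - sum_k a_k z^-k).

    c_n = a_n + sum_{k=max(1, n-p)}^{n-1} (k / n) c_k a_{n-k}, with a_n = 0 for n > p.
    The gain term c_0 is omitted.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def lpcc(frame: Sequence[float], order: int = 12, n_ceps: int = 12) -> List[float]:
    """LPCC vector for one frame (orders are illustrative, not the paper's)."""
    n = len(frame)
    hamming = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    r = autocorrelation([x * w for x, w in zip(frame, hamming)], order)
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, n_ceps)
```

As a quick check, feeding `levinson_durbin` the autocorrelation of a single-pole process, r[k] = 0.9^k, yields a ≈ [0.9, 0, 0], and `lpc_to_cepstrum(a, 3)` gives ≈ [0.9, 0.405, 0.243], which matches the closed form c_n = 0.9^n / n for a one-pole model.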

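The abstract describes the modelling, comparison and elimination steps only in outline. The sketch below is one possible reading under assumptions that are not taken from the paper: each segment is modelled by the mean vector and sample covariance of its LPCC frames (with a small hypothetical diagonal `ridge`); two models are compared with the squared Mahalanobis distance between their means under the pooled (averaged) covariance; the longest remaining segment serves as the reference; and `threshold` and `min_residual` are hypothetical placeholders rather than the likelihood-ratio-derived decision values the authors use.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Model = Tuple[Vector, List[Vector]]  # (mean vector, covariance matrix)


def fit_model(frames: Sequence[Sequence[float]], ridge: float = 1e-3) -> Model:
    """Mean vector and sample covariance of a segment's LPCC frames.

    A small ridge is added to the diagonal (an assumption) so the covariance of
    a short segment stays invertible.
    """
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    cov = [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in frames) / max(n - 1, 1)
            + (ridge if i == j else 0.0) for j in range(d)] for i in range(d)]
    return mean, cov


def solve(m: List[Vector], b: Vector) -> Vector:
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, d):
            f = aug[r][col] / aug[col][col]
            for c in range(col, d + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * d
    for r in range(d - 1, -1, -1):
        x[r] = (aug[r][d] - sum(aug[r][c] * x[c] for c in range(r + 1, d))) / aug[r][r]
    return x


def mahalanobis_sq(m1: Model, m2: Model) -> float:
    """Squared Mahalanobis distance between two means under the pooled covariance."""
    (mu1, s1), (mu2, s2) = m1, m2
    pooled = [[(x + y) / 2 for x, y in zip(r1, r2)] for r1, r2 in zip(s1, s2)]
    delta = [x - y for x, y in zip(mu1, mu2)]
    return sum(dv * xv for dv, xv in zip(delta, solve(pooled, delta)))


def count_speakers(segments: Sequence[Sequence[Sequence[float]]],
                   threshold: float = 4.0, min_residual: float = 0.1) -> int:
    """Count speakers by sequentially eliminating segments that match a reference.

    threshold: hypothetical distance at or below which two segments are
    treated as the same speaker.
    min_residual: hypothetical fraction of remaining speech (in frames) below
    which no additional speaker is assumed.
    """
    models = [fit_model(s) for s in segments]
    sizes = [len(s) for s in segments]
    total = sum(sizes)
    remaining = list(range(len(segments)))
    count = 0
    while remaining:
        count += 1  # the current reference starts a new speaker
        ref = max(remaining, key=lambda i: sizes[i])  # assumption: longest segment
        remaining = [i for i in remaining
                     if i != ref and mahalanobis_sq(models[ref], models[i]) > threshold]
        if sum(sizes[i] for i in remaining) / total < min_residual:
            break  # too little residual speech left for another speaker
    return count
```

Because counting stops as soon as the residual fraction drops below `min_residual`, a few short leftover segments (for example noise or misclassified frames) do not by themselves add a speaker. How the paper actually chooses its reference models is not stated in the abstract; the longest-segment rule here is only a placeholder.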