A Speaker Count System for Telephone Conversations

Psj319l Maz, Uchechukwu Ofoegbu, Ananth N Iyer, Robert E Yantorno, B. Y. Smolenski
Published in: 2006 International Symposium on Intelligent Signal Processing and Communications (publication date 2006-12-01)
DOI: 10.1109/ISPACS.2006.364899 (https://doi.org/10.1109/ISPACS.2006.364899)
Citations: 11

Abstract

In telephone conversations, only short consecutive utterances can be examined for each speaker. Discriminating between speakers in such conversations is therefore a challenging task, and it becomes even more challenging when no information about the speakers is known a priori. This paper presents a technique for determining the number of speakers participating in a telephone conversation. The approach assumes no prior knowledge of any of the participating speakers. It is based on comparing short utterances within the conversation and deciding whether or not they belong to the same speaker. Applications of this research include three-way call detection and speaker tracking, and the approach could be extended to speaker change-point detection and indexing. The proposed method involves an elimination process in which speech segments matching a chosen set of reference models are sequentially removed from the conversation. Models are formed using the mean vectors and covariance matrices of linear predictive cepstral coefficients of voiced segments in the conversation. The use of the Mahalanobis distance to determine whether two models belong to the same speaker or to different speakers, based on likelihood ratio testing, is investigated. The relative amount of residual speech is observed after each elimination step to determine whether an additional speaker is present. Experiments were performed on 4000 artificial conversations from the HTIMIT database. The proposed system yielded an average speaker count accuracy of 78%.
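The elimination procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation; it only mirrors the steps the abstract names, under several assumptions. Each segment is assumed to be an array of already-extracted feature frames (for example linear predictive cepstral coefficients; extraction is not shown). The two models' covariance matrices are averaged before computing the Mahalanobis distance, the first remaining segment is taken as the reference model, residual speech is measured as the fraction of segments left, and the function names and both thresholds (`dist_threshold`, `residual_threshold`) are hypothetical. The likelihood ratio test mentioned in the abstract is replaced here by a plain distance threshold.

```python
import numpy as np


def segment_model(features):
    """Model one segment by the mean vector and covariance matrix of its frames.

    `features` has shape (n_frames, n_coefficients).
    """
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    return mean, cov


def mahalanobis_between(model_a, model_b):
    """Mahalanobis distance between two segment means.

    Assumption: the two covariance matrices are averaged to form the metric.
    """
    mean_a, cov_a = model_a
    mean_b, cov_b = model_b
    pooled = 0.5 * (cov_a + cov_b)
    diff = mean_a - mean_b
    return float(np.sqrt(diff @ np.linalg.solve(pooled, diff)))


def count_speakers(segments, dist_threshold, residual_threshold, max_speakers=4):
    """Estimate the speaker count by sequential elimination.

    At each step the first remaining segment serves as the reference model
    (an assumption), and every segment within `dist_threshold` of it is
    removed. Another speaker is counted only while the fraction of
    remaining segments exceeds `residual_threshold`.
    """
    models = [segment_model(s) for s in segments]
    remaining = list(range(len(models)))
    total = len(models)
    count = 0
    while remaining and count < max_speakers:
        # Stop when too little residual speech is left to justify a new speaker.
        if len(remaining) / total <= residual_threshold:
            break
        count += 1
        reference = models[remaining[0]]
        # Keep only segments that do not match the reference model.
        remaining = [
            i for i in remaining
            if mahalanobis_between(reference, models[i]) > dist_threshold
        ]
    return count
```

In practice both thresholds would have to be tuned on held-out conversations, and the residual measure could be based on speech duration rather than segment count.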