Multigrained modeling with pattern specific maximum likelihood transformations for text-independent speaker recognition

IEEE Trans. Speech Audio Process. Pub Date : 2003-02-19 DOI:10.1109/TSA.2003.809121

U. Chaudhari, Jirí Navrátil, Stephane H Maes

Journal: IEEE Trans. Speech Audio Process. Volume: 10 1. Pages: 61-69. Publication type: Journal Article.

Citations: 33

Abstract

We present a transformation-based, multigrained data modeling technique in the context of text-independent speaker recognition, aimed at mitigating difficulties caused by sparse training and test data. Both identification and verification are addressed, where we view the entire population as divided into the target population and its complement, which we refer to as the background population. First, we present our development of maximum likelihood transformation-based recognition with diagonally constrained Gaussian mixture models and show its robustness to data scarcity with results on identification. Then, for each target and background speaker, a multigrained model is constructed using the transformation-based extension as a building block. The training data is labeled with an HMM-based phone labeler. We then make use of a graduated phone class structure to train the speaker model at various levels of detail. This structure is a tree whose root node contains all the phones. Subsequent levels partition the phones into increasingly finer-grained linguistic classes. This method affords the use of fine detail where possible, i.e., where the amount of training data distributed to a tree node allows it. We demonstrate the effectiveness of the modeling with verification experiments in matched and mismatched conditions.
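The graduated phone class structure described in the abstract can be pictured with a small sketch. The abstract does not give the actual phone classes, the data threshold, or the rule for choosing a level of detail, so everything named here (`PhoneClassNode`, `build_tree`, `distribute`, `finest_trainable`, the toy phone set, and the `min_frames` threshold) is a hypothetical illustration rather than the authors' method. It only shows the general idea: frame counts from a phone labeler are distributed over a tree, and for each phone the deepest node with enough data is selected.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PhoneClassNode:
    """One node of a hypothetical phone class tree."""
    name: str
    phones: frozenset
    children: List["PhoneClassNode"] = field(default_factory=list)
    frame_count: int = 0  # labeled training frames that fall in this class


def build_tree() -> PhoneClassNode:
    """Build a toy three-level tree; the classes are illustrative assumptions."""
    vowels = PhoneClassNode("vowels", frozenset({"aa", "iy", "uw"}))
    nasals = PhoneClassNode("nasals", frozenset({"m", "n"}))
    fricatives = PhoneClassNode("fricatives", frozenset({"s", "f"}))
    consonants = PhoneClassNode(
        "consonants", nasals.phones | fricatives.phones, [nasals, fricatives]
    )
    # The root contains all phones, as in the abstract.
    return PhoneClassNode("all", vowels.phones | consonants.phones, [vowels, consonants])


def distribute(node: PhoneClassNode, phone_counts: Dict[str, int]) -> None:
    """Assign to every node the number of frames labeled with its phones."""
    node.frame_count = sum(phone_counts.get(p, 0) for p in node.phones)
    for child in node.children:
        distribute(child, phone_counts)


def finest_trainable(
    node: PhoneClassNode, phone: str, min_frames: int
) -> Optional[PhoneClassNode]:
    """Return the deepest node covering `phone` with at least `min_frames` frames."""
    if phone not in node.phones or node.frame_count < min_frames:
        return None
    for child in node.children:
        found = finest_trainable(child, phone, min_frames)
        if found is not None:
            return found
    return node


if __name__ == "__main__":
    root = build_tree()
    # Made-up per-phone frame counts for one speaker.
    counts = {"aa": 400, "iy": 300, "uw": 50, "m": 20, "n": 30, "s": 200, "f": 10}
    distribute(root, counts)
    for phone in ("aa", "m", "s"):
        node = finest_trainable(root, phone, min_frames=100)
        print(phone, "->", node.name if node else None)
```

In this toy setup a sparsely observed phone such as `m` falls back to a coarser class, while a well-observed phone such as `s` keeps its finer class. A model in the spirit of the abstract would then train one transformation-based Gaussian mixture component set per selected node, which this sketch does not attempt.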
