基于先验知识的说话人划分与连接识别共同说话人

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) Pub Date : 2019-12-01 DOI:10.1109/ASRU46091.2019.9003731

Tsun-Yat Leung, Lahiru Samarakoon, Albert Y. S. Lam

{"title":"基于先验知识的说话人划分与连接识别共同说话人","authors":"Tsun-Yat Leung, Lahiru Samarakoon, Albert Y. S. Lam","doi":"10.1109/ASRU46091.2019.9003731","DOIUrl":null,"url":null,"abstract":"Speaker Diarization and Linking discovers “who spoke when” across recordings without any speaker enrollment. Diarization is performed on each recording separately, and the linking combines clusters of the same speaker across recordings. It is a two-step approach, however it suffers from propagating the error from diarization step to the linking step. In a situation where a unique speaker appears in a given set of recordings, this paper aims at locating the common speaker using the prior knowledge of his or her existence. That means there is no enrollment data for this common speaker. We propose Pairwise Common Speaker Identification (PCSI) method that takes the existence of a common speaker into account in contrast to the two-step approach. We further show that PCSI can be used to reduce the errors that are introduced in the diarization step of the two-step approach. Our experiments are performed on a corpus synthesised from the AMI corpus and also on a in-house conversational telephony Sichuanese corpus that is mixed with Mandarin. We show up to 7.68% relative improvements of time-weighted equal error rate over a state-of-art x-vector diarization and linking system.","PeriodicalId":150913,"journal":{"name":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"108-109 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Incorporating Prior Knowledge into Speaker Diarization and Linking for Identifying Common Speaker\",\"authors\":\"Tsun-Yat Leung, Lahiru Samarakoon, Albert Y. S. Lam\",\"doi\":\"10.1109/ASRU46091.2019.9003731\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Speaker Diarization and Linking discovers “who spoke when” across recordings without any speaker enrollment. Diarization is performed on each recording separately, and the linking combines clusters of the same speaker across recordings. It is a two-step approach, however it suffers from propagating the error from diarization step to the linking step. In a situation where a unique speaker appears in a given set of recordings, this paper aims at locating the common speaker using the prior knowledge of his or her existence. That means there is no enrollment data for this common speaker. We propose Pairwise Common Speaker Identification (PCSI) method that takes the existence of a common speaker into account in contrast to the two-step approach. We further show that PCSI can be used to reduce the errors that are introduced in the diarization step of the two-step approach. Our experiments are performed on a corpus synthesised from the AMI corpus and also on a in-house conversational telephony Sichuanese corpus that is mixed with Mandarin. We show up to 7.68% relative improvements of time-weighted equal error rate over a state-of-art x-vector diarization and linking system.\",\"PeriodicalId\":150913,\"journal\":{\"name\":\"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"volume\":\"108-109 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU46091.2019.9003731\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU46091.2019.9003731","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

扬声器拨号和链接发现“谁说什么时候”在录音中没有任何扬声器登记。在每个录音上分别执行拨号，并且连接将同一扬声器的录音组合在一起。这是一种两步方法，但它的缺点是将误差从初始化步骤传播到链接步骤。在一组给定的录音中出现一个独特的说话人的情况下，本文旨在利用他或她存在的先验知识来定位共同的说话人。这意味着没有这个普通讲话者的注册数据。我们提出了双共同说话人识别(PCSI)方法，该方法考虑了共同说话人的存在，而不是两步方法。我们进一步表明，PCSI可以用来减少在两步方法的双化步骤中引入的误差。我们的实验是在一个由AMI语料库合成的语料库上进行的，也在一个内部电话会话四川语料库上进行的，该语料库与普通话混合。结果表明，与最先进的x向量化和链接系统相比，时间加权等错误率的相对提高高达7.68%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Incorporating Prior Knowledge into Speaker Diarization and Linking for Identifying Common Speaker

Speaker Diarization and Linking discovers “who spoke when” across recordings without any speaker enrollment. Diarization is performed on each recording separately, and the linking combines clusters of the same speaker across recordings. It is a two-step approach, however it suffers from propagating the error from diarization step to the linking step. In a situation where a unique speaker appears in a given set of recordings, this paper aims at locating the common speaker using the prior knowledge of his or her existence. That means there is no enrollment data for this common speaker. We propose Pairwise Common Speaker Identification (PCSI) method that takes the existence of a common speaker into account in contrast to the two-step approach. We further show that PCSI can be used to reduce the errors that are introduced in the diarization step of the two-step approach. Our experiments are performed on a corpus synthesised from the AMI corpus and also on a in-house conversational telephony Sichuanese corpus that is mixed with Mandarin. We show up to 7.68% relative improvements of time-weighted equal error rate over a state-of-art x-vector diarization and linking system.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

自引率

0.00%

发文量