ODESSA at Albayzin Speaker Diarization Challenge 2018

Jose Patino, H. Delgado, Ruiqing Yin, H. Bredin, C. Barras, N. Evans
IberSPEECH Conference, published 2018-11-21
DOI: 10.21437/IBERSPEECH.2018-43 (https://doi.org/10.21437/IBERSPEECH.2018-43)
Citations: 7

Abstract

This paper describes the ODESSA submissions to the Albayzin Speaker Diarization Challenge 2018, which addresses the diarization of TV shows. This work explores three different techniques to represent speech segments, namely binary key, x-vector and triplet-loss-based embeddings. While training-free methods such as the binary key technique can be applied easily in scenarios where training data is limited, training robust neural-embedding extractors under such conditions is considerably more challenging. However, when training data is plentiful (open-set condition), neural embeddings provide more robust segmentations and speaker representations, which lead to better diarization performance. The paper also reports our efforts to improve speaker diarization performance through system combination. For systems with a common temporal resolution, fusion is performed at the segment level during clustering. When the systems under fusion produce segmentations with arbitrary resolutions, they are combined at the solution level. Both approaches to fusion are shown to improve diarization performance.
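Segment-level fusion during clustering can be illustrated with a minimal sketch. It assumes that two systems (for example, a binary key system and an x-vector system) produce one embedding per segment over the same set of segments, that fusion means a weighted average of their pairwise cosine similarity matrices, and that clustering is average-linkage agglomerative hierarchical clustering (AHC) with a stopping threshold. The function names, weights and threshold below are hypothetical; the abstract does not specify the exact fusion rule or clustering algorithm used in the ODESSA systems.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cosine_similarity_matrix(embeddings: Sequence[Sequence[float]]) -> Matrix:
    """Pairwise cosine similarity between per-segment embeddings."""
    # Guard against zero vectors by falling back to a norm of 1.0.
    norms = [math.sqrt(sum(x * x for x in e)) or 1.0 for e in embeddings]
    n = len(embeddings)
    return [
        [sum(a * b for a, b in zip(embeddings[i], embeddings[j])) / (norms[i] * norms[j])
         for j in range(n)]
        for i in range(n)
    ]


def fuse_similarities(mats: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted average of similarity matrices (assumed fusion rule).

    Every matrix must index the SAME segments, i.e. the systems share a
    common temporal resolution; otherwise segment-level fusion is undefined.
    """
    n = len(mats[0])
    total = sum(weights)
    return [
        [sum(w * m[i][j] for w, m in zip(weights, mats)) / total for j in range(n)]
        for i in range(n)
    ]


def ahc(sim: Matrix, threshold: float) -> List[int]:
    """Average-linkage AHC: merge the most similar pair of clusters until
    the best average similarity falls below the threshold."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best, pair = -math.inf, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                s /= len(clusters[a]) * len(clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a].extend(clusters[b])
        del clusters[b]
    # Convert cluster membership into one speaker label per segment.
    labels = [0] * len(sim)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


if __name__ == "__main__":
    # Toy 2-D embeddings for four segments from two hypothetical systems.
    emb_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    emb_b = [[1.0, 0.2], [1.0, 0.0], [0.2, 1.0], [0.0, 1.0]]
    fused = fuse_similarities(
        [cosine_similarity_matrix(emb_a), cosine_similarity_matrix(emb_b)], [0.5, 0.5]
    )
    print(ahc(fused, threshold=0.5))  # → [0, 0, 1, 1]
```

Fusing similarities rather than concatenating embeddings lets systems with very different embedding spaces be combined, but it only works when both systems score exactly the same segments. When segmentations differ in resolution, the abstract notes that combination happens at the solution level instead.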