DNN-based Embeddings for Speaker Diarization in the AuDIaS-UAM System for the Albayzin 2018 IberSPEECH-RTVE Evaluation

IberSPEECH Conference, Pub Date: 2018-11-21, DOI: 10.21437/IBERSPEECH.2018-46

Alicia Lozano-Diez, Beltran Labrador, Diego de Benito-Gorrón, Pablo Ramirez, D. Toledano


Citations: 3

Abstract



This document describes the three systems submitted by the AuDIaS-UAM team to the Albayzin 2018 IberSPEECH-RTVE speaker diarization evaluation. Two of the systems (the primary and contrastive 1 submissions) are based on embeddings: fixed-length representations of an audio segment, obtained from a deep neural network (DNN) trained for speaker classification. The third system (contrastive 2) represents the audio segments with classical i-vectors. In all three systems, the resulting embeddings or i-vectors are grouped using Agglomerative Hierarchical Clustering (AHC) to obtain the diarization labels. On the Albayzin development dataset, the new DNN-embedding approach achieves strong performance, similar to that of the well-known i-vector approach.
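
The clustering step named in the abstract can be illustrated with a minimal sketch. It is not the AuDIaS-UAM implementation: the abstract does not state the distance metric, linkage criterion, or stopping threshold, so cosine distance, average linkage, and a threshold of 0.5 are assumptions chosen for illustration. The function name `ahc_diarization_labels` is hypothetical, and the input vectors are synthetic stand-ins for real DNN embeddings or i-vectors.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def ahc_diarization_labels(embeddings, threshold=0.5):
    """Cluster per-segment vectors with AHC and return one label per segment.

    Cosine distance, average linkage, and the stopping threshold are
    assumptions made for illustration; the abstract does not specify them.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    # Build the merge tree bottom-up from pairwise cosine distances.
    tree = linkage(embeddings, method="average", metric="cosine")
    # Cut the tree: clusters whose merge distance exceeds the threshold stay apart.
    return fcluster(tree, t=threshold, criterion="distance")


# Synthetic stand-ins for segment embeddings: two "speakers", three segments each.
rng = np.random.default_rng(0)
speaker_a = np.array([1.0, 0.0, 0.0]) + 0.05 * rng.standard_normal((3, 3))
speaker_b = np.array([0.0, 1.0, 0.0]) + 0.05 * rng.standard_normal((3, 3))
segments = np.vstack([speaker_a, speaker_b])

labels = ahc_diarization_labels(segments)
print(labels)  # one cluster id per segment
```

Each cluster id plays the role of a speaker label for its segment. In a real system the threshold would be tuned on development data, since it determines how many speakers the clustering reports.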
