The GTM-UVIGO System for Audiovisual Diarization

IberSPEECH Conference · Pub Date: 2018-11-21 · DOI: 10.21437/IBERSPEECH.2018-41

Eduardo Ramos-Muguerza, Laura Docío Fernández, J. Alba-Castro

Citations: 5

Abstract

This paper describes in detail the audiovisual system deployed by the Multimedia Technologies Group (GTM) of the atlanTTic research center at the University of Vigo for the Albayzin Multimodal Diarization Challenge (MDC), organized at the IberSPEECH 2018 conference. The system is characterized by its use of state-of-the-art face and speaker verification embeddings trained with publicly available deep neural networks. The video and audio tracks are processed separately, each yielding a matrix of confidence values for every time segment; these matrices are then fused to make joint decisions on the speaker diarization result.
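The fusion step in the abstract can be illustrated with a minimal sketch. The abstract does not state which fusion rule the system uses, so the weighted sum, the `audio_weight` and `threshold` parameters, the function name `fuse_confidences`, and the toy scores below are all assumptions made for illustration, not the authors' method. Each matrix is taken to have one row per time segment and one column per enrolled identity.

```python
def fuse_confidences(audio, video, audio_weight=0.5, threshold=0.5):
    """Fuse two (segments x identities) confidence matrices.

    Hypothetical rule: a per-cell weighted sum of the audio and video
    scores. For each segment, return the index of the best-scoring
    identity, or None when the best fused score is below the threshold.
    """
    if len(audio) != len(video):
        raise ValueError("audio and video must cover the same segments")
    labels = []
    for a_row, v_row in zip(audio, video):
        if len(a_row) != len(v_row):
            raise ValueError("each segment needs one score per identity")
        # Weighted sum of the two modalities for every identity.
        fused = [audio_weight * a + (1.0 - audio_weight) * v
                 for a, v in zip(a_row, v_row)]
        # Joint decision: pick the identity with the highest fused score.
        best = max(range(len(fused)), key=fused.__getitem__)
        labels.append(best if fused[best] >= threshold else None)
    return labels


# Toy scores (invented): 3 time segments, 2 enrolled identities.
audio = [[0.9, 0.1], [0.2, 0.3], [0.4, 0.8]]
video = [[0.7, 0.2], [0.1, 0.2], [0.1, 0.9]]
labels = fuse_confidences(audio, video)
print(labels)
```

With these toy scores the middle segment is left unassigned, because neither modality gives any identity a fused confidence above the threshold. Lowering `threshold` or changing `audio_weight` changes how readily a segment is attributed to a speaker.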


