UPC Multimodal Speaker Diarization System for the 2018 Albayzin Challenge

Miquel Àngel India Massana, Itziar Sagastiberri, Ponç Palau, E. Sayrol, J. Morros, J. Hernando
{"title":"UPC Multimodal Speaker Diarization System for the 2018 Albayzin Challenge","authors":"Miquel Àngel India Massana, Itziar Sagastiberri, Ponç Palau, E. Sayrol, J. Morros, J. Hernando","doi":"10.21437/iberspeech.2018-40","DOIUrl":null,"url":null,"abstract":"This paper presents the UPC system proposed for the Multimodal Speaker Diarization task of the 2018 Albayzin Challenge. This approach works by processing individually the speech and the image signal. In the speech domain, speaker diarization is performed using identity embeddings created by a triplet loss DNN that uses i-vectors as input. The triplet DNN is trained with an additional regularization loss that minimizes the variance of both positive and negative distances. A sliding windows is then used to compare speech segments with enrollment speaker targets using cosine distance between the embeddings. To detect identities from the face modality, a face detector followed by a face tracker has been used on the videos. For each cropped face a feature vector is obtained using a Deep Neural Network based on the ResNet 34 architecture, trained using a metric learning triplet loss (available from dlib library). For each track the face feature vector is obtained by averaging the features obtained for each one of the frames of that track. Then, this feature vector is compared with the features extracted from the images of the enrollment identities. The proposed system is evaluated on the RTVE2018 database.","PeriodicalId":115963,"journal":{"name":"IberSPEECH Conference","volume":"61 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IberSPEECH Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/iberspeech.2018-40","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

This paper presents the UPC system proposed for the Multimodal Speaker Diarization task of the 2018 Albayzin Challenge. The approach processes the speech and image signals individually. In the speech domain, speaker diarization is performed using identity embeddings created by a triplet-loss DNN that takes i-vectors as input. The triplet DNN is trained with an additional regularization loss that minimizes the variance of both the positive and the negative distances. A sliding window is then used to compare speech segments with enrollment speaker targets using the cosine distance between embeddings. To detect identities from the face modality, a face detector followed by a face tracker is applied to the videos. For each cropped face, a feature vector is obtained using a Deep Neural Network based on the ResNet-34 architecture, trained with a metric-learning triplet loss (available from the dlib library). For each track, the face feature vector is obtained by averaging the features computed for each frame of that track. This feature vector is then compared with the features extracted from the images of the enrollment identities. The proposed system is evaluated on the RTVE2018 database.
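The following is a minimal sketch of the training objective described above: a standard triplet hinge loss plus a regularizer that penalizes the variance of the positive and negative distance populations in a batch. The margin value, the weight `lambda_var`, and the use of PyTorch are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F


def triplet_loss_with_variance(anchor, positive, negative, margin=0.2, lambda_var=0.1):
    """anchor/positive/negative: (batch, dim) identity embeddings."""
    d_pos = F.pairwise_distance(anchor, positive)   # anchor-positive distances
    d_neg = F.pairwise_distance(anchor, negative)   # anchor-negative distances

    # Standard triplet (hinge) loss.
    triplet = F.relu(d_pos - d_neg + margin).mean()

    # Regularization: minimize the variance of both distance populations.
    var_reg = d_pos.var() + d_neg.var()

    return triplet + lambda_var * var_reg
```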
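The speech-side scoring can be sketched as follows: slide a window over the sequence of segment embeddings and label each window with the enrolled speaker whose target embedding is closest in cosine distance. The window length, hop size, and data layout are assumptions for illustration.

```python
import numpy as np


def cosine_distance(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def diarize(segment_embeddings, enrollment, win=10, hop=5):
    """segment_embeddings: (T, dim) array of consecutive speech-segment embeddings.
    enrollment: {speaker_name: (dim,) target embedding}. Returns (start, label) pairs."""
    labels = []
    for start in range(0, len(segment_embeddings) - win + 1, hop):
        window_vec = segment_embeddings[start:start + win].mean(axis=0)
        scores = {spk: cosine_distance(window_vec, ref) for spk, ref in enrollment.items()}
        labels.append((start, min(scores, key=scores.get)))
    return labels
```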
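For the face modality, a sketch of the track-level scoring is shown below, using dlib's ResNet-34 face embedding model (the network the abstract refers to). Per-frame embeddings of a face track are averaged and the track is assigned to the closest enrollment identity by cosine distance. The tracking step is omitted here, and the model file names are the standard downloadable dlib models, assumed to be available locally.

```python
import numpy as np
import dlib

detector = dlib.get_frontal_face_detector()
shape_predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
face_encoder = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")


def frame_embedding(image):
    """128-d embedding of the first detected face in an RGB frame, or None."""
    detections = detector(image, 1)
    if not detections:
        return None
    shape = shape_predictor(image, detections[0])
    return np.array(face_encoder.compute_face_descriptor(image, shape))


def assign_track(track_frames, enrollment):
    """Average per-frame embeddings of a track and pick the nearest enrolled identity."""
    embeddings = [e for e in (frame_embedding(f) for f in track_frames) if e is not None]
    track_vec = np.mean(embeddings, axis=0)
    scores = {name: 1.0 - np.dot(track_vec, ref) / (np.linalg.norm(track_vec) * np.linalg.norm(ref))
              for name, ref in enrollment.items()}          # cosine distance
    return min(scores, key=scores.get)
```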