Who's Speaking?: Audio-Supervised Classification of Active Speakers in Video

Proceedings of the 2015 ACM on International Conference on Multimodal Interaction. Pub Date: 2015-11-09. DOI: 10.1145/2818346.2820780

Punarjay Chakravarty, S. Mirzaei, T. Tuytelaars, H. Van hamme

Citations: 35

Abstract

Active speakers have traditionally been identified in video by detecting their moving lips. This paper demonstrates that they can also be identified using spatio-temporal features that aim to capture other cues: movement of the head, upper body, and hands of active speakers. Speaker directional information, obtained using sound source localization from a microphone array, is used to supervise the training of these video features.
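
The abstract does not say how the directional information is turned into training labels. As a minimal sketch of one way such audio supervision could work, the Python below marks the tracked person whose direction lies closest to the localized sound source as "speaking" for that frame. The `PersonTrack` record, the `audio_supervised_labels` function, and the 10-degree tolerance are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class PersonTrack:
    """Hypothetical per-frame record for one tracked person."""
    person_id: str
    azimuth_deg: float  # horizontal angle of the person relative to the array


def audio_supervised_labels(tracks, source_azimuth_deg, tolerance_deg=10.0):
    """Return {person_id: bool}; True marks the person nearest the sound source.

    source_azimuth_deg is None when no speech was localized in the frame.
    tolerance_deg is an assumed threshold, not a value from the paper.
    """
    labels = {t.person_id: False for t in tracks}
    if source_azimuth_deg is None or not tracks:
        return labels

    def angular_gap(track):
        # Absolute angular difference, wrapped into [0, 180] degrees.
        diff = abs(track.azimuth_deg - source_azimuth_deg) % 360.0
        return min(diff, 360.0 - diff)

    nearest = min(tracks, key=angular_gap)
    # Only accept the match if the source direction is close enough.
    if angular_gap(nearest) <= tolerance_deg:
        labels[nearest.person_id] = True
    return labels
```

Labels produced this way could then serve as weak targets when training a classifier on the video features, so that no manual speaker annotation is needed.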
