Automatic facial expressions, gaze direction and head movements generation of a virtual agent

Alice Delbosc, M. Ochs, S. Ayache
{"title":"Automatic facial expressions, gaze direction and head movements generation of a virtual agent","authors":"Alice Delbosc, M. Ochs, S. Ayache","doi":"10.1145/3536220.3558806","DOIUrl":null,"url":null,"abstract":"In this article, we present two models to jointly and automatically generate the head, facial and gaze movements of a virtual agent from acoustic speech features. Two architectures are explored: a Generative Adversarial Network and an Adversarial Encoder-Decoder. Head movements and gaze orientation are generated as 3D coordinates, while facial expressions are generated using action units based on the facial action coding system. A large corpus of almost 4 hours of videos, involving 89 different speakers is used to train our models. We extract the speech and visual features automatically from these videos using existing tools. The evaluation of these models is conducted objectively with measures such as density evaluation and a visualisation from PCA reduction, as well as subjectively through a users perceptive study. Our proposed methodology shows that on 15 seconds sequences, encoder-decoder architecture drastically improves the perception of generated behaviours in two criteria: the coordination with speech and the naturalness. Our code can be found in : https://github.com/aldelb/non-verbal-behaviours-generation.","PeriodicalId":186796,"journal":{"name":"Companion Publication of the 2022 International Conference on Multimodal Interaction","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Companion Publication of the 2022 International Conference on Multimodal Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3536220.3558806","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

In this article, we present two models to jointly and automatically generate the head, facial and gaze movements of a virtual agent from acoustic speech features. Two architectures are explored: a Generative Adversarial Network and an Adversarial Encoder-Decoder. Head movements and gaze orientation are generated as 3D coordinates, while facial expressions are generated using action units based on the Facial Action Coding System. A large corpus of almost 4 hours of videos, involving 89 different speakers, is used to train our models. We extract the speech and visual features automatically from these videos using existing tools. The models are evaluated objectively, with measures such as density evaluation and a visualisation based on PCA reduction, and subjectively, through a user perception study. Our proposed methodology shows that, on 15-second sequences, the encoder-decoder architecture drastically improves the perception of the generated behaviours on two criteria: coordination with speech and naturalness. Our code can be found at: https://github.com/aldelb/non-verbal-behaviours-generation.
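
To make the setup concrete, below is a minimal sketch of the adversarial encoder-decoder idea: a generator encodes an acoustic feature sequence and decodes a per-frame behaviour vector (action-unit intensities plus 3D head and gaze coordinates), while a discriminator judges whether a speech-behaviour pair looks real. Everything here is an illustrative assumption: the layer types, sizes, and the feature dimensions N_AUDIO and N_BEHAV are hypothetical, not the authors' actual configuration (see the linked repository for that).

```python
# Hedged PyTorch sketch of an adversarial encoder-decoder for speech-driven
# behaviour generation. All dimensions and layer choices are assumptions.
import torch
import torch.nn as nn

N_AUDIO = 26   # assumed per-frame acoustic features (e.g., MFCC-like)
N_BEHAV = 23   # assumed output: 17 AU intensities + 3D head pose + 3D gaze

class Generator(nn.Module):
    """Encodes an acoustic sequence and decodes a behaviour sequence."""
    def __init__(self, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(N_AUDIO, hidden, batch_first=True,
                              bidirectional=True)
        self.decoder = nn.GRU(2 * hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, N_BEHAV)

    def forward(self, audio):                # audio: (batch, frames, N_AUDIO)
        enc, _ = self.encoder(audio)         # (batch, frames, 2*hidden)
        dec, _ = self.decoder(enc)           # (batch, frames, hidden)
        return self.head(dec)                # (batch, frames, N_BEHAV)

class Discriminator(nn.Module):
    """Scores whether a (speech, behaviour) sequence pair looks real."""
    def __init__(self, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(N_AUDIO + N_BEHAV, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, audio, behaviour):
        h, _ = self.rnn(torch.cat([audio, behaviour], dim=-1))
        return self.out(h[:, -1])            # one realism score per sequence
```

In training, the generator's loss would typically combine an adversarial term from the discriminator with a reconstruction term against the ground-truth behaviours; how the two terms are weighted is a central design choice for this family of models.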
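
The objective evaluation mentioned above (density evaluation and a PCA visualisation) can be read, under assumptions, as: project real and generated behaviour frames into a low-dimensional PCA space fitted on the real data, then compare the two point clouds visually and with a density score. The sketch below uses a kernel density estimate for that score; the paper's exact metric may differ, and compare_distributions is a hypothetical helper.

```python
# Hedged sketch of a PCA + density comparison between real and generated
# behaviour frames. One plausible implementation, not the authors' metric.
import numpy as np
from sklearn.decomposition import PCA
from scipy.stats import gaussian_kde

def compare_distributions(real, generated, n_components=2):
    """real/generated: (n_frames, n_features) behaviour matrices."""
    pca = PCA(n_components=n_components).fit(real)  # fit on ground truth only
    real_2d = pca.transform(real)
    gen_2d = pca.transform(generated)

    # Density evaluation: mean log-likelihood of generated frames under a
    # KDE fitted to the real frames (higher = closer to the real data).
    kde = gaussian_kde(real_2d.T)
    score = np.log(kde(gen_2d.T) + 1e-12).mean()
    return real_2d, gen_2d, score
```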