Zero-Shot Style Transfer for Multimodal Data-Driven Gesture Synthesis

Mireille Fares, C. Pelachaud, Nicolas Obin
{"title":"Zero-Shot Style Transfer for Multimodal Data-Driven Gesture Synthesis","authors":"Mireille Fares, C. Pelachaud, Nicolas Obin","doi":"10.1109/FG57933.2023.10042658","DOIUrl":null,"url":null,"abstract":"We propose a multimodal speech driven approach to generate 2D upper-body gestures for virtual agents, in the communicative style of different speakers, seen or unseen by our model during training. Upper-body gestures of a source speaker are generated based on the content of his/her multimodal data - speech acoustics and text semantics. The synthesized source speaker's gestures are conditioned on the multimodal style representation of the target speaker. Our approach is zero-shot, and can generalize the style transfer to new unseen speakers, without any additional training. An objective evaluation is conducted to validate our approach.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"88 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FG57933.2023.10042658","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

We propose a multimodal, speech-driven approach to generate 2D upper-body gestures for virtual agents in the communicative style of different speakers, seen or unseen by our model during training. Upper-body gestures of a source speaker are generated based on the content of his/her multimodal data: speech acoustics and text semantics. The synthesized source speaker's gestures are conditioned on the multimodal style representation of the target speaker. Our approach is zero-shot and can generalize the style transfer to new, unseen speakers without any additional training. An objective evaluation is conducted to validate our approach.
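The abstract describes a conditioning scheme in which a source speaker's multimodal content drives gesture generation while a target speaker's multimodal data supplies a style representation. The sketch below is a minimal illustration of that idea, not the authors' implementation: the encoder/decoder choices (GRUs), the feature dimensions (80-dim mel frames, 300-dim word embeddings, 50-dim 2D pose vectors), and all names such as `StyleConditionedGestureModel` are assumptions made for illustration only.

```python
# Minimal sketch (assumed architecture, not the paper's code) of
# style-conditioned gesture synthesis: a content encoder reads the source
# speaker's acoustics + text semantics, a style encoder maps the target
# speaker's multimodal data to one style vector, and a decoder emits 2D
# upper-body pose frames fused with that style vector.
import torch
import torch.nn as nn


class StyleConditionedGestureModel(nn.Module):
    def __init__(self, audio_dim=80, text_dim=300, pose_dim=50,
                 style_dim=64, hidden=256):
        super().__init__()
        # Encodes the source speaker's multimodal content per frame.
        self.content_encoder = nn.GRU(audio_dim + text_dim, hidden,
                                      batch_first=True)
        # Summarizes a target-speaker reference clip into a style embedding;
        # at inference this can embed a speaker unseen during training.
        self.style_encoder = nn.GRU(audio_dim + text_dim + pose_dim,
                                    style_dim, batch_first=True)
        # Decodes pose frames from content states concatenated with style.
        self.decoder = nn.GRU(hidden + style_dim, hidden, batch_first=True)
        self.pose_head = nn.Linear(hidden, pose_dim)

    def forward(self, src_audio, src_text, tgt_audio, tgt_text, tgt_pose):
        content, _ = self.content_encoder(
            torch.cat([src_audio, src_text], dim=-1))
        _, style = self.style_encoder(
            torch.cat([tgt_audio, tgt_text, tgt_pose], dim=-1))
        style = style[-1]  # final hidden state: (batch, style_dim)
        # Broadcast the style vector across the source frames.
        style = style.unsqueeze(1).expand(-1, content.size(1), -1)
        out, _ = self.decoder(torch.cat([content, style], dim=-1))
        return self.pose_head(out)  # (batch, frames, pose_dim)


# Example: source content from speaker A, style reference from an unseen
# speaker B (all tensors are random stand-ins for real features).
model = StyleConditionedGestureModel()
poses = model(
    torch.randn(1, 120, 80),   # source mel-spectrogram frames
    torch.randn(1, 120, 300),  # source word embeddings, frame-aligned
    torch.randn(1, 200, 80),   # target-speaker reference audio
    torch.randn(1, 200, 300),  # target-speaker reference text features
    torch.randn(1, 200, 50),   # target-speaker reference 2D poses
)
print(poses.shape)  # torch.Size([1, 120, 50])
```

Under this layout, the zero-shot property follows from the design: style enters the decoder only through the style encoder's embedding, so a reference clip from a new speaker can be embedded at inference time with no additional training.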