Zero-Shot Style Transfer for Multimodal Data-Driven Gesture Synthesis

2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG). Pub Date: 2023-01-05. DOI: 10.1109/FG57933.2023.10042658

Mireille Fares, C. Pelachaud, Nicolas Obin


Citations: 2

Abstract

We propose a multimodal, speech-driven approach to generate 2D upper-body gestures for virtual agents in the communicative style of different speakers, whether seen or unseen by our model during training. Upper-body gestures of a source speaker are generated from the content of that speaker's multimodal data: speech acoustics and text semantics. The synthesized gestures of the source speaker are conditioned on the multimodal style representation of the target speaker. Our approach is zero-shot and can generalize the style transfer to new, unseen speakers without any additional training. An objective evaluation is conducted to validate our approach.


Source journal

2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)

Self-citation rate

0.00%

Publication count