{"title":"基于语音的虚拟人机交互实时手势动画生成","authors":"M. Rebol, C. Gütl, Krzysztof Pietroszek","doi":"10.1145/3411763.3451554","DOIUrl":null,"url":null,"abstract":"We propose a real-time system for synthesizing gestures directly from speech. Our data-driven approach is based on Generative Adversarial Neural Networks to model the speech-gesture relationship. We utilize the large amount of speaker video data available online to train our 3D gesture model. Our model generates speaker-specific gestures by taking consecutive audio input chunks of two seconds in length. We animate the predicted gestures on a virtual avatar. We achieve a delay below three seconds between the time of audio input and gesture animation.","PeriodicalId":265192,"journal":{"name":"Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Real-time Gesture Animation Generation from Speech for Virtual Human Interaction\",\"authors\":\"M. Rebol, C. Gütl, Krzysztof Pietroszek\",\"doi\":\"10.1145/3411763.3451554\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose a real-time system for synthesizing gestures directly from speech. Our data-driven approach is based on Generative Adversarial Neural Networks to model the speech-gesture relationship. We utilize the large amount of speaker video data available online to train our 3D gesture model. Our model generates speaker-specific gestures by taking consecutive audio input chunks of two seconds in length. We animate the predicted gestures on a virtual avatar. We achieve a delay below three seconds between the time of audio input and gesture animation.\",\"PeriodicalId\":265192,\"journal\":{\"name\":\"Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems\",\"volume\":\"3 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-05-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3411763.3451554\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3411763.3451554","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Real-time Gesture Animation Generation from Speech for Virtual Human Interaction
We propose a real-time system for synthesizing gestures directly from speech. Our data-driven approach uses generative adversarial networks (GANs) to model the relationship between speech and gesture. We leverage the large amount of speaker video available online to train our 3D gesture model. The model generates speaker-specific gestures from consecutive two-second chunks of audio input, and we animate the predicted gestures on a virtual avatar. The delay between audio input and gesture animation stays below three seconds.
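As a rough illustration of the pipeline the abstract describes, the sketch below implements a streaming loop that slices speech into consecutive two-second chunks, feeds each chunk to a gesture model, and yields pose frames for an avatar. Everything here is assumed for illustration: the class GestureGenerator, the sample rate, the joint count, and the frames-per-chunk value are hypothetical stand-ins, not the authors' implementation.

```python
# Minimal sketch of the chunked streaming inference loop implied by the
# abstract. All names and constants are hypothetical placeholders; the
# paper's actual architecture and API are not given in the abstract.

import numpy as np

AUDIO_RATE = 16_000           # assumed audio sample rate (Hz)
CHUNK_SECONDS = 2             # the abstract states two-second input chunks
CHUNK_SAMPLES = AUDIO_RATE * CHUNK_SECONDS
NUM_JOINTS = 49               # assumed skeleton size for the 3D avatar
FRAMES_PER_CHUNK = 30         # assumed animation frames produced per chunk


class GestureGenerator:
    """Placeholder for the trained GAN generator (speech -> 3D joint poses)."""

    def predict(self, audio_chunk: np.ndarray) -> np.ndarray:
        # A real model would map audio features to joint rotations or
        # positions. Here we return zeros with the expected output shape:
        # (frames, joints, xyz).
        return np.zeros((FRAMES_PER_CHUNK, NUM_JOINTS, 3))


def stream_gestures(audio: np.ndarray, model: GestureGenerator):
    """Feed consecutive two-second chunks to the model, yielding pose frames."""
    for start in range(0, len(audio) - CHUNK_SAMPLES + 1, CHUNK_SAMPLES):
        chunk = audio[start:start + CHUNK_SAMPLES]
        # Predicted poses would be retargeted onto the virtual avatar here.
        yield model.predict(chunk)


if __name__ == "__main__":
    speech = np.random.randn(AUDIO_RATE * 6)  # six seconds of dummy audio
    for poses in stream_gestures(speech, GestureGenerator()):
        print("animated", poses.shape[0], "frames")
```

Because each chunk is processed independently, end-to-end latency is bounded by the chunk length plus model inference and animation time, which is consistent with the sub-three-second delay the abstract reports.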