Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar

2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI). Pub Date: 2022-10-01. DOI: 10.1109/ICTAI56018.2022.00153

Aolan Sun, Xulong Zhang, Tiandong Ling, Jianzong Wang, Ning Cheng, Jing Xiao


Citations: 3

Abstract

Since the beginning of the COVID-19 pandemic, remote conferencing and school teaching have become important tools. Previous applications aim to save commuting costs through real-time interaction. Our application instead aims to lower the cost of producing and reproducing communication materials. This paper proposes a system called Pre-Avatar, which generates a presentation video featuring a talking face of a target speaker from one front-face photo and a 3-minute voice recording. Technically, the system consists of three main modules: a user experience interface (UEI), a talking face module, and a few-shot text-to-speech (TTS) module. The system first clones the target speaker's voice, then generates the speech, and finally generates an avatar with appropriate lip and head movements. In any scenario, users only need to supply new slides with different notes to generate another video. The demo has been released at https://pre-avatar.github.io/ and will be published as free software.
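
The order of the stages described in the abstract (clone the voice, synthesize speech, animate the avatar) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (Slide, Segment, build_presentation, and the fake_* stand-in stages) is hypothetical, and the three stages are passed in as plain callables so that the sketch runs without any models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Slide:
    image_path: str  # rendered slide image (hypothetical field)
    notes: str       # speaker notes to be spoken over this slide


@dataclass
class Segment:
    slide: Slide
    audio: bytes       # synthesized speech for the slide notes
    frames: List[str]  # placeholder for talking-face video frames


def build_presentation(
    slides: List[Slide],
    photo_path: str,
    voice_sample_path: str,
    clone_voice: Callable[[str], object],
    synthesize: Callable[[object, str], bytes],
    animate_face: Callable[[str, bytes], List[str]],
) -> List[Segment]:
    """Run the three stages in the order the abstract gives."""
    # 1. Clone the target speaker's voice once from the recording.
    voice = clone_voice(voice_sample_path)
    segments = []
    for slide in slides:
        # 2. Generate speech from the notes attached to the slide.
        audio = synthesize(voice, slide.notes)
        # 3. Generate the avatar from the photo, driven by that speech.
        frames = animate_face(photo_path, audio)
        segments.append(Segment(slide, audio, frames))
    return segments


# Stand-in stages (assumptions for illustration only, no real models).
def fake_clone_voice(sample_path: str) -> dict:
    return {"speaker": sample_path}


def fake_synthesize(voice: dict, text: str) -> bytes:
    return text.encode("utf-8")


def fake_animate_face(photo_path: str, audio: bytes) -> List[str]:
    # One placeholder frame per audio byte.
    return [f"{photo_path}#frame{i}" for i in range(len(audio))]
```

Under this sketch, producing another video only requires calling build_presentation again with a different list of slides and notes; the photo and voice recording are reused unchanged.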

