Towards a data-driven framework for realistic self-organized virtual humans: coordinated head and eye movements

Zhizhuo Yang, Reynold J. Bailey
Published in: Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications
Publication date: 2019-06-25
DOI: 10.1145/3314111.3322874
Citation count: 0

Abstract

Driven by significant investments from the gaming, film, advertising, and customer service industries among others, efforts across many different fields are converging to create realistic representations of humans that look like (computer graphics), sound like (natural language generation), move like (motion capture), and reason like (artificial intelligence) real humans. The ultimate goal of this work is to push the boundaries even further by exploring the development of realistic self-organized virtual humans that are capable of demonstrating coordinated behaviors across different modalities. Eye movements, for example, may be accompanied by changes in facial expression, head orientation, posture, gait properties, or speech. Traditionally, however, these modalities are captured and modeled separately, and this disconnect contributes to the well-known uncanny valley phenomenon. We focus initially on facial modalities, in particular coordinated eye and head movements (and eventually facial expressions), but our proposed data-driven framework will be able to accommodate other modalities as well.

[…] transfer [Laine et al. 2017]. Despite these advances, the resulting renderings or animations are often still distinguishable from a real human, sometimes in unsettling ways: the so-called uncanny valley phenomenon [Mori et al. 2012]. We argue that the traditional approach of capturing and modeling various human modalities separately contributes to this effect. In this work, we focus on capturing, transferring, and generating realistic coordinated facial modalities (eye movements, head movements, and eventually facial expressions). We envision a flexible framework that can be extended to accommodate other modalities as well.
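To make concrete what "coordinated eye and head movements" means, the sketch below implements a classic hand-written heuristic from the eye-head coordination literature, not the paper's data-driven framework: small gaze shifts are made by the eyes alone, while for larger shifts the head rotates to carry the excess beyond an eye-in-head comfort range. The 20-degree limit and function name are illustrative assumptions, not values from the paper.

```python
# Illustrative heuristic only (assumed, not the authors' model): split a
# horizontal gaze shift between the eyes and the head.

EYE_RANGE_DEG = 20.0  # assumed eye-only comfort limit (illustrative value)

def split_gaze_shift(target_deg: float) -> tuple[float, float]:
    """Split a horizontal gaze shift into (eye, head) components in degrees."""
    magnitude = abs(target_deg)
    sign = 1.0 if target_deg >= 0 else -1.0
    if magnitude <= EYE_RANGE_DEG:
        return target_deg, 0.0                 # small shifts: eyes alone
    head = sign * (magnitude - EYE_RANGE_DEG)  # head carries the excess
    eye = target_deg - head                    # eyes cover the remainder
    return eye, head

eye, head = split_gaze_shift(35.0)
print(eye, head)  # 20.0 15.0
```

A data-driven framework of the kind the paper proposes would replace such a fixed rule with behavior learned from captured human data, so that eye and head contributions (and their timing) emerge from real recordings rather than a threshold.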