Multi-view Mouth Renderization for Assisting Lip-reading

Proceedings of the Internet of Accessible Things, published 2018-04-23. DOI: 10.1145/3192714.3192824

Andréa Britto Mattos, Dario Augusto Borges Oliveira

Citations: 8

Abstract

Previous work demonstrated that people who rely on lip-reading often prefer a frontal view of their interlocutor, but that a profile view may sometimes display certain lip gestures more noticeably. This work presents an assistive tool that receives an unconstrained video of a speaker, captured from an arbitrary view, and not only locates the mouth region but also displays augmented versions of the lips in the frontal and profile views. This is achieved with deep Generative Adversarial Networks (GANs) trained on pairs of images. In the training set, each pair contains a mouth picture taken at a random angle and the corresponding picture taken at a fixed view (i.e., showing the same mouth shape, person, and lighting condition). In the test phase, the networks receive an unseen mouth image taken at an arbitrary angle and map it to the two fixed views, frontal and profile. Because building a large-scale paired dataset is time-consuming, we use realistic synthetic 3D models for training and videos of real subjects as input for testing. Our approach is speaker-independent and language-independent, and our results demonstrate that the GANs can produce visually compelling results that may assist people with hearing impairment.
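
The pairing constraint described above (same mouth shape, person, and lighting; only the camera angle differs) can be illustrated with a minimal Python sketch of how training pairs might be assembled from synthetic renders. It assumes each render is tagged with a subject, mouth shape, lighting setup, and camera yaw, and that one pair list is built per fixed target view. The `Render` record, `build_pairs`, the 0 and 90 degree view angles, and all file names are hypothetical illustrations; the abstract does not describe the authors' actual data format or pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical fixed target views (degrees of yaw); the paper's exact angles are not given.
FRONTAL_DEG = 0
PROFILE_DEG = 90


@dataclass(frozen=True)
class Render:
    """Hypothetical metadata for one synthetic mouth render."""
    subject: str      # identifier of the synthetic 3D head model
    mouth_shape: str  # e.g. a viseme label
    lighting: str     # lighting setup identifier
    yaw_deg: int      # camera angle around the head; 0 = frontal
    path: str         # image file location


def build_pairs(renders: List[Render], target_deg: int) -> List[Tuple[str, str]]:
    """Pair every non-target render with the target-view render that shares
    subject, mouth shape, and lighting. Returns (input_path, target_path)."""
    # Index the target-view renders by the attributes that must match.
    targets: Dict[Tuple[str, str, str], str] = {}
    for r in renders:
        if r.yaw_deg == target_deg:
            targets[(r.subject, r.mouth_shape, r.lighting)] = r.path

    pairs: List[Tuple[str, str]] = []
    for r in renders:
        key = (r.subject, r.mouth_shape, r.lighting)
        # Skip the target view itself and any render with no matching target.
        if r.yaw_deg != target_deg and key in targets:
            pairs.append((r.path, targets[key]))
    return pairs


if __name__ == "__main__":
    renders = [
        Render("s1", "AA", "L1", 0, "s1_AA_L1_0.png"),
        Render("s1", "AA", "L1", 35, "s1_AA_L1_35.png"),
        Render("s1", "AA", "L1", 90, "s1_AA_L1_90.png"),
        Render("s1", "OO", "L1", 35, "s1_OO_L1_35.png"),  # no matching fixed view
    ]
    print(build_pairs(renders, FRONTAL_DEG))
    # [('s1_AA_L1_35.png', 's1_AA_L1_0.png'), ('s1_AA_L1_90.png', 's1_AA_L1_0.png')]
    print(build_pairs(renders, PROFILE_DEG))
    # [('s1_AA_L1_0.png', 's1_AA_L1_90.png'), ('s1_AA_L1_35.png', 's1_AA_L1_90.png')]
```

Building one pair list per fixed view mirrors the abstract's mapping from an arbitrary view to two fixed views (frontal and profile). A render with no counterpart sharing its subject, mouth shape, and lighting is skipped rather than paired with a mismatched image, which keeps each pair consistent with the stated constraint.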
