Multi-view Mouth Renderization for Assisting Lip-reading

Andréa Britto Mattos, Dario Augusto Borges Oliveira
Published in: Proceedings of the Internet of Accessible Things, 2018-04-23
DOI: 10.1145/3192714.3192824 (https://doi.org/10.1145/3192714.3192824)
Citations: 8

Abstract

Previous work has shown that people who rely on lip-reading often prefer a frontal view of their interlocutor, although a profile view sometimes displays certain lip gestures more noticeably. This work presents an assistive tool that receives an unconstrained video of a speaker, captured from an arbitrary view, and not only locates the mouth region but also displays augmented versions of the lips in frontal and profile views. This is achieved using deep Generative Adversarial Networks (GANs) trained on several pairs of images. In the training set, each pair contains a mouth picture taken at a random angle and the corresponding picture (i.e., showing the same mouth shape, person, and lighting condition) taken at a fixed view. In the test phase, the networks receive an unseen mouth image taken at an arbitrary angle and map it to the fixed views: frontal and profile. Because building a large-scale pairwise dataset is time-consuming, we use realistic synthetic 3D models for training and videos of real subjects as input for testing. Our approach is speaker-independent and language-independent, and our results demonstrate that the GANs can produce visually compelling images that may assist people with hearing impairment.
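The pairing rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the `Render` record, its field names, the yaw values of 0 and 90 degrees for the frontal and profile views, and the file names are all assumptions made for illustration. The sketch only shows how each arbitrary-view image could be matched with the fixed-view image that shares its subject, mouth shape, and lighting condition, producing one list of pairs per target view.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical yaw angles (degrees) for the two fixed target views.
FRONTAL_YAW = 0
PROFILE_YAW = 90


@dataclass(frozen=True)
class Render:
    """One synthetic mouth image; every field here is an illustrative assumption."""
    subject: str
    mouth_shape: str
    lighting: str
    yaw: int   # camera angle in degrees
    path: str  # hypothetical location of the rendered image


def build_pairs(renders: List[Render], target_yaw: int) -> List[Tuple[Render, Render]]:
    """Pair each arbitrary-view render with the fixed-view render that has
    the same subject, mouth shape, and lighting condition."""
    # Index the fixed-view renders by everything except the camera angle.
    targets: Dict[Tuple[str, str, str], Render] = {
        (r.subject, r.mouth_shape, r.lighting): r
        for r in renders
        if r.yaw == target_yaw
    }
    pairs = []
    for r in renders:
        key = (r.subject, r.mouth_shape, r.lighting)
        # Skip the target itself and any render without a fixed-view partner.
        if r.yaw != target_yaw and key in targets:
            pairs.append((r, targets[key]))
    return pairs


renders = [
    Render("s1", "open", "soft", 0, "s1_open_soft_000.png"),
    Render("s1", "open", "soft", 35, "s1_open_soft_035.png"),
    Render("s1", "open", "soft", 90, "s1_open_soft_090.png"),
    Render("s1", "closed", "soft", 35, "s1_closed_soft_035.png"),  # no partner
]

frontal_pairs = build_pairs(renders, FRONTAL_YAW)  # training pairs for the frontal view
profile_pairs = build_pairs(renders, PROFILE_YAW)  # training pairs for the profile view
```

Excluding the identity pair (a fixed-view image matched with itself) is a choice of this sketch; the abstract does not say whether such pairs were used.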