View Independent Computer Lip-Reading

2012 IEEE International Conference on Multimedia and Expo. Pub Date: 2012-07-09. DOI: 10.1109/ICME.2012.192

Yuxuan Lan, B. Theobald, R. Harvey


Citations: 44

Abstract


View Independent Computer Lip-Reading

Computer lip-reading systems are usually designed to work with a full-frontal view of the face, yet many human experts prefer to lip-read from an angled view. In this paper we consider issues related to the best viewing angle for an automated lip-reading system. In particular, we seek answers to the following questions: (1) Do computers lip-read better using a frontal or a non-frontal view of the face? (2) What is the best viewing angle for a computer lip-reading system? (3) How can a computer lip-reading system be made to work independently of viewing angle? We investigate these issues using a purpose-built audio-visual dataset that contains simultaneous recordings of a speaker reciting continuous speech at five angles. We find that the system performs best on a non-frontal view, perhaps because lip gestures such as lip protrusion and lip rounding are more pronounced when viewed from an angle. We also describe a simple linear mapping that maps any view of the face to the view that we find to be optimal. Hence we present a view-independent lip-reading system.
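The abstract does not say how the linear mapping between views is estimated. As a minimal sketch only, and not the authors' method, the example below assumes that each video frame yields a fixed-length visual feature vector per camera, that frames from two views are time-aligned (plausible given the simultaneous recordings), and that the mapping is an ordinary least-squares fit with a bias term. The function names fit_view_mapping and map_to_target_view, and the synthetic data, are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment a with b so every row operation is applied to both sides.
    aug = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: training frames are not varied enough")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_view_mapping(source: Matrix, target: Matrix) -> Matrix:
    """Least-squares fit of target ~ [source, 1] @ w via the normal equations.

    source: one feature vector per frame from some non-preferred view (assumed).
    target: the time-aligned feature vector per frame from the preferred view.
    Returns w with (d_in + 1) rows and d_out columns; the last row is the bias.
    """
    xs = [list(row) + [1.0] for row in source]  # append a constant bias input
    d, k = len(xs[0]), len(target[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    xty = [[sum(x[i] * y[j] for x, y in zip(xs, target)) for j in range(k)]
           for i in range(d)]
    return solve(xtx, xty)


def map_to_target_view(w: Matrix, features: Vector) -> Vector:
    """Project one source-view feature vector into the preferred view's space."""
    x = list(features) + [1.0]
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


# Synthetic stand-in for paired features: the "preferred view" here is an
# exact affine function of the "source view", so the fit should recover it.
source_frames = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
target_frames = [[2 * a - b + 0.5, a + 3 * b - 1.0] for a, b in source_frames]

w = fit_view_mapping(source_frames, target_frames)
print(map_to_target_view(w, [3.0, 2.0]))
```

Under these assumptions, one such mapping would be fitted per source angle, and the mapped features would then be passed to a recogniser trained only on the preferred view. Real features are noisy and higher-dimensional, so a regularised fit may be needed in practice.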
