Alvaro Marcos-Ramiro, Daniel Pizarro-Perez, Marta Marrón Romera, D. Gática-Pérez
Proceedings of the 16th International Conference on Multimodal Interaction. Published 2014-11-12. DOI: 10.1145/2663204.2663267
Capturing Upper Body Motion in Conversation: An Appearance Quasi-Invariant Approach
We address the problem of retrieving and measuring body communication cues in seated conversations by means of markerless motion capture. In psychological studies, the use of automatic methods is key to reducing the subjectivity present in the manual behavioral coding used to extract these cues. Such studies usually involve hundreds of subjects with different clothing, non-acted poses, and varying distances to the camera, recorded as uncalibrated, RGB-only video. Range cameras, however, are not yet common in psychology research, and are absent from existing recordings. It is therefore highly relevant to develop a fast method that works under these conditions. Given the known relationship between depth and motion estimates, we propose to robustly integrate highly appearance-invariant image motion features in a machine learning approach, complemented with an effective tracking scheme. We evaluate the method's performance on existing databases and on a database of upper body poses displayed in job interviews, which we make public. In our scenario, its accuracy is comparable to that of Kinect without using a range camera, and it is state-of-the-art on the HumanEva and ChaLearn 2011 evaluation datasets.
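The abstract does not spell out the feature pipeline, but the core idea it names — motion features that depend on movement rather than on clothing appearance, combined with temporal tracking — can be illustrated with a minimal sketch. The descriptor below uses block-wise temporal differencing (an assumption made here for illustration; the paper's actual features are more sophisticated), which responds to motion energy and is largely invariant to subject appearance, and `smooth_track` stands in for the tracking scheme with simple exponential smoothing. All function names are hypothetical, not from the paper.

```python
import numpy as np

def motion_energy_descriptor(prev, curr, grid=(4, 4)):
    """Block-wise motion-energy descriptor from two grayscale frames.

    Temporal differencing depends on movement rather than pixel
    appearance, giving a crude appearance quasi-invariant feature.
    Returns an L2-normalized vector of per-block mean motion energy.
    """
    diff = np.abs(curr.astype(np.float32) - prev.astype(np.float32))
    h, w = diff.shape
    gh, gw = grid
    desc = np.zeros(gh * gw, dtype=np.float32)
    for i in range(gh):
        for j in range(gw):
            block = diff[i * h // gh:(i + 1) * h // gh,
                         j * w // gw:(j + 1) * w // gw]
            desc[i * gw + j] = block.mean()
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc

def smooth_track(estimates, alpha=0.6):
    """Exponentially smooth a sequence of per-frame 2D joint estimates.

    A stand-in for the tracking scheme: each output is a blend of the
    current frame's estimate and the previous smoothed position.
    """
    track = [np.asarray(estimates[0], dtype=np.float32)]
    for est in estimates[1:]:
        est = np.asarray(est, dtype=np.float32)
        track.append(alpha * est + (1 - alpha) * track[-1])
    return track
```

In a full system, descriptors like these would feed a learned regressor mapping motion features to upper-body pose, with the tracker stabilizing per-frame predictions.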