{"title":"探索多模态任务描述中说话人之间和说话人内部的差异","authors":"Stephanie Schreitter, Brigitte Krenn","doi":"10.1109/ROMAN.2014.6926228","DOIUrl":null,"url":null,"abstract":"In natural human-human task descriptions, the verbal and the non-verbal parts of communication together comprise the information necessary for understanding. When robots are to learn tasks from humans in the future, the detection and integrated interpretation of both of these cues is decisive. In the present paper, we present a qualitative study on essential verbal and non-verbal cues by means of which information is transmitted during explaining and showing a task to a learner. In order to collect a respective data set for further investigation, 16 (human) teachers explained to a human learner how to mount a tube in a box with holdings, and six teachers did this to a robot learner. Detailed multi-modal analysis revealed that in both conditions, information was more reliable when transmitted via verbal and gestural references to the visual scene and via eye gaze than via the actual wording. In particular, intra-speaker variability in wording and perspective taking by the teacher potentially hinders understanding of the learner. The results presented in this paper emphasize the importance of investigating the inherently multi-modal nature of how humans structure and transmit information in order to derive respective computational models for robot learners.","PeriodicalId":235810,"journal":{"name":"The 23rd IEEE International Symposium on Robot and Human Interactive Communication","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Exploring inter- and intra-speaker variability in multi-modal task descriptions\",\"authors\":\"Stephanie Schreitter, Brigitte Krenn\",\"doi\":\"10.1109/ROMAN.2014.6926228\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In natural human-human task descriptions, the verbal and the non-verbal parts of communication together comprise the information necessary for understanding. When robots are to learn tasks from humans in the future, the detection and integrated interpretation of both of these cues is decisive. In the present paper, we present a qualitative study on essential verbal and non-verbal cues by means of which information is transmitted during explaining and showing a task to a learner. In order to collect a respective data set for further investigation, 16 (human) teachers explained to a human learner how to mount a tube in a box with holdings, and six teachers did this to a robot learner. Detailed multi-modal analysis revealed that in both conditions, information was more reliable when transmitted via verbal and gestural references to the visual scene and via eye gaze than via the actual wording. In particular, intra-speaker variability in wording and perspective taking by the teacher potentially hinders understanding of the learner. 
The results presented in this paper emphasize the importance of investigating the inherently multi-modal nature of how humans structure and transmit information in order to derive respective computational models for robot learners.\",\"PeriodicalId\":235810,\"journal\":{\"name\":\"The 23rd IEEE International Symposium on Robot and Human Interactive Communication\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-10-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The 23rd IEEE International Symposium on Robot and Human Interactive Communication\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ROMAN.2014.6926228\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The 23rd IEEE International Symposium on Robot and Human Interactive Communication","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ROMAN.2014.6926228","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
In natural human-human task descriptions, the verbal and the non-verbal parts of communication together convey the information necessary for understanding. If robots are to learn tasks from humans in the future, detecting and jointly interpreting both types of cues will be decisive. In this paper, we present a qualitative study of the essential verbal and non-verbal cues through which information is transmitted when a task is explained and demonstrated to a learner. To collect a corresponding data set for further investigation, 16 human teachers explained to a human learner how to mount a tube in a box with holders, and six teachers gave the same explanation to a robot learner. Detailed multi-modal analysis revealed that, in both conditions, information was transmitted more reliably via verbal and gestural references to the visual scene and via eye gaze than via the actual wording. In particular, intra-speaker variability in the teacher's wording and perspective taking potentially hinders the learner's understanding. The results presented in this paper emphasize the importance of investigating the inherently multi-modal nature of how humans structure and transmit information, in order to derive corresponding computational models for robot learners.