{"title":"基于用户注视和语言指令的人机协作环境下机器人任务参数可靠估计","authors":"S. K. Paul, M. Nicolescu, M. Nicolescu","doi":"10.1145/3589572.3589580","DOIUrl":null,"url":null,"abstract":"As robots become more ubiquitous in our daily life, it has become very important to extract task and environmental information through more natural, meaningful, and easy-to-use interaction interfaces. Not only this helps the user to adapt to (thus trust) a robot in a collaborative environment, it can supplement the core sensory information, helping the robot make reliable decisions. This paper presents a framework that combines two natural interaction interfaces: speech and gaze to reliably infer the object of interest and the robotic task parameters. The gaze estimation module utilizes pre-defined 3D facial points and matches them to a set of extracted estimated 3D facial landmarks of the users from 2D images to infer the gaze direction. Subsequently, the verbal instructions are passed through a deep learning model to extract the information relevant to a robotic task. These extracted task parameters from verbal instructions along with the estimated gaze directions are combined to detect and/or disambiguate objects in the scene to generate the final task configurations. The proposed framework shows very promising results in integrating the relevant task parameters for the intended robotic tasks in different real-world interaction scenarios.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"76 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Integrating User Gaze with Verbal Instruction to Reliably Estimate Robotic Task Parameters in a Human-Robot Collaborative Environment\",\"authors\":\"S. K. Paul, M. Nicolescu, M. Nicolescu\",\"doi\":\"10.1145/3589572.3589580\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As robots become more ubiquitous in our daily life, it has become very important to extract task and environmental information through more natural, meaningful, and easy-to-use interaction interfaces. Not only this helps the user to adapt to (thus trust) a robot in a collaborative environment, it can supplement the core sensory information, helping the robot make reliable decisions. This paper presents a framework that combines two natural interaction interfaces: speech and gaze to reliably infer the object of interest and the robotic task parameters. The gaze estimation module utilizes pre-defined 3D facial points and matches them to a set of extracted estimated 3D facial landmarks of the users from 2D images to infer the gaze direction. Subsequently, the verbal instructions are passed through a deep learning model to extract the information relevant to a robotic task. These extracted task parameters from verbal instructions along with the estimated gaze directions are combined to detect and/or disambiguate objects in the scene to generate the final task configurations. 
The proposed framework shows very promising results in integrating the relevant task parameters for the intended robotic tasks in different real-world interaction scenarios.\",\"PeriodicalId\":296325,\"journal\":{\"name\":\"Proceedings of the 2023 6th International Conference on Machine Vision and Applications\",\"volume\":\"76 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-03-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2023 6th International Conference on Machine Vision and Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3589572.3589580\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3589572.3589580","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
As robots become more ubiquitous in our daily lives, it has become increasingly important to extract task and environmental information through natural, meaningful, and easy-to-use interaction interfaces. Not only does this help the user adapt to (and thus trust) a robot in a collaborative environment, it can also supplement the robot's core sensory information, helping it make reliable decisions. This paper presents a framework that combines two natural interaction interfaces, speech and gaze, to reliably infer the object of interest and the robotic task parameters. The gaze estimation module matches a set of pre-defined 3D facial points to 3D facial landmarks estimated from the user's 2D images in order to infer the gaze direction. The verbal instructions are then passed through a deep learning model to extract the information relevant to a robotic task. The task parameters extracted from the verbal instructions are combined with the estimated gaze directions to detect and/or disambiguate objects in the scene and generate the final task configurations. The proposed framework shows very promising results in integrating the relevant task parameters for the intended robotic tasks across different real-world interaction scenarios.
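The abstract describes the gaze module as matching pre-defined 3D facial points to 3D landmarks estimated from 2D images. The snippet below is a minimal sketch of one common way to realize such a step: rigidly aligning canonical facial points to the estimated landmarks with a Kabsch alignment and rotating a canonical "forward" axis into the camera frame. The landmark set, the canonical coordinates, and the choice of Kabsch alignment are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

# Pre-defined 3D facial points in a canonical head frame (assumed values, metres).
CANONICAL_POINTS = np.array([
    [0.0,    0.0,   0.0],    # nose tip
    [0.0,   -0.065, -0.03],  # chin
    [-0.045, 0.035, -0.03],  # left eye outer corner
    [0.045,  0.035, -0.03],  # right eye outer corner
    [-0.03, -0.03,  -0.03],  # left mouth corner
    [0.03,  -0.03,  -0.03],  # right mouth corner
])
CANONICAL_FORWARD = np.array([0.0, 0.0, 1.0])  # "looking straight ahead" axis


def estimate_gaze_direction(estimated_points: np.ndarray) -> np.ndarray:
    """Align the canonical points to the user's estimated 3D landmarks (Kabsch)
    and return a unit gaze-direction vector in the camera/world frame."""
    src = CANONICAL_POINTS - CANONICAL_POINTS.mean(axis=0)
    dst = estimated_points - estimated_points.mean(axis=0)
    u, _, vt = np.linalg.svd(src.T @ dst)
    d = np.sign(np.linalg.det(vt.T @ u.T))        # guard against reflections
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    gaze = rotation @ CANONICAL_FORWARD
    return gaze / np.linalg.norm(gaze)
```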
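For the verbal-instruction step, the abstract only states that a deep learning model extracts task-relevant information. A minimal sketch of one plausible realization is a sequence-labelling model that tags task slots (action, object, attributes) in the transcribed utterance; the model name and label set below are hypothetical placeholders, not the model used in the paper.

```python
from transformers import pipeline

# Hypothetical fine-tuned token-classification model for robot task slots.
slot_tagger = pipeline("token-classification",
                       model="your-org/robot-task-slot-tagger",
                       aggregation_strategy="simple")


def extract_task_parameters(utterance: str) -> dict:
    """Group tagged spans into a simple task-parameter dictionary."""
    params = {"action": None, "object": None, "attributes": []}
    for span in slot_tagger(utterance):
        if span["entity_group"] == "ACTION":
            params["action"] = span["word"]
        elif span["entity_group"] == "OBJECT":
            params["object"] = span["word"]
        elif span["entity_group"] == "ATTRIBUTE":
            params["attributes"].append(span["word"])
    return params

# e.g. extract_task_parameters("pick up the red cup on the left")
#      -> {"action": "pick up", "object": "cup", "attributes": ["red", "left"]}
```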
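Finally, a minimal sketch of the fusion step: the object category extracted from the verbal instruction narrows the candidate set, and the estimated gaze ray resolves any remaining ambiguity by selecting the candidate closest in angle to where the user is looking. The Candidate structure, the angular threshold, and the scoring rule are illustrative assumptions rather than the paper's exact method.

```python
from dataclasses import dataclass
import numpy as np


@dataclass
class Candidate:
    label: str            # detected object category, e.g. "cup"
    position: np.ndarray  # 3D centroid in the same frame as the gaze estimate


def resolve_target(candidates, spoken_label, eye_position, gaze_direction,
                   max_angle_deg=15.0):
    """Return the candidate matching the spoken label whose centroid lies closest
    to the gaze ray, or None if nothing falls within the angular threshold."""
    gaze = gaze_direction / np.linalg.norm(gaze_direction)
    best, best_angle = None, np.deg2rad(max_angle_deg)
    for cand in candidates:
        if cand.label != spoken_label:
            continue
        to_obj = cand.position - eye_position
        to_obj = to_obj / np.linalg.norm(to_obj)
        angle = np.arccos(np.clip(np.dot(gaze, to_obj), -1.0, 1.0))
        if angle < best_angle:
            best, best_angle = cand, angle
    return best
```

If the spoken label alone yields a single match, the gaze term simply confirms it; when several objects of the same category are present, the angular score breaks the tie.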