Engagement Intention Estimation in Multiparty Human-Robot Interaction
Zhijie Zhang, Jianmin Zheng, N. Magnenat-Thalmann
2021 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN), pp. 117-122. Published 2021-08-08.
DOI: https://doi.org/10.1109/RO-MAN50785.2021.9515373
Citations: 1
Abstract
As intelligent agents (IAs) become more common in daily life, they are expected to have enough social intelligence to interact with people by appropriately interpreting human behavior and intention. This paper presents a method for estimating whether people are willing to join a conversation, which helps give IAs the ability to detect potential participants. The method is built on a CNN-LSTM network that takes image features and social signals as input, making use of the general information conveyed in images, semantic social cues supported by social psychology studies, and the temporal information in the input sequence. The network has a multi-branch structure, which gives it the flexibility to accommodate different types of input. We also discuss signal transition in multiparty human-robot interaction scenarios. The method is evaluated on three datasets with social signals and/or images as inputs. The results show that the proposed method can infer human engagement intention well.
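The multi-branch idea described in the abstract can be illustrated with a small sketch. This is not the authors' network: the abstract gives no layer sizes, signal definitions, or training details, so everything below is an assumption. The class name `MultiBranchLSTM`, the branch names `"image"` and `"social"`, and all dimensions are hypothetical, a plain linear projection stands in for the CNN image branch, and the weights are random and untrained. The sketch only shows how per-branch features might be concatenated at each time step, passed through an LSTM cell, and mapped to a single engagement probability.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, x):
    """Dense layer: `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (untrained, illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiBranchLSTM:
    """Toy model: one projection per input branch, concatenation, one LSTM cell."""

    def __init__(self, branch_dims, branch_out=4, hidden=8, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        # One projection per branch; a real image branch would be a CNN.
        self.branches = {name: init_layer(rng, branch_out, dim)
                         for name, dim in branch_dims.items()}
        fused = branch_out * len(branch_dims)
        # Gates i, f, g, o each act on [fused input, previous hidden state].
        self.gates = {g: init_layer(rng, hidden, fused + hidden) for g in "ifgo"}
        self.head = init_layer(rng, 1, hidden)

    def step(self, inputs, h, c):
        # Project each branch and concatenate the results in a fixed order.
        fused = []
        for name in sorted(self.branches):
            w, b = self.branches[name]
            fused += [math.tanh(v) for v in linear(w, b, inputs[name])]
        z = fused + h
        i = [sigmoid(v) for v in linear(*self.gates["i"], z)]    # input gate
        f = [sigmoid(v) for v in linear(*self.gates["f"], z)]    # forget gate
        g = [math.tanh(v) for v in linear(*self.gates["g"], z)]  # cell candidate
        o = [sigmoid(v) for v in linear(*self.gates["o"], z)]    # output gate
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c

    def predict(self, sequence):
        """Return a probability in (0, 1) for a sequence of per-frame inputs."""
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for inputs in sequence:
            h, c = self.step(inputs, h, c)
        w, b = self.head
        return sigmoid(linear(w, b, h)[0])


# Five frames, each with a 6-d image feature and a 3-d social-signal vector
# (both sizes are arbitrary placeholders).
data_rng = random.Random(1)
sequence = [{"image": [data_rng.random() for _ in range(6)],
             "social": [data_rng.random() for _ in range(3)]} for _ in range(5)]
model = MultiBranchLSTM({"image": 6, "social": 3})
probability = model.predict(sequence)
```

Because the branches are stored in a dictionary, the same class can be built with only `{"social": 3}` or only `{"image": 6}`, which mirrors the abstract's point that the structure accommodates different types of input. A trained version would replace the random weights and the linear image projection with learned parameters and a convolutional feature extractor.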