{"title":"Development of a Platform for RNN Driven Multimodal Interaction with Embodied Conversational Agents","authors":"Hung-Hsuan Huang, Masato Fukuda, T. Nishida","doi":"10.1145/3308532.3329448","DOIUrl":null,"url":null,"abstract":"This paper describes our ongoing project to build a platform that enables real-time multimodal interaction with embodied conversational agents. All of the components are in modular design and can be switched to other models easily. A prototype listener agent has been developed upon the platform. Its spontaneous reactive behaviors are trained from a multimodal data corpus collected in a human-human conversation experiment. Two Gated Recurrent Unit (GRU) based models are switched when the agent is speaking or is not speaking. These models generate the agent's facial expressions, head movements, and postures from the corresponding behaviors of the human user in real-time. Benefits from the flexible design, the utterance generation part can be an autonomous dialogue manager with hand crafted rules, an on-line chatbot engine, or a human operator.","PeriodicalId":112642,"journal":{"name":"Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3308532.3329448","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
This paper describes our ongoing project to build a platform that enables real-time multimodal interaction with embodied conversational agents. All components are modular and can easily be swapped for other models. A prototype listener agent has been developed on the platform. Its spontaneous reactive behaviors are trained on a multimodal corpus collected in a human-human conversation experiment. Two Gated Recurrent Unit (GRU) based models are switched depending on whether the agent is speaking or listening. These models generate the agent's facial expressions, head movements, and postures in real time from the corresponding behaviors of the human user. Benefiting from this flexible design, the utterance generation component can be an autonomous dialogue manager with hand-crafted rules, an online chatbot engine, or a human operator.
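The abstract does not include implementation details, so the following is only a minimal sketch, in PyTorch, of the two-model scheme it describes: one GRU-based model used while the agent is speaking and another while it is listening, each mapping per-frame user behavior features to agent behavior parameters. All names, feature layouts, and dimensions (USER_FEAT_DIM, AGENT_FEAT_DIM, the hidden size) are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of the switched GRU scheme from the abstract.
# Feature dimensions and names are assumptions, not the authors' design.
import torch
import torch.nn as nn

USER_FEAT_DIM = 24   # assumed: user facial AUs + head pose + posture per frame
AGENT_FEAT_DIM = 24  # assumed: the agent's corresponding behavior parameters


class BehaviorGRU(nn.Module):
    """Maps a stream of user behavior features to agent behavior features."""

    def __init__(self, hidden_size: int = 64):
        super().__init__()
        self.gru = nn.GRU(USER_FEAT_DIM, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, AGENT_FEAT_DIM)

    def forward(self, x, h=None):
        # x: (batch, frames, USER_FEAT_DIM); h carries state across calls
        y, h = self.gru(x, h)
        return self.out(y), h


class ListenerAgent:
    """Switches between two GRUs depending on the agent's speaking state."""

    def __init__(self):
        self.models = {
            True: BehaviorGRU(),   # active while the agent is speaking
            False: BehaviorGRU(),  # active while the agent is listening
        }
        # Separate recurrent state per model, kept across frames
        self.hidden = {True: None, False: None}

    @torch.no_grad()
    def step(self, user_frame: torch.Tensor, agent_is_speaking: bool):
        model = self.models[agent_is_speaking]
        out, self.hidden[agent_is_speaking] = model(
            user_frame.view(1, 1, -1), self.hidden[agent_is_speaking]
        )
        return out.view(-1)  # agent expression / head / posture parameters


# Usage: feed one frame of user features per tick of the real-time loop.
agent = ListenerAgent()
frame = torch.randn(USER_FEAT_DIM)
params = agent.step(frame, agent_is_speaking=False)
```

Keeping a separate hidden state per model is one plausible way to handle the speaking/listening switch; whether the authors reset, share, or carry over recurrent state at switch points is not specified in the abstract.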