{"title":"基于RNN驱动的多模态交互平台的开发","authors":"Hung-Hsuan Huang, Masato Fukuda, T. Nishida","doi":"10.1145/3308532.3329448","DOIUrl":null,"url":null,"abstract":"This paper describes our ongoing project to build a platform that enables real-time multimodal interaction with embodied conversational agents. All of the components are in modular design and can be switched to other models easily. A prototype listener agent has been developed upon the platform. Its spontaneous reactive behaviors are trained from a multimodal data corpus collected in a human-human conversation experiment. Two Gated Recurrent Unit (GRU) based models are switched when the agent is speaking or is not speaking. These models generate the agent's facial expressions, head movements, and postures from the corresponding behaviors of the human user in real-time. Benefits from the flexible design, the utterance generation part can be an autonomous dialogue manager with hand crafted rules, an on-line chatbot engine, or a human operator.","PeriodicalId":112642,"journal":{"name":"Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Development of a Platform for RNN Driven Multimodal Interaction with Embodied Conversational Agents\",\"authors\":\"Hung-Hsuan Huang, Masato Fukuda, T. Nishida\",\"doi\":\"10.1145/3308532.3329448\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper describes our ongoing project to build a platform that enables real-time multimodal interaction with embodied conversational agents. All of the components are in modular design and can be switched to other models easily. A prototype listener agent has been developed upon the platform. Its spontaneous reactive behaviors are trained from a multimodal data corpus collected in a human-human conversation experiment. Two Gated Recurrent Unit (GRU) based models are switched when the agent is speaking or is not speaking. These models generate the agent's facial expressions, head movements, and postures from the corresponding behaviors of the human user in real-time. 
Benefits from the flexible design, the utterance generation part can be an autonomous dialogue manager with hand crafted rules, an on-line chatbot engine, or a human operator.\",\"PeriodicalId\":112642,\"journal\":{\"name\":\"Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3308532.3329448\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3308532.3329448","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
This paper describes our ongoing project to build a platform that enables real-time multimodal interaction with embodied conversational agents. All components are modular in design and can easily be swapped for alternative models. A prototype listener agent has been developed on top of the platform. Its spontaneous reactive behaviors are trained on a multimodal corpus collected in a human-human conversation experiment. Two Gated Recurrent Unit (GRU) based models are used, switched according to whether the agent is speaking or listening. These models generate the agent's facial expressions, head movements, and postures in real time from the corresponding behaviors of the human user. Thanks to this flexible design, the utterance generation component can be an autonomous dialogue manager with hand-crafted rules, an online chatbot engine, or a human operator.
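To make the behavior-generation scheme concrete, the sketch below shows how two GRU-based models might map per-frame user behavior features to agent behaviors and be switched by the agent's speaking state. This is a minimal sketch, assuming a PyTorch implementation; the feature dimensions and the `BehaviorGRU` and `ReactiveBehaviorGenerator` names are hypothetical, as the paper does not specify its architecture or framework.

```python
# Hypothetical sketch of the two-model, speaking-state-switched scheme
# described in the abstract; dimensions and names are assumptions.
import torch
import torch.nn as nn

class BehaviorGRU(nn.Module):
    """Maps a stream of user behavior features to agent behavior features."""
    def __init__(self, user_dim=30, agent_dim=20, hidden_dim=64):
        super().__init__()
        self.gru = nn.GRU(user_dim, hidden_dim, batch_first=True)
        # Output head: e.g. facial expression, head movement, posture parameters.
        self.head = nn.Linear(hidden_dim, agent_dim)

    def forward(self, user_feats, h=None):
        # user_feats: (batch, frames, user_dim) per-frame multimodal features.
        out, h = self.gru(user_feats, h)
        return self.head(out), h

class ReactiveBehaviorGenerator:
    """Holds one model per speaking state and switches between them."""
    def __init__(self):
        # One model trained on segments where the agent speaks,
        # another on segments where it listens.
        self.models = {True: BehaviorGRU(), False: BehaviorGRU()}
        self.state = None
        self.active = None

    def step(self, user_feats, agent_is_speaking):
        # Reset the recurrent state whenever the active model changes.
        if agent_is_speaking != self.active:
            self.state = None
            self.active = agent_is_speaking
        with torch.no_grad():
            behavior, self.state = self.models[agent_is_speaking](
                user_feats, self.state)
        return behavior  # (batch, frames, agent_dim) agent behavior targets

# Usage: feed one frame of user features at a time for real-time generation.
gen = ReactiveBehaviorGenerator()
frame = torch.randn(1, 1, 30)  # placeholder for one frame of user features
agent_behavior = gen.step(frame, agent_is_speaking=False)
```

Keeping a separate recurrent state per switch (and resetting it when the active model changes) is one simple way to honor the paper's two-model design; the actual platform may handle the transition differently.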