Backchannel Generation Model for a Third Party Listener Agent
Divesh Lala, K. Inoue, T. Kawahara, Kei Sawada
Proceedings of the 10th International Conference on Human-Agent Interaction, 2022-12-05. DOI: 10.1145/3527188.3561926
In this work we propose a listening agent that can be used in a conversation between two humans. We first conduct a corpus analysis to identify three categories of backchannel that the agent can use: responsive interjections, expressive interjections, and shared laughs. From this data we train and evaluate a continuous backchannel generation model consisting of separate timing and form prediction models. We then conduct a subjective experiment comparing our model to random, dyadic, and ground-truth models. We find that our model outperforms the random baseline and is comparable to the dyadic model, despite the low ratings given to its expressive interjections. We suggest that the perception of expressive interjections contributes significantly to how well the agent is perceived to empathize with the speakers and to understand the conversation. The results also show the need for a more robust model for generating expressive interjections, possibly one aided by linguistic features.
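The abstract describes the generator as two separate models, one deciding *when* to backchannel and one deciding *which form* to use. The sketch below is one possible way to wire such a split together frame by frame; it is not the authors' implementation. Only the three backchannel categories come from the source. The per-frame feature vectors, the `timing_model` and `form_model` callables, the 0.5 threshold, and the refractory gap between events are all hypothetical placeholders standing in for trained predictors and tuned settings.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Sequence


class BackchannelForm(Enum):
    # The three categories identified in the corpus analysis.
    RESPONSIVE_INTERJECTION = "responsive_interjection"
    EXPRESSIVE_INTERJECTION = "expressive_interjection"
    SHARED_LAUGH = "shared_laugh"


@dataclass
class Backchannel:
    frame: int             # index of the frame where the backchannel fires
    form: BackchannelForm  # category chosen by the form model


def generate_backchannels(
    frames: Sequence[Sequence[float]],
    timing_model: Callable[[Sequence[float]], float],          # hypothetical: P(backchannel now)
    form_model: Callable[[Sequence[float]], BackchannelForm],  # hypothetical: form classifier
    threshold: float = 0.5,  # assumed decision threshold on the timing probability
    refractory: int = 10,    # assumed minimum gap (in frames) between two backchannels
) -> List[Backchannel]:
    """Two-stage sketch: the timing model gates, the form model labels."""
    events: List[Backchannel] = []
    last = -refractory  # lets an event fire as early as frame 0
    for t, feats in enumerate(frames):
        # Skip frames that are too close to the previous backchannel.
        if t - last < refractory:
            continue
        # Stage 1: decide whether to backchannel at this frame.
        if timing_model(feats) >= threshold:
            # Stage 2: only when triggered, decide which form to produce.
            events.append(Backchannel(t, form_model(feats)))
            last = t
    return events


if __name__ == "__main__":
    # Toy stand-ins: feature 0 acts as a timing score, feature 1 as a laughter cue.
    toy_frames = [[0.1, 0.0], [0.9, 0.0], [0.8, 0.9], [0.2, 0.0], [0.95, 0.7]]
    result = generate_backchannels(
        toy_frames,
        timing_model=lambda f: f[0],
        form_model=lambda f: (
            BackchannelForm.SHARED_LAUGH if f[1] > 0.5
            else BackchannelForm.RESPONSIVE_INTERJECTION
        ),
        refractory=2,
    )
    for e in result:
        print(e.frame, e.form.value)
    # prints "1 responsive_interjection" then "4 shared_laugh"
```

Keeping the two stages separate means the form model only runs on frames the timing model has already accepted. This mirrors the abstract's description of separate timing and form prediction models. Frame 2 in the toy run is suppressed by the assumed refractory gap even though its timing score is high.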