AUGUST: an Automatic Generation Understudy for Synthesizing Conversational Recommendation Datasets

Annual Meeting of the Association for Computational Linguistics. Pub Date: 2023-06-16. DOI: 10.48550/arXiv.2306.09631

Yu Lu, Junwei Bao, Zichen Ma, Xiaoguang Han, Youzheng Wu, Shuguang Cui, Xiaodong He


Citations: 1

Abstract

High-quality data is essential for conversational recommendation systems and serves as the cornerstone of network architecture development and training strategy design. Existing works devote heavy human effort to manual labeling or to designing and extending recommender dialogue templates. However, they suffer from two problems: (i) the limited number of human annotators means that datasets can hardly capture the rich, large-scale cases found in the real world, and (ii) the limited experience and knowledge of annotators lead to an uninformative corpus and inappropriate recommendations. In this paper, we propose a novel automatic dataset synthesis approach that can generate both large-scale and high-quality recommendation dialogues through a data2text generation process, in which unstructured recommendation conversations are generated from structured graphs built on real-world user-item information. In doing so, we comprehensively exploit: (i) rich personalized user profiles from traditional recommendation datasets, (ii) rich external knowledge from knowledge graphs, and (iii) the conversation ability contained in human-to-human conversational recommendation datasets. Extensive experiments validate the benefit of the automatically synthesized data under low-resource scenarios and demonstrate its potential to facilitate the development of more effective conversational recommendation systems.
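To make the data2text idea concrete, the following is a minimal illustrative sketch of how user-item interactions and knowledge-graph triples might be combined into a structured graph and flattened into a text input for a generator. It is not the authors' implementation: the names `DialogueGraph`, `build_graph`, and `linearize`, the `<user>`/`<likes>`/`<triple>` markers, and the toy data are all assumptions introduced here, and the generation model itself is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueGraph:
    """Hypothetical structured input: one user's liked items plus KG facts about them."""
    user_id: str
    liked_items: list
    kg_triples: list = field(default_factory=list)


def build_graph(user_id, interactions, knowledge_graph):
    """Collect the items a user interacted with and the KG triples whose head is one of them."""
    liked = [item for uid, item in interactions if uid == user_id]
    triples = [t for t in knowledge_graph if t[0] in liked]
    return DialogueGraph(user_id, liked, triples)


def linearize(graph):
    """Flatten the graph into one string; the marker tokens are an assumed format."""
    parts = [f"<user> {graph.user_id}"]
    parts += [f"<likes> {item}" for item in graph.liked_items]
    parts += [f"<triple> {h} | {r} | {t}" for h, r, t in graph.kg_triples]
    return " ".join(parts)


# Toy data, for illustration only.
interactions = [("u1", "Inception"), ("u2", "Titanic")]
knowledge_graph = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Titanic", "directed_by", "James Cameron"),
]

graph = build_graph("u1", interactions, knowledge_graph)
source_text = linearize(graph)
print(source_text)
```

In a pipeline of this kind, a string such as `source_text` would serve as the source side of a data2text model whose target is a recommendation dialogue. How the paper actually constructs and encodes its graphs is not specified in the abstract.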


