Easy and effective! Data augmentation for knowledge-aware dialogue generation via multi-perspective sentences interaction
Sisi Peng, Dan Qu, Wenlin Zhang, Hao Zhang, Shunhang Li, Minchen Xu
Neurocomputing (Q1, Computer Science, Artificial Intelligence; impact factor 5.5)
DOI: 10.1016/j.neucom.2024.128724
Published: 2024-10-22
URL: https://www.sciencedirect.com/science/article/pii/S0925231224014954
Citations: 0
Abstract
In recent years, knowledge-based dialogue generation has attracted significant attention for its capacity to produce informative and coherent responses by integrating external knowledge into models. However, obtaining high-quality knowledge that aligns with the dialogue content is challenging and requires substantial time and resources. To tackle the issue of limited dialogue data, most research efforts concentrate on data augmentation to increase the volume of training data. These methods, however, overlook knowledge augmentation, restricting the diversity of the input data and yielding improvements only on specific metrics. Real-world conversations exhibit a spectrum of characteristics, including repetitions, reversals, and interruptions, and therefore demand a higher level of data diversity. In this study, we introduce a straightforward yet effective data augmentation technique, Multi-perspective Sentence Interaction, which strengthens the connections among sentences from varied viewpoints. By examining target responses from multiple dialogue perspectives, we deepen the model's understanding of the relationships between dialogue sentences and thereby expand knowledge-based dialogue data. Experiments on several knowledge-based dialogue datasets with different models show that our method notably improves generation quality: response accuracy increases by 3.5%, diversity (DIST-2) rises by 0.1506, knowledge selection accuracy improves by 19.04%, and model perplexity drops by 31.48%.
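The DIST-2 figure cited above is a standard diversity metric for generated dialogue: the number of distinct bigrams divided by the total number of bigrams across generated responses. The sketch below is illustrative only (function name and toy tokenization are not from the paper), showing the conventional computation:

```python
def dist_n(responses, n=2):
    """Ratio of distinct n-grams to total n-grams over a list of
    tokenized responses (DIST-n); higher means more diverse output."""
    total = 0
    distinct = set()
    for tokens in responses:
        # Slide a window of length n over each token sequence.
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        distinct.update(ngrams)
    return len(distinct) / total if total else 0.0

# Toy example: 4 bigrams total, 3 of them distinct -> DIST-2 = 0.75.
responses = [["the", "cat", "sat"], ["the", "cat", "ran"]]
print(dist_n(responses, n=2))  # 0.75
```

A reported "0.1506 increase in DIST-2" thus means the augmented model's responses contain a noticeably larger fraction of unique bigrams than the baseline's.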
Journal overview:
Neurocomputing publishes articles describing recent fundamental contributions to the field of neurocomputing, covering its theory, practice, and applications.