CDialog: A Multi-turn Covid-19 Conversation Dataset for Entity-Aware Dialog Generation

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Pub Date: 2022-11-16. DOI: 10.48550/arXiv.2212.06049

Deeksha Varshney, Aizan Zafar, Niranshu Kumar Behra, Asif Ekbal

Volume 73 1, pages 11373-11385


Abstract

The development of conversational agents that interact with patients and deliver clinical advice has attracted the interest of many researchers, particularly in light of the COVID-19 pandemic. Training an end-to-end neural dialog system, however, is hampered by the lack of multi-turn medical dialog corpora. We make the first attempt to release a high-quality multi-turn medical dialog dataset on COVID-19, named CDialog, with over 1K conversations collected from online medical counselling websites. We annotate each utterance of a conversation with seven categories of medical entities as additional labels: diseases, symptoms, medical tests, medical history, remedies, medications, and other aspects. Finally, we propose a novel neural medical dialog system based on the CDialog dataset to advance future research on automated medical dialog systems. We use pre-trained language models for dialog generation, incorporating the annotated medical entities, to generate a virtual doctor's response that addresses the patient's query. Experimental results show that the proposed dialog models perform better when supplemented with entity information, which improves response quality.
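The abstract does not say how entity annotations are fed to the pre-trained language models. As a minimal sketch under that caveat, one plausible approach is to append each utterance's entities to its text and flatten the dialog history into a single input string. The `Utterance` class, the `build_model_input` function, the `[ENT]` and `[SEP]` markers, the category identifiers, and the sample dialog are all hypothetical illustrations, not the authors' format or data.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The seven entity categories named in the abstract (identifier spellings are assumed).
ENTITY_CATEGORIES = (
    "disease", "symptom", "medical_test", "medical_history",
    "remedy", "medication", "other",
)


@dataclass
class Utterance:
    """One dialog turn with its (hypothetical) entity annotations."""
    speaker: str  # e.g. "patient" or "doctor"
    text: str
    entities: Dict[str, List[str]] = field(default_factory=dict)


def build_model_input(history: List[Utterance], sep: str = " [SEP] ") -> str:
    """Flatten a dialog history into one string, appending entities per turn.

    The "[ENT]" and "[SEP]" markers are assumed placeholders; a real system
    would use whatever special tokens its tokenizer defines.
    """
    parts = []
    for utt in history:
        unknown = set(utt.entities) - set(ENTITY_CATEGORIES)
        if unknown:
            raise ValueError(f"unknown entity categories: {sorted(unknown)}")
        # Emit categories in a fixed order so the serialization is deterministic.
        tags = "; ".join(
            f"{cat}: {', '.join(utt.entities[cat])}"
            for cat in ENTITY_CATEGORIES
            if utt.entities.get(cat)
        )
        turn = f"{utt.speaker}: {utt.text}"
        if tags:
            turn += f" [ENT] {tags}"
        parts.append(turn)
    return sep.join(parts)


# Invented example dialog, for illustration only (not taken from CDialog).
history = [
    Utterance("patient", "I have a fever and dry cough.",
              {"symptom": ["fever", "dry cough"]}),
    Utterance("doctor", "Please get a PCR test.",
              {"medical_test": ["PCR test"]}),
]
model_input = build_model_input(history)
```

The resulting string could then be tokenized and passed to a sequence-to-sequence model that generates the doctor's next response. Dropping the `[ENT]` suffix gives an entity-free baseline for comparison.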



