A Qualitative Evaluation of Language Models on Automatic Question-Answering for COVID-19

Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. Pub Date: 2020-06-19. DOI: 10.1145/3388440.3412413

David Oniani, Yanshan Wang


Citations: 24

Abstract

COVID-19 (2019 Novel Coronavirus) has resulted in an ongoing pandemic and, as of 26 July 2020, has caused more than 15.7 million cases and over 640,000 deaths. The highly dynamic and rapidly evolving situation with COVID-19 has made it difficult to access accurate, on-demand information regarding the disease. Online communities, forums, and social media provide potential venues to search for relevant questions and answers, or to post questions and seek answers from other members. However, due to the nature of such sites, there is always a limited number of relevant questions and responses to search, and posted questions are rarely answered immediately. With the advancements in the field of natural language processing, particularly in the domain of language models, it has become possible to design chatbots that can automatically answer consumer questions. However, such models are rarely applied and evaluated in the healthcare domain to meet information needs with accurate and up-to-date healthcare data. In this paper, we propose to apply a language model for automatically answering questions related to COVID-19 and qualitatively evaluate the generated responses. We utilized the GPT-2 language model and applied transfer learning to retrain it on the COVID-19 Open Research Dataset (CORD-19) corpus. In order to improve the quality of the generated responses, we applied four different approaches, namely tf-idf (Term Frequency - Inverse Document Frequency), Bidirectional Encoder Representations from Transformers (BERT), Bidirectional Encoder Representations from Transformers for Biomedical Text Mining (BioBERT), and Universal Sentence Encoder (USE), to filter and retain relevant sentences in the responses. In the performance evaluation step, we asked two medical experts to rate the responses. We found that BERT and BioBERT, on average, outperform both tf-idf and USE in relevance-based sentence filtering tasks. Additionally, based on the chatbot, we created a user-friendly interactive web application to be hosted online and made its source code available free of charge to anyone interested in running it locally, online, or just for experimental purposes. Overall, our work has yielded significant results in both designing a chatbot that produces high-quality responses to COVID-19-related questions and comparing several embedding generation techniques.
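The relevance-based sentence filtering step mentioned in the abstract can be illustrated with a minimal sketch of the tf-idf variant. The abstract does not state how similarity is measured or how many sentences are retained, so the cosine similarity, the smoothed idf weighting, the `top_k` cutoff, and all function names below are assumptions made for illustration, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(texts):
    """Build one sparse tf-idf vector (term -> weight dict) per text.

    Assumption: raw term counts with a smoothed idf,
    log((1 + n) / (1 + df)) + 1.
    """
    token_lists = [tokenize(t) for t in texts]
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_relevant(question, sentences, top_k=2):
    """Keep the top_k sentences most similar to the question.

    Hypothetical helper: sentences with zero similarity are dropped,
    and the survivors are returned in their original order.
    """
    vectors = tfidf_vectors([question] + sentences)
    q_vec, s_vecs = vectors[0], vectors[1:]
    scored = [(cosine(q_vec, vec), i) for i, vec in enumerate(s_vecs)]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:top_k]
    keep = sorted(i for score, i in ranked if score > 0.0)
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    generated = [
        "Common symptoms of COVID-19 include fever and cough.",
        "The weather was pleasant yesterday.",
        "COVID-19 symptoms may appear after exposure.",
    ]
    for sentence in filter_relevant("What are the symptoms of COVID-19?", generated):
        print(sentence)
```

Under the same assumptions, the BERT, BioBERT, and USE variants would differ only in how the question and sentence vectors are produced: the tf-idf vectors would be replaced by sentence embeddings from the respective model, while the similarity ranking stays the same.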
