MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining

Clinical Natural Language Processing Workshop. Pub Date: 2020-11-01. DOI: 10.18653/v1/2020.clinicalnlp-1.15

Zhi Wen, Xing Han Lu, Siva Reddy

Citations: 21

Abstract

One of the biggest obstacles to using many current NLP methods in clinical settings is the limited availability of public datasets. In this work, we present MeDAL, a large medical text dataset curated for abbreviation disambiguation and designed for natural language understanding pre-training in the medical domain. We pre-trained several models of common architectures on this dataset and showed empirically that such pre-training improves both performance and convergence speed when fine-tuning on downstream medical tasks.



