{"title":"奖章:用于自然语言理解预训练的医学缩写消歧数据集","authors":"Zhi Wen, Xing Han Lu, Siva Reddy","doi":"10.18653/v1/2020.clinicalnlp-1.15","DOIUrl":null,"url":null,"abstract":"One of the biggest challenges that prohibit the use of many current NLP methods in clinical settings is the availability of public datasets. In this work, we present MeDAL, a large medical text dataset curated for abbreviation disambiguation, designed for natural language understanding pre-training in the medical domain. We pre-trained several models of common architectures on this dataset and empirically showed that such pre-training leads to improved performance and convergence speed when fine-tuning on downstream medical tasks.","PeriodicalId":216954,"journal":{"name":"Clinical Natural Language Processing Workshop","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":"{\"title\":\"MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining\",\"authors\":\"Zhi Wen, Xing Han Lu, Siva Reddy\",\"doi\":\"10.18653/v1/2020.clinicalnlp-1.15\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"One of the biggest challenges that prohibit the use of many current NLP methods in clinical settings is the availability of public datasets. In this work, we present MeDAL, a large medical text dataset curated for abbreviation disambiguation, designed for natural language understanding pre-training in the medical domain. We pre-trained several models of common architectures on this dataset and empirically showed that such pre-training leads to improved performance and convergence speed when fine-tuning on downstream medical tasks.\",\"PeriodicalId\":216954,\"journal\":{\"name\":\"Clinical Natural Language Processing Workshop\",\"volume\":\"59 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"21\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Clinical Natural Language Processing Workshop\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18653/v1/2020.clinicalnlp-1.15\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Clinical Natural Language Processing Workshop","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2020.clinicalnlp-1.15","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining
One of the biggest challenges preventing the use of many current NLP methods in clinical settings is the scarcity of publicly available datasets. In this work, we present MeDAL, a large medical text dataset curated for abbreviation disambiguation and designed for natural language understanding pre-training in the medical domain. We pre-trained several models with common architectures on this dataset and showed empirically that such pre-training improves performance and convergence speed when fine-tuning on downstream medical tasks.
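To make the pre-train-then-fine-tune recipe concrete, the sketch below frames abbreviation disambiguation as sequence classification over the abbreviation's context, then reuses the resulting encoder weights to initialize a model for a downstream medical task. This is a minimal illustration only: the model name, toy examples, label ids, and two-class setup are assumptions for the sake of the example, not the authors' exact configuration.

```python
# Minimal sketch of pre-training on abbreviation disambiguation, then
# transferring the encoder to a downstream task (hypothetical setup).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"  # any encoder; the paper evaluates several architectures
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Stage 1: abbreviation disambiguation as classification.
# Each example pairs a context containing an abbreviation with the id of its
# correct expansion (labels here are hypothetical: 0 = "heart rate", 1 = "human resources").
pretrain_examples = [
    ("patient presented with elevated hr and chest pain", 0),
    ("hr was notified of the staffing change", 1),
]
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for text, label in pretrain_examples:
    batch = tokenizer(text, return_tensors="pt", truncation=True)
    loss = model(**batch, labels=torch.tensor([label])).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Stage 2: reuse the pre-trained encoder for a downstream medical task
# (e.g. a clinical prediction task), keeping the encoder and replacing the head.
downstream = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
downstream.base_model.load_state_dict(model.base_model.state_dict())
# downstream is then fine-tuned on the downstream task's labeled data in the same way.
```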