Disambiguating Clinical Abbreviations Using a One-Fits-All Classifier Based on Deep Learning Techniques.

IF 1.8 | CAS Tier 4 (Medicine) | JCR Q3 (Computer Science, Information Systems)
Methods of Information in Medicine | Pub Date: 2022-06-01 | Epub Date: 2022-02-01 | DOI: 10.1055/s-0042-1742388
Areej Jaber, Paloma Martínez
Open-access PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/79/7a/10-1055-s-0042-1742388.PMC9246508.pdf
Citations: 9

Abstract

Background: Abbreviations are considered an essential part of the clinical narrative; they are used not only to save time and space but also to hide serious or incurable illnesses. Misinterpretation of clinical abbreviations can affect patients directly as well as downstream services such as clinical decision support systems. There is no consensus in the scientific community on how new abbreviations are created, which makes them difficult to understand. Disambiguation of clinical abbreviations aims to predict the exact meaning of an abbreviation from its context, a crucial step in understanding clinical notes.

Objectives: Disambiguating clinical abbreviations is an essential task in information extraction from medical texts. Deep contextualized representation models have shown promising results in most word sense disambiguation tasks. In this work, we propose a one-fits-all classifier that disambiguates clinical abbreviations using deep contextualized representations from pretrained language models such as Bidirectional Encoder Representations from Transformers (BERT).

Methods: A set of experiments with different pretrained clinical BERT models was performed to investigate fine-tuning methods for the disambiguation of clinical abbreviations. One-fits-all classifiers were used to improve the disambiguation of rare clinical abbreviations.
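The one-fits-all framing can be sketched in miniature: a single classifier receives the abbreviation paired with its context (much as BERT receives an "abbreviation [SEP] context" input pair), so all abbreviations, including rare ones, share one model instead of each needing its own. The bag-of-words nearest-centroid classifier and the tiny clinical examples below are illustrative stand-ins, not the paper's fine-tuned BERT models.

```python
from collections import Counter
import math

# Toy "one-fits-all" disambiguator: one classifier for ALL abbreviations.
# The abbreviation itself is injected as a feature alongside its context,
# mimicking BERT's paired-input encoding. Data and names are illustrative.

def featurize(abbrev, context):
    # Bag of words over the context, plus an explicit abbreviation feature.
    return Counter([f"ABBR={abbrev.lower()}"] + context.lower().split())

class OneFitsAllClassifier:
    def __init__(self):
        self.centroids = {}  # sense label -> summed feature counts

    def fit(self, examples):
        # examples: iterable of (abbreviation, context, sense) triples
        for abbrev, context, sense in examples:
            c = self.centroids.setdefault(sense, Counter())
            c.update(featurize(abbrev, context))

    def predict(self, abbrev, context):
        x = featurize(abbrev, context)
        def cosine(a, b):
            dot = sum(a[k] * b[k] for k in a)
            na = math.sqrt(sum(v * v for v in a.values()))
            nb = math.sqrt(sum(v * v for v in b.values()))
            return dot / (na * nb) if na and nb else 0.0
        return max(self.centroids, key=lambda s: cosine(x, self.centroids[s]))

train = [
    ("RA", "pain and swelling in joints consistent with rheumatoid arthritis",
     "rheumatoid arthritis"),
    ("RA", "ecg shows enlargement of the right atrium", "right atrium"),
    ("MS", "demyelinating lesions suggest multiple sclerosis",
     "multiple sclerosis"),
    ("MS", "echo reveals mitral stenosis with reduced valve area",
     "mitral stenosis"),
]

clf = OneFitsAllClassifier()
clf.fit(train)
print(clf.predict("RA", "joint swelling typical of rheumatoid disease"))
# → rheumatoid arthritis
```

The design point mirrors the paper's: because the model conditions on the abbreviation rather than being specialized to it, senses of rare abbreviations can still be predicted without training a dedicated per-abbreviation classifier.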

Results: One-fits-all classifiers with deep contextualized representations from the Bioclinical, BlueBERT, and MS_BERT pretrained models improved accuracy on the University of Minnesota data set, achieving 98.99%, 98.75%, and 99.13%, respectively. All models outperform the previous state of the art of around 98.39%, with the best accuracy obtained by the MS_BERT model.

Conclusion: Deep contextualized representations obtained by fine-tuning pretrained language models proved sufficient for disambiguating clinical abbreviations; the approach could be robust for rare and unseen abbreviations and has the advantage of avoiding a separate classifier for each abbreviation. Transfer learning can improve the development of practical abbreviation disambiguation systems.


Source journal:
Methods of Information in Medicine
Category: Medicine – Computer Science: Information Systems
CiteScore: 3.70
Self-citation rate: 11.80%
Articles per year: 33
Review time: 6-12 weeks
Journal description: Good medicine and good healthcare demand good information. Since the journal's founding in 1962, Methods of Information in Medicine has stressed the methodology and scientific fundamentals of organizing, representing and analyzing data, information and knowledge in biomedicine and health care. Covering publications in the fields of biomedical and health informatics, medical biometry, and epidemiology, the journal publishes original papers, reviews, reports, opinion papers, editorials, and letters to the editor. From time to time, the journal publishes articles on particular focus themes as part of a journal's issue.