A comparison of few-shot and traditional named entity recognition models for medical text.

Yao Ge, Yuting Guo, Yuan-Chi Yang, Mohammed Ali Al-Garadi, Abeed Sarker
DOI: 10.1109/ichi54592.2022.00024 (https://doi.org/10.1109/ichi54592.2022.00024)
Venue: IEEE International Conference on Healthcare Informatics
Publication date: 2022-06-01
Publication type: Journal Article
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10462421/pdf/nihms-1926966.pdf
Citation count: 2

Abstract

Many research problems involving medical texts have limited amounts of annotated data available (e.g., expressions of rare diseases). Traditional supervised machine learning algorithms, particularly those based on deep neural networks, require large volumes of annotated data, and they underperform when only small amounts of labeled data are available. Few-shot learning (FSL) is a category of machine learning models that are designed with the intent of solving problems that have small annotated datasets available. However, there is no current study that compares the performances of FSL models with traditional models (e.g., conditional random fields) for medical text at different training set sizes. In this paper, we attempted to fill this gap in research by comparing multiple FSL models with traditional models for the task of named entity recognition (NER) from medical texts. Using five health-related annotated NER datasets, we benchmarked three traditional NER models based on BERT: BERT-Linear Classifier (BLC), BERT-CRF (BC), and SANER; and three FSL NER models: StructShot & NNShot, Few-Shot Slot Tagging (FS-ST), and ProtoNER. Our benchmarking results show that almost all models, whether traditional or FSL, achieve significantly lower performances compared to the state-of-the-art with small amounts of training data. For the NER experiments we executed, the F1-scores were very low with small training sets, typically below 30%. FSL models that were reported to perform well on non-medical texts significantly underperformed, compared to their reported best, on medical texts. Our experiments also suggest that FSL methods tend to perform worse on datasets from noisy sources of medical texts, such as social media (which includes misspellings and colloquial expressions), compared to less noisy sources such as medical literature. Our experiments demonstrate that the current state-of-the-art FSL systems are not yet suitable for effective NER in medical natural language processing tasks, and further research needs to be carried out to improve their performances. Creation of specialized, standardized datasets replicating real-world scenarios may help to move this category of methods forward.
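
The abstract reports results as F1-scores but does not say how they were computed. As an illustration only, the sketch below shows one common convention for NER: entity-level F1 with exact span matching over BIO tags. The BIO tagging scheme, the exact-match criterion, the function names, and the entity labels DRUG and SYM are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Optional, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        # Close the open span unless this tag continues it.
        if label is not None and tag != f"I-{label}":
            spans.add((start, i, label))
            start, label = None, None
        # Only a B- tag opens a span; stray I- tags are ignored.
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged entity-level F1 with exact span and type match."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = extract_spans(g_tags), extract_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: one sentence, two gold entities, one predicted.
gold = [["B-DRUG", "I-DRUG", "O", "B-SYM"]]
pred = [["B-DRUG", "I-DRUG", "O", "O"]]
print(round(entity_f1(gold, pred), 3))
```

In this example precision is 1.0 and recall is 0.5. Under exact matching, a prediction that gets an entity's boundary or type slightly wrong counts as both a false positive and a false negative, which is one reason scores can drop sharply on noisy text.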
