NLP modeling recommendations for restricted data availability in clinical settings.

Impact factor: 3.3; Zone 3 (Medicine); JCR Q2 (Medical Informatics)

BMC Medical Informatics and Decision Making 25(1):116. Pub Date: 2025-03-07. DOI: 10.1186/s12911-025-02948-2

Fabián Villena, Felipe Bravo-Marquez, Jocelyn Dunstan



Abstract

Background: Clinical decision-making in healthcare often relies on unstructured text data, which can be challenging to analyze using traditional methods. Natural Language Processing (NLP) has emerged as a promising solution, but its application in clinical settings is hindered by restricted data availability and the need for domain-specific knowledge.

Methods: We conducted an experimental analysis to evaluate the performance of various NLP modeling paradigms on multiple clinical NLP tasks in Spanish. These tasks included referral prioritization and referral specialty classification. We simulated three clinical settings with varying levels of data availability and evaluated the performance of four foundation models.

Results: Clinical-specific pre-trained language models (PLMs) achieved the highest performance across tasks. For referral prioritization, fine-tuned clinical PLMs attained a macro F1 score of 88.85%. In referral specialty classification, the same models achieved a macro F1 score of 53.79%, surpassing domain-agnostic models. Continued pre-training on environment-specific data improved model performance, but the gains were marginal relative to the computational resources required. Few-shot learning with large language models (LLMs) performed worse but showed potential in data-scarce scenarios.
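Both results above are reported as macro F1, which averages the per-class F1 scores without weighting by class frequency, so rare classes count as much as common ones. The following is a minimal sketch of that metric in plain Python; the function name and the example labels ("urgent", "routine") are illustrative assumptions and are not taken from the paper or its data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: a classifier that always predicts the majority class
y_true = ["urgent", "routine", "routine", "routine"]
y_pred = ["routine", "routine", "routine", "routine"]
score = macro_f1(y_true, y_pred)
```

In this toy case accuracy is 0.75, but macro F1 is much lower because the missed "urgent" class contributes an F1 of zero. This sensitivity to minority classes is one reason macro F1 is a common choice for imbalanced classification tasks.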

Conclusions: Our study provides evidence-based recommendations for clinical NLP practitioners on selecting modeling paradigms based on data availability. We highlight the importance of considering data availability, task complexity, and institutional maturity when designing and training clinical NLP models. Our findings can inform the development of effective clinical NLP solutions in real-world settings.


Source journal

BMC Medical Informatics and Decision Making (Medicine: Medical Informatics)

CiteScore: 7.20
Self-citation rate: 5.70%
Articles published: 297
Review time: 1 month

Journal description: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.