Leveraging large language models to mimic domain expert labeling in unstructured text-based electronic healthcare records in non-English languages

Izzet Turkalp Akbasli, Ahmet Ziya Birbilen, Ozlem Teksam

BMC Medical Informatics and Decision Making, 25(1):154, published 2025-03-31. DOI: 10.1186/s12911-025-02871-6
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11959812/pdf/
Abstract
Background: The integration of big data and artificial intelligence (AI) in healthcare, particularly through the analysis of electronic health records (EHR), presents significant opportunities for improving diagnostic accuracy and patient outcomes. However, processing and accurately labeling vast amounts of unstructured data remains a critical bottleneck, necessitating efficient and reliable solutions. This study investigates the ability of domain-specific, fine-tuned large language models (LLMs) to classify unstructured EHR texts containing typographical errors through named entity recognition tasks, aiming to improve the efficiency and reliability of supervised learning AI models in healthcare.
Methods: Turkish clinical notes from pediatric emergency room admissions at Hacettepe University İhsan Doğramacı Children's Hospital from 2018 to 2023 were analyzed. The data were preprocessed with open source Python libraries and categorized using a pretrained GPT-3 model, "text-davinci-003," before and after fine-tuning with domain-specific data on respiratory tract infections (RTI). The model's predictions were compared against ground truth labels established by pediatric specialists.
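As a rough illustration of the workflow described above (not the authors' actual pipeline), the sketch below uses the legacy OpenAI Python SDK (v0.x, contemporary with text-davinci-003) to upload a domain-specific training file, request a fine-tune of a GPT-3 base model, and classify a clinical note. The file name, prompt wording, and RTI/non-RTI label set are hypothetical placeholders.

```python
# Minimal sketch of a fine-tune-then-classify workflow with the legacy OpenAI SDK (v0.x).
# File names, prompt wording, and the RTI/non-RTI label set are hypothetical.
import openai

openai.api_key = "sk-..."  # placeholder API key

# 1. Upload prompt/completion pairs prepared from expert-labeled notes (JSONL).
training_file = openai.File.create(
    file=open("rti_training_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Start a fine-tuning job on a legacy fine-tunable GPT-3 base model.
fine_tune = openai.FineTune.create(
    training_file=training_file["id"],
    model="davinci",
)

# 3. After the job completes, retrieve the fine-tuned model name and
#    classify an unstructured Turkish clinical note.
job = openai.FineTune.retrieve(fine_tune["id"])
note = "Öksürük ve ateş şikayeti ile başvurdu..."  # example free-text note
response = openai.Completion.create(
    model=job["fine_tuned_model"],
    prompt=f"Clinical note: {note}\nLabel (RTI / non-RTI):",
    max_tokens=5,
    temperature=0,
)
print(response["choices"][0]["text"].strip())
```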
Results: Of 24,229 patient records classified as poorly labeled, 18,879 were found to be free of typographical errors and were confirmed as RTI cases through filtering methods. On the remaining records, the fine-tuned model achieved 99.88% accuracy in identifying RTI cases, significantly outperforming the pretrained model's 78.54% accuracy. The fine-tuned model demonstrated superior performance metrics across all evaluated aspects compared to the pretrained model.
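For context, an accuracy comparison of this kind can be computed with scikit-learn; the label arrays below are placeholders for illustration, not the study's data.

```python
# Sketch of evaluating model predictions against specialist ground-truth labels
# with scikit-learn; the label arrays are placeholders, not study data.
from sklearn.metrics import accuracy_score, classification_report

ground_truth = ["RTI", "non-RTI", "RTI", "RTI"]        # labels from pediatric specialists
pretrained_preds = ["RTI", "RTI", "non-RTI", "RTI"]    # pretrained model output
finetuned_preds = ["RTI", "non-RTI", "RTI", "RTI"]     # fine-tuned model output

print("Pretrained accuracy:", accuracy_score(ground_truth, pretrained_preds))
print("Fine-tuned accuracy:", accuracy_score(ground_truth, finetuned_preds))
print(classification_report(ground_truth, finetuned_preds))
```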
Conclusions: Fine-tuned LLMs can categorize unstructured EHR data with high accuracy, closely approximating the performance of domain experts. This approach significantly reduces the time and costs associated with manual data labeling, demonstrating the potential to streamline the processing of large-scale healthcare data for AI applications.
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.