AXpert: human expert facilitated privacy-preserving large language models for abdominal X-ray report labeling.

Journal: JAMIA Open (IF 2.5, Q2, Health Care Sciences & Services)
Publication date: 2025-02-10 (eCollection date: 2025-02-01)
DOI: 10.1093/jamiaopen/ooaf008
Yufeng Zhang, Joseph G Kohne, Katherine Webster, Rebecca Vartanian, Emily Wittrup, Kayvan Najarian
{"title":"AXpert: human expert facilitated privacy-preserving large language models for abdominal X-ray report labeling.","authors":"Yufeng Zhang, Joseph G Kohne, Katherine Webster, Rebecca Vartanian, Emily Wittrup, Kayvan Najarian","doi":"10.1093/jamiaopen/ooaf008","DOIUrl":null,"url":null,"abstract":"<p><strong>Importance: </strong>The lack of a publicly accessible abdominal X-ray (AXR) dataset has hindered necrotizing enterocolitis (NEC) research. While significant strides have been made in applying natural language processing (NLP) to radiology reports, most efforts have focused on chest radiology. Development of an accurate NLP model to identify features of NEC on abdominal radiograph can support efforts to improve diagnostic accuracy for this and other rare pediatric conditions.</p><p><strong>Objectives: </strong>This study aims to develop privacy-preserving large language models (LLMs) and their distilled version to efficiently annotate pediatric AXR reports.</p><p><strong>Materials and methods: </strong>Utilizing pediatric AXR reports collected from C.S. Mott Children's Hospital, we introduced AXpert in 2 formats: one based on the instruction-fine-tuned 7-B Gemma model, and a distilled version employing a BERT-based model derived from the fine-tuned model to improve inference and fine-tuning efficiency. AXpert aims to detect NEC presence and classify its subtypes-pneumatosis, portal venous gas, and free air.</p><p><strong>Results: </strong>Extensive testing shows that LLMs, including Axpert, outperforms baseline BERT models on all metrics. Specifically, Gemma-7B (F1 score: 0.9 ± 0.015) improves upon BlueBERT by 132% in F1 score for detecting NEC positive samples. The distilled BERT model matches the performance of the LLM labelers and surpasses expert-trained baseline BERT models.</p><p><strong>Discussion: </strong>Our findings highlight the potential of using LLMs for clinical NLP tasks. With minimal expert knowledge injections, LLMs can achieve human-like performance, greatly reducing manual labor. Privacy concerns are alleviated as all models are trained and deployed locally.</p><p><strong>Conclusion: </strong>AXpert demonstrates potential to reduce human labeling efforts while maintaining high accuracy in automating NEC diagnosis with AXR, offering precise image labeling capabilities.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"8 1","pages":"ooaf008"},"PeriodicalIF":2.5000,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11809431/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"JAMIA Open","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/jamiaopen/ooaf008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/2/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
引用次数: 0

Abstract

Importance: The lack of a publicly accessible abdominal X-ray (AXR) dataset has hindered necrotizing enterocolitis (NEC) research. While significant strides have been made in applying natural language processing (NLP) to radiology reports, most efforts have focused on chest radiology. Development of an accurate NLP model to identify features of NEC on abdominal radiographs can support efforts to improve diagnostic accuracy for this and other rare pediatric conditions.

Objectives: This study aims to develop privacy-preserving large language models (LLMs), along with a distilled version, to efficiently annotate pediatric AXR reports.

Materials and methods: Utilizing pediatric AXR reports collected from C.S. Mott Children's Hospital, we introduced AXpert in two formats: one based on an instruction-fine-tuned 7B Gemma model, and a distilled version employing a BERT-based model derived from the fine-tuned model to improve inference and fine-tuning efficiency. AXpert aims to detect NEC presence and classify its subtypes: pneumatosis, portal venous gas, and free air.
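As a rough illustration of the two-stage design described above, the sketch below prompts a locally hosted, instruction-tuned Gemma checkpoint for report-level labels and then distills those silver labels into a BERT-based multi-label classifier. This is not the authors' released code: the checkpoint names (google/gemma-7b-it, bert-base-uncased), prompt wording, label parsing, and hyperparameters are assumptions, and the paper's expert-guided instruction fine-tuning of the Gemma model is omitted here.

```python
# Illustrative sketch only (not the authors' implementation): a locally run,
# instruction-tuned Gemma checkpoint stands in for AXpert's fine-tuned LLM labeler,
# and its outputs are distilled into a BERT multi-label classifier.
import torch
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

# Findings named in the abstract; the exact label schema in the paper may differ.
LABELS = ["nec", "pneumatosis", "portal_venous_gas", "free_air"]

PROMPT = (  # hypothetical instruction; the paper's expert-crafted prompt is not reproduced here
    "You are a pediatric radiology assistant. For the abdominal X-ray report below, "
    "answer yes or no, comma-separated, for: NEC, pneumatosis, portal venous gas, free air.\n"
    "Report: {report}\nAnswers:"
)

def llm_label(reports, llm_name="google/gemma-7b-it"):
    """Stage 1: query the locally deployed LLM for silver labels (no data leaves the site)."""
    tok = AutoTokenizer.from_pretrained(llm_name)
    llm = AutoModelForCausalLM.from_pretrained(llm_name, torch_dtype=torch.bfloat16,
                                               device_map="auto")
    silver = []
    for report in reports:
        inputs = tok(PROMPT.format(report=report), return_tensors="pt").to(llm.device)
        out = llm.generate(**inputs, max_new_tokens=16, do_sample=False)
        answer = tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
        parts = [p.strip().lower() for p in answer.split(",")]
        parts += ["no"] * (len(LABELS) - len(parts))  # pad if the model under-answers
        silver.append([1.0 if p.startswith("yes") else 0.0 for p in parts[:len(LABELS)]])
    return silver

def distill_to_bert(reports, silver_labels, bert_name="bert-base-uncased"):
    """Stage 2: train a small multi-label classifier on the LLM-produced labels."""
    tok = AutoTokenizer.from_pretrained(bert_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        bert_name, num_labels=len(LABELS), problem_type="multi_label_classification")
    ds = Dataset.from_dict({"text": reports, "labels": silver_labels})
    ds = ds.map(lambda ex: tok(ex["text"], truncation=True, padding="max_length", max_length=256))
    args = TrainingArguments(output_dir="axpert_distilled", num_train_epochs=3,
                             per_device_train_batch_size=8, learning_rate=2e-5)
    Trainer(model=model, args=args, train_dataset=ds).train()
    return model
```

A clinical checkpoint such as BlueBERT could be substituted for bert-base-uncased in the distillation stage, which is closer to the baselines compared in the paper.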

Results: Extensive testing shows that the LLMs, including AXpert, outperform baseline BERT models on all metrics. Specifically, Gemma-7B (F1 score: 0.9 ± 0.015) improves upon BlueBERT by 132% in F1 score for detecting NEC-positive samples. The distilled BERT model matches the performance of the LLM labelers and surpasses expert-trained baseline BERT models.
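The headline comparison rests on the F1 score for the NEC-positive class. A minimal way to compute that metric on a held-out test split is sketched below; the variable names and values are purely illustrative, not the paper's data.

```python
from sklearn.metrics import classification_report, f1_score

# Hypothetical held-out labels: 1 = NEC-positive report, 0 = NEC-negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # expert annotations
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]   # labeler predictions (LLM or distilled BERT)

print(f1_score(y_true, y_pred, pos_label=1))            # F1 on NEC-positive samples
print(classification_report(y_true, y_pred, digits=3))  # per-class precision/recall/F1
```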

Discussion: Our findings highlight the potential of using LLMs for clinical NLP tasks. With minimal injection of expert knowledge, LLMs can achieve human-like performance, greatly reducing manual labeling effort. Privacy concerns are alleviated because all models are trained and deployed locally.

Conclusion: AXpert demonstrates the potential to reduce human labeling effort while maintaining high accuracy in automating NEC diagnosis with AXR, offering precise image labeling capabilities.

Source journal: JAMIA Open (Medicine, Health Informatics)
CiteScore: 4.10 | Self-citation rate: 4.80% | Articles per year: 102 | Review time: 16 weeks