Comparison of rule- and large language model-based phenotype extraction from clinical notes for neurofibromatosis type 1

IF 4.6 | Zone 2 (Medicine) | JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association | Pub Date: 2025-09-12 | DOI: 10.1093/jamia/ocaf155

Levi Kaster, Ethan Hillis, Inez Y Oh, Elizabeth C Cordell, Randi E Foraker, Albert M Lai, Stephanie M Morris, David H Gutmann, Philip R O Payne, Aditi Gupta


Citations: 0

Abstract




Introduction: Neurofibromatosis type 1 (NF1) is a rare genetic disorder affecting multiple organ systems with significant clinical heterogeneity. Managing individuals with NF1 is challenging because disease progression and outcomes vary widely and tools for early risk assessment are limited.

Objective: This study aims to develop an effective, generalizable, user-friendly clinical entity extraction pipeline for identifying NF1-related phenotypes from unstructured clinical notes to enhance research and risk-modeling efforts. We compare the benefits of rule-based natural language processing (NLP) vs large language models (LLMs) for this purpose.

Materials and methods: Four phenotype extraction pipelines (3 LLM-based and 1 rule-based) were developed to automatically extract selected NF1-relevant phenotypes. Subject matter experts manually reviewed clinical notes, generating a gold-standard annotation dataset for evaluation. In Phase 1, notes authored by a single NF1 physician were used to guide pipeline development and refinement. In Phase 2, notes from a second NF1 physician were used to assess pipeline generalizability, followed by further refinement to accommodate differences in physician terminology.
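The abstract does not describe how the rule-based pipeline is built. As a rough illustration only, a minimal dictionary-and-regex extractor with a simple NegEx-style negation check might look like the sketch below. The phenotype labels, synonym patterns, negation cues, and window size are hypothetical placeholders chosen for the example, not the authors' rules.

```python
import re

# HYPOTHETICAL lexicon: phenotype label -> compiled patterns for surface forms.
# These terms are illustrative only; they are not the study's rules.
PHENOTYPE_PATTERNS = {
    "cafe_au_lait_macules": [
        re.compile(r"caf[eé][- ]au[- ]lait", re.IGNORECASE),
        re.compile(r"\bCALMs?\b"),  # acronym kept case-sensitive so "calm" does not match
    ],
    "plexiform_neurofibroma": [
        re.compile(r"plexiform neurofibromas?", re.IGNORECASE),
        re.compile(r"\bPNs?\b"),
    ],
    "optic_pathway_glioma": [
        re.compile(r"optic (?:pathway|nerve) gliomas?", re.IGNORECASE),
        re.compile(r"\bOPGs?\b"),
    ],
}

# HYPOTHETICAL negation cues, searched in a short left-context window (NegEx-style).
NEGATION_CUES = re.compile(r"\b(?:no|denies|without|negative for|absence of)\b", re.IGNORECASE)
WINDOW = 40  # characters of left context to scan for a negation cue


def extract_phenotypes(note: str) -> set[str]:
    """Return the phenotype labels mentioned non-negated in a clinical note."""
    found = set()
    for label, patterns in PHENOTYPE_PATTERNS.items():
        for pattern in patterns:
            for match in pattern.finditer(note):
                left = note[max(0, match.start() - WINDOW):match.start()]
                # Keep only the current clause so negations from earlier sentences don't leak in.
                left = re.split(r"[.;]", left)[-1]
                if not NEGATION_CUES.search(left):
                    found.add(label)
    return found


note = ("Multiple cafe-au-lait macules noted. No optic pathway glioma on MRI. "
        "Known plexiform neurofibroma of the left thigh.")
print(sorted(extract_phenotypes(note)))  # ['cafe_au_lait_macules', 'plexiform_neurofibroma']
```

A lexicon like this has to be extended by hand whenever a new author uses different synonyms or abbreviations, which is the sort of terminology-driven refinement the abstract describes for Phase 2.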

Results: With refinement, the rule-based model achieved higher F1 score distributions than the LLMs in both Phase 1 and Phase 2. However, without refinement the LLMs generalized better between physicians, showing smaller performance decreases (4.4%-5.1%) from Phase 1 to Phase 2 than the rule-based model (8.8%).
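F1 is the harmonic mean of precision and recall, here measured against the gold-standard annotations. The sketch below scores extracted (note, phenotype) pairs against a gold set and computes the drop between two phases. The abstract does not say whether the reported 4.4%-5.1% and 8.8% decreases are absolute (percentage points) or relative changes, so both are computed; the pair representation, function names, and all input values are hypothetical and are not the study's data.

```python
Pair = tuple[str, str]  # (note_id, phenotype_label); a hypothetical representation


def f1_score(predicted: set[Pair], gold: set[Pair]) -> float:
    """F1 of predicted (note_id, phenotype) pairs against gold-standard pairs."""
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0  # also avoids division by zero when either set is empty
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def performance_drop(before: float, after: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = (before - after) * 100
    relative = (before - after) / before * 100
    return absolute, relative


# Made-up toy data for illustration; not the study's annotations.
gold = {("n1", "cafe_au_lait_macules"), ("n1", "plexiform_neurofibroma"), ("n2", "optic_pathway_glioma")}
pred = {("n1", "cafe_au_lait_macules"), ("n2", "optic_pathway_glioma"), ("n2", "scoliosis")}
print(round(f1_score(pred, gold), 3))  # 2 TP, 1 FP, 1 FN -> precision = recall = 2/3
print([round(x, 1) for x in performance_drop(0.90, 0.81)])  # hypothetical Phase 1 vs Phase 2 F1
```

With these toy numbers the same F1 change reads as a 9-point absolute drop or a 10% relative drop, so the distinction matters when comparing reported figures.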

Conclusion: We highlight trade-offs between the effectiveness of rule-based NLP and the generalizability and ease of implementation of LLMs for clinical entity extraction, with implications for pipeline portability across providers and institutions.


Source journal

Journal of the American Medical Informatics Association | Medicine: Computer Science, Interdisciplinary Applications

CiteScore: 14.50

Self-citation rate: 7.80%

Articles published: 230

Review time: 3-8 weeks

Journal description: JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives and reviews also help readers stay connected with the most important informatics developments in implementation, policy and education.