Comparison of rule- and large language model-based phenotype extraction from clinical notes for neurofibromatosis type 1
Levi Kaster, Ethan Hillis, Inez Y Oh, Elizabeth C Cordell, Randi E Foraker, Albert M Lai, Stephanie M Morris, David H Gutmann, Philip R O Payne, Aditi Gupta
Journal of the American Medical Informatics Association. Published 2025-09-12. DOI: 10.1093/jamia/ocaf155 (https://doi.org/10.1093/jamia/ocaf155)
Abstract
Introduction: Neurofibromatosis type 1 (NF1) is a rare genetic disorder affecting multiple organ systems with significant clinical heterogeneity. Managing individuals with NF1 is challenging due to variability in disease progression and outcomes and limited early risk assessment tools.
Objective: This study aims to develop an effective, generalizable, user-friendly clinical entity extraction pipeline for identifying NF1-related phenotypes from unstructured clinical notes to enhance research and risk-modeling efforts. We compare the benefits of rule-based natural language processing (NLP) vs large language models (LLMs) for this purpose.
Materials and methods: Four phenotype extraction pipelines (3 LLM-based vs 1 rule-based) were developed to automatically extract selected NF1-relevant phenotypes. Subject matter experts manually reviewed clinical notes, generating a gold-standard annotation dataset for evaluation. In Phase 1, notes authored by a single NF1 physician were used to guide pipeline development and refinement. In Phase 2, notes from a second NF1 physician were used to assess pipeline generalizability, followed by further refinement to accommodate differences in physician terminology.
Results: With refinement, the rule-based model achieved higher F1 score distributions than the LLMs in both Phase 1 and Phase 2. However, the LLMs generalized better between physicians without refinement: their performance decreased by 4.4%-5.1% when moving from Phase 1 to Phase 2, compared with an 8.8% decrease for the rule-based model.
Conclusion: We highlight trade-offs between the effectiveness of rule-based NLP and the generalizability and ease of implementation of LLMs for clinical entity extraction, with implications for pipeline portability across providers and institutions.
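As an illustration of the evaluation described in the abstract, the following is a minimal sketch of rule-based phenotype extraction scored against a gold-standard annotation with the F1 score. It is not the authors' pipeline: the pattern dictionary `PATTERNS`, the phenotype labels, the example note, and the function names `extract_phenotypes` and `f1_score` are all hypothetical, and a real rule-based system would also need to handle negation, abbreviations, and physician-specific terminology.

```python
import re

# Hypothetical phenotype patterns for illustration only;
# these are NOT the rules used in the study.
PATTERNS = {
    "cafe_au_lait_macules": re.compile(r"caf[eé][\s-]au[\s-]lait", re.IGNORECASE),
    "optic_pathway_glioma": re.compile(r"optic (pathway |nerve )?glioma", re.IGNORECASE),
    "plexiform_neurofibroma": re.compile(r"plexiform neurofibroma", re.IGNORECASE),
}


def extract_phenotypes(note: str) -> set[str]:
    """Return the labels whose pattern matches anywhere in the note."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(note)}


def f1_score(gold: set[str], predicted: set[str]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), computed over sets of labels."""
    tp = len(gold & predicted)   # labels found by both annotator and pipeline
    fp = len(predicted - gold)   # labels the pipeline found but the annotator did not
    fn = len(gold - predicted)   # labels the annotator found but the pipeline missed
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty label sets count as perfect agreement.
    return 2 * tp / denom if denom else 1.0


# Made-up note text and gold labels, purely for demonstration.
note = "Multiple cafe-au-lait macules noted. MRI shows optic pathway glioma."
gold = {"cafe_au_lait_macules", "optic_pathway_glioma", "plexiform_neurofibroma"}

predicted = extract_phenotypes(note)
score = f1_score(gold, predicted)
print(sorted(predicted), round(score, 2))
```

In this toy case the extractor finds two of the three gold labels with no false positives, so precision is 1.0, recall is 2/3, and F1 is 0.8. Swapping `extract_phenotypes` for a function that calls an LLM and parses its answer into the same label set would let both kinds of pipeline be scored with the same `f1_score` function.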
About the journal:
JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives and reviews also help readers stay connected with the most important informatics developments in implementation, policy and education.