Automated extraction of functional biomarkers of verbal and ambulatory ability from multi-institutional clinical notes using large language models.

IF 4 2区医学 Q1 CLINICAL NEUROLOGY

Journal of Neurodevelopmental Disorders Pub Date : 2025-04-30 DOI:10.1186/s11689-025-09612-w

Levi Kaster, Ethan Hillis, Inez Y Oh, Bhooma R Aravamuthan, Virginia C Lanzotti, Casey R Vickstrom, Christina A Gurnett, Philip R O Payne, Aditi Gupta

{"title":"Automated extraction of functional biomarkers of verbal and ambulatory ability from multi-institutional clinical notes using large language models.","authors":"Levi Kaster, Ethan Hillis, Inez Y Oh, Bhooma R Aravamuthan, Virginia C Lanzotti, Casey R Vickstrom, Christina A Gurnett, Philip R O Payne, Aditi Gupta","doi":"10.1186/s11689-025-09612-w","DOIUrl":null,"url":null,"abstract":"Background: Functional biomarkers in neurodevelopmental disorders, such as verbal and ambulatory abilities, are essential for clinical care and research activities. Treatment planning, intervention monitoring, and identifying comorbid conditions in individuals with intellectual and developmental disabilities (IDDs) rely on standardized assessments of these abilities. However, traditional assessments impose a burden on patients and providers, often leading to longitudinal inconsistencies and inequities due to evolving guidelines and associated time-cost. Therefore, this study aimed to develop an automated approach to classify verbal and ambulatory abilities from EHR data of IDD and cerebral palsy (CP) patients. Application of large language models (LLMs) to clinical notes, which are rich in longitudinal data, may provide a low-burden pipeline for extracting functional biomarkers efficiently and accurately.Methods: Data from the multi-institutional National Brain Gene Registry (BGR) and a CP clinic cohort were utilized, comprising 3,245 notes from 125 individuals and 5,462 clinical notes from 260 individuals, respectively. Employing three LLMs-GPT-3.5 Turbo, GPT-4 Turbo, and GPT-4 Omni-we provided the models with a clinical note and utilized a detailed conversational format to prompt the models to answer: \"Does the individual use any words?\" and \"Can the individual walk without aid?\" These responses were evaluated against ground-truth abilities, which were established using neurobehavioral assessments collected for each dataset.Results: LLM pipelines demonstrated high accuracy (weighted-F1 scores > .90) in predicting ambulatory ability for both cohorts, likely due to the consistent use of Gross Motor Functional Classification System (GMFCS) as a consistent ground-truth standard. However, verbal ability predictions were more accurate in the BGR cohort, likely due to higher adherence between the prompt and ground-truth assessment questions. While LLMs can be computationally expensive, analysis of our protocol affirmed the cost effectiveness when applied to select notes from the EHR.Conclusions: LLMs are effective at extracting functional biomarkers from EHR data and broadly generalizable across variable note-taking practices and institutions. Individual verbal and ambulatory ability were accurately extracted, supporting the method's ability to streamline workflows by offering automated, efficient data extraction for patient care and research. Future studies are needed to extend this methodology to additional populations and to demonstrate more granular functional data classification.","PeriodicalId":16530,"journal":{"name":"Journal of Neurodevelopmental Disorders","volume":"17 1","pages":"24"},"PeriodicalIF":4.0000,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12042395/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Neurodevelopmental Disorders","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s11689-025-09612-w","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CLINICAL NEUROLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

Background: Functional biomarkers in neurodevelopmental disorders, such as verbal and ambulatory abilities, are essential for clinical care and research activities. Treatment planning, intervention monitoring, and identifying comorbid conditions in individuals with intellectual and developmental disabilities (IDDs) rely on standardized assessments of these abilities. However, traditional assessments impose a burden on patients and providers, often leading to longitudinal inconsistencies and inequities due to evolving guidelines and associated time-cost. Therefore, this study aimed to develop an automated approach to classify verbal and ambulatory abilities from EHR data of IDD and cerebral palsy (CP) patients. Application of large language models (LLMs) to clinical notes, which are rich in longitudinal data, may provide a low-burden pipeline for extracting functional biomarkers efficiently and accurately.

Methods: Data from the multi-institutional National Brain Gene Registry (BGR) and a CP clinic cohort were utilized, comprising 3,245 notes from 125 individuals and 5,462 clinical notes from 260 individuals, respectively. Employing three LLMs-GPT-3.5 Turbo, GPT-4 Turbo, and GPT-4 Omni-we provided the models with a clinical note and utilized a detailed conversational format to prompt the models to answer: "Does the individual use any words?" and "Can the individual walk without aid?" These responses were evaluated against ground-truth abilities, which were established using neurobehavioral assessments collected for each dataset.

Results: LLM pipelines demonstrated high accuracy (weighted-F1 scores > .90) in predicting ambulatory ability for both cohorts, likely due to the consistent use of Gross Motor Functional Classification System (GMFCS) as a consistent ground-truth standard. However, verbal ability predictions were more accurate in the BGR cohort, likely due to higher adherence between the prompt and ground-truth assessment questions. While LLMs can be computationally expensive, analysis of our protocol affirmed the cost effectiveness when applied to select notes from the EHR.

Conclusions: LLMs are effective at extracting functional biomarkers from EHR data and broadly generalizable across variable note-taking practices and institutions. Individual verbal and ambulatory ability were accurately extracted, supporting the method's ability to streamline workflows by offering automated, efficient data extraction for patient care and research. Future studies are needed to extend this methodology to additional populations and to demonstrate more granular functional data classification.

Abstract Image

查看原文本刊更多论文

使用大型语言模型从多机构临床记录中自动提取语言和移动能力的功能性生物标志物。

背景：神经发育障碍的功能性生物标志物，如语言和行走能力，对临床护理和研究活动至关重要。智力和发育障碍（IDDs）患者的治疗计划、干预监测和确定合并症依赖于对这些能力的标准化评估。然而，传统的评估给患者和提供者带来了负担，由于不断发展的指导方针和相关的时间成本，往往导致纵向不一致和不公平。因此，本研究旨在开发一种自动化方法，从IDD和脑瘫（CP）患者的电子病历数据中分类语言和移动能力。大型语言模型（large language models, LLMs）应用于临床笔记，可以为高效、准确地提取功能性生物标志物提供低负担的途径。方法：利用来自多机构国家脑基因登记处（BGR）和CP临床队列的数据，分别包括125个人的3245条记录和260个人的5462条临床记录。使用三个LLMs-GPT-3.5 Turbo， GPT-4 Turbo和GPT-4 omnim -我们为模型提供临床说明，并使用详细的对话格式提示模型回答：“个人使用任何语言吗？”和“个人可以在没有帮助的情况下行走吗？”这些反应是根据对每个数据集收集的神经行为评估建立的基础真实能力进行评估的。结果：LLM管道在预测两个队列的运动能力方面表现出很高的准确性（加权f1得分>.90），可能是由于一致使用大肌肉运动功能分类系统（GMFCS）作为一致的基本事实标准。然而，语言能力预测在BGR队列中更准确，可能是由于提示和基本事实评估问题之间更高的依从性。虽然法学硕士在计算上可能很昂贵，但我们的协议分析证实了应用于从电子病历中选择笔记时的成本效益。结论：llm在从电子病历数据中提取功能性生物标志物方面是有效的，并且在不同的笔记记录实践和机构中广泛推广。准确地提取了个人语言和移动能力，通过为患者护理和研究提供自动化、高效的数据提取，支持该方法简化工作流程的能力。未来的研究需要将这种方法扩展到更多的人群，并证明更细粒度的功能数据分类。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Journal of Neurodevelopmental Disorders 医学-临床神经学

CiteScore

7.60

自引率

4.10%

发文量

审稿时长

>12 weeks

期刊介绍： Journal of Neurodevelopmental Disorders is an open access journal that integrates current, cutting-edge research across a number of disciplines, including neurobiology, genetics, cognitive neuroscience, psychiatry and psychology. The journal’s primary focus is on the pathogenesis of neurodevelopmental disorders including autism, fragile X syndrome, tuberous sclerosis, Turner Syndrome, 22q Deletion Syndrome, Prader-Willi and Angelman Syndrome, Williams syndrome, lysosomal storage diseases, dyslexia, specific language impairment and fetal alcohol syndrome. With the discovery of specific genes underlying neurodevelopmental syndromes, the emergence of powerful tools for studying neural circuitry, and the development of new approaches for exploring molecular mechanisms, interdisciplinary research on the pathogenesis of neurodevelopmental disorders is now increasingly common. Journal of Neurodevelopmental Disorders provides a unique venue for researchers interested in comparing and contrasting mechanisms and characteristics related to the pathogenesis of the full range of neurodevelopmental disorders, sharpening our understanding of the etiology and relevant phenotypes of each condition.