Accelerating Medical Record Data Abstraction and Analysis in Muscular Dystrophy: Large Language Models and International Classification of Diseases Codes.

Impact Factor 3.2, JCR Q3 (Clinical Neurology)

Neurology. Clinical practice. Pub Date: 2025-12-01. Epub Date: 2025-09-23. DOI: 10.1212/CPJ.0000000000200542

Huixue Zhou, Geetanjali Rajamani, Jiatan Huang, Magali Jorand-Fletcher, Yara Mohamed, Kody A DeGolier, Annette Xenopoulos-Oddsson, Erjia Cui, Carla D Zingariello, Rui Zhang, Peter B Kang

Neurology. Clinical practice, volume 15, issue 6, article e200542. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12456306/pdf/

Citations: 0

Abstract

Background and objectives: Muscular dystrophies are characterized by progressive muscle weakness and degeneration. Identifying cases and abstracting data from electronic medical records (EMRs) is helpful for surveillance and research. However, manual EMR abstraction is laborious. We studied 2 approaches to accelerate EMR abstraction: large language models (LLMs) and International Classification of Diseases (ICD) code meta-analysis.

Methods: In our cross-sectional study, EMRs from 22 individuals with Duchenne muscular dystrophy (DMD) and 22 individuals with limb-girdle muscular dystrophy (LGMD) were exported into a data shelter and manually annotated using MedTator. Annotations were guided by a schema focused on 4 key features of muscular dystrophy: first symptoms, ambulatory status, serum creatine kinase (CK) levels, and genetic test results. Five LLMs were fed a series of prompts and examples, and then, clinic notes from each of the 44 cases were inputted for model analysis. Inter-rater agreement (IAA) and F1 scores were calculated for manual annotations, and the F1 score for LLMs compared with manual annotations was calculated. We then analyzed a separate set of 77 DMD and 59 LGMD cases to determine whether the number of health care encounters with a muscular dystrophy-related ICD code could predict diagnostic certainty based on MD STARnet criteria.
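The comparison of LLM output with manual annotations described above can be illustrated with a minimal sketch of an F1 calculation. This is not the authors' evaluation code: the function name `f1_score`, the string encoding of annotations, and the example values are all hypothetical, and the abstract does not state whether matching was exact or partial. Exact set matching is assumed here.

```python
def f1_score(gold: set[str], predicted: set[str]) -> float:
    """F1 between manually annotated items (gold) and model-extracted items.

    Assumes exact-match comparison; the study's matching rule is not
    given in the abstract.
    """
    true_pos = len(gold & predicted)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)  # share of model output that is correct
    recall = true_pos / len(gold)          # share of manual annotations recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for one clinic note (illustrative, not study data).
gold = {"ambulatory_status:non-ambulatory", "ck:12000", "genetic:exon 45-50 deletion"}
pred = {"ambulatory_status:non-ambulatory", "ck:12000", "first_symptom:toe walking"}

print(round(f1_score(gold, pred), 3))
```

In this toy case two of three predicted items match two of three manual annotations, so precision and recall are both 2/3.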

Results: IAA for manual annotations varied between 80% (for annotation of symptoms) and 100% (for CK values). The highest performing LLM was Llama 3-8b, which yielded the following accuracies: 46.8% for "first symptoms," 56.9% for "ambulatory status," 69.2% for "CK values," and 68.4% for "genetic test results." Among 77 individuals with DMD, all patients with 20 or more encounters linked to relevant ICD codes had definite or probable diagnoses, whereas among 59 individuals with LGMD, all patients with 25 or more encounters linked to relevant ICD codes had definite or probable diagnoses.
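The encounter-count finding can be read as a threshold search: find the smallest observed encounter count at or above which every case is definite or probable. The sketch below shows one way to compute such a cutoff. The `Case` class, the certainty labels, and the data are hypothetical simplifications, not the MD STARnet definitions or the study data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    encounters: int  # encounters carrying a muscular dystrophy-related ICD code
    certainty: str   # simplified label, e.g. "definite", "probable", "possible"


CONFIRMED = {"definite", "probable"}


def min_confirmed_threshold(cases: list[Case]) -> Optional[int]:
    """Smallest observed encounter count t such that every case with at
    least t encounters is definite or probable; None if no such count exists."""
    unconfirmed = [c.encounters for c in cases if c.certainty not in CONFIRMED]
    floor = max(unconfirmed, default=-1)  # highest count among unconfirmed cases
    candidates = [c.encounters for c in cases if c.encounters > floor]
    return min(candidates) if candidates else None


# Hypothetical cohort (illustrative only).
cohort = [
    Case(3, "possible"),
    Case(8, "probable"),
    Case(12, "possible"),
    Case(20, "definite"),
    Case(31, "probable"),
]

print(min_confirmed_threshold(cohort))
```

A cutoff found this way describes the cohort it was computed on. Whether it carries over to other records would need separate validation.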

Discussion: LLMs promise to accelerate EMR abstraction for rare diseases such as muscular dystrophy, but F1 scores for LLMs currently lag manual abstractions for unstructured data. Llama 3-8b demonstrated superior performance to the 4 other models tested. Metadata such as ICD code counts may help prioritize high-yield cases for surveillance and research purposes.

Source journal: Neurology. Clinical practice (Clinical Neurology)

CiteScore: 4.00

Self-citation rate: 0.00%

Journal description: Neurology® Genetics is an online open access journal publishing peer-reviewed reports in the field of neurogenetics. The journal publishes original articles in all areas of neurogenetics including rare and common genetic variations, genotype-phenotype correlations, outlier phenotypes as a result of mutations in known disease genes, and genetic variations with a putative link to diseases. Articles include studies reporting on genetic disease risk, pharmacogenomics, and results of gene-based clinical trials (viral, ASO, etc.). Genetically engineered model systems are not a primary focus of Neurology® Genetics, but studies using model systems for treatment trials, including well-powered studies reporting negative results, are welcome.