Huixue Zhou, Geetanjali Rajamani, Jiatan Huang, Magali Jorand-Fletcher, Yara Mohamed, Kody A DeGolier, Annette Xenopoulos-Oddsson, Erjia Cui, Carla D Zingariello, Rui Zhang, Peter B Kang
Neurology. Clinical practice, 15(6):e200542. Published 2025-12-01 (Epub 2025-09-23). DOI: https://doi.org/10.1212/CPJ.0000000000200542. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12456306/pdf/
Accelerating Medical Record Data Abstraction and Analysis in Muscular Dystrophy: Large Language Models and International Classification of Diseases Codes.
Background and objectives: Muscular dystrophies are characterized by progressive muscle weakness and degeneration. Identifying cases and abstracting data from electronic medical records (EMRs) is helpful for surveillance and research. However, manual EMR abstraction is laborious. We studied 2 approaches to accelerate EMR abstraction: large language models (LLMs) and International Classification of Diseases (ICD) code meta-analysis.
Methods: In our cross-sectional study, EMRs from 22 individuals with Duchenne muscular dystrophy (DMD) and 22 individuals with limb-girdle muscular dystrophy (LGMD) were exported into a data shelter and manually annotated using MedTator. Annotations were guided by a schema focused on 4 key features of muscular dystrophy: first symptoms, ambulatory status, serum creatine kinase (CK) levels, and genetic test results. Five LLMs were given a series of prompts and examples, after which clinic notes from each of the 44 cases were input for model analysis. Inter-rater agreement (IAA) and F1 scores were calculated for the manual annotations, and F1 scores were calculated for each LLM against the manual annotations. We then analyzed a separate set of 77 DMD and 59 LGMD cases to determine whether the number of health care encounters with a muscular dystrophy-related ICD code could predict diagnostic certainty based on MD STARnet criteria.
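As a rough illustration of the F1 comparison described above, the sketch below scores one set of model extractions against a manual reference. The abstract does not state how extractions were matched, so exact matching on (feature, value) pairs is an assumption, and the function name, feature keys, and sample values are all hypothetical.

```python
def f1_against_reference(predicted, reference):
    """Return (precision, recall, F1) for two collections of (feature, value) pairs.

    Assumption: an extraction counts as correct only if it matches a
    reference annotation exactly. The study may have used a looser rule.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up annotations for a single hypothetical clinic note.
manual = {
    ("ck_value", "12000 U/L"),
    ("ambulatory_status", "ambulatory"),
    ("genetic_test", "exon 45 deletion"),
}
model = {
    ("ck_value", "12000 U/L"),
    ("ambulatory_status", "non-ambulatory"),
}

precision, recall, f1 = f1_against_reference(model, manual)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Here the model recovers one of three reference annotations and makes one incorrect extraction, so precision is 1/2 and recall is 1/3. F1 is their harmonic mean, which penalizes both missed and spurious extractions.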
Results: IAA for manual annotations varied between 80% (for annotation of symptoms) and 100% (for CK values). The highest performing LLM was Llama 3-8b, which yielded the following accuracies: 46.8% for "first symptoms," 56.9% for "ambulatory status," 69.2% for "CK values," and 68.4% for "genetic test results." Among 77 individuals with DMD, all patients with 20 or more encounters linked to relevant ICD codes had definite or probable diagnoses, whereas among 59 individuals with LGMD, all patients with 25 or more encounters linked to relevant ICD codes had definite or probable diagnoses.
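The encounter-count finding above can be read as a threshold search: find the lowest count at or above which every case has a definite or probable diagnosis. The following is a minimal sketch of that idea, not the authors' analysis. The function name, the label strings, and the sample cases are hypothetical, and the data are invented for illustration.

```python
CERTAIN_LABELS = {"definite", "probable"}  # assumed label strings


def lowest_all_certain_count(cases):
    """Return the smallest observed encounter count t such that every case
    with at least t encounters carries a certain label, or None if no such
    count exists.

    `cases` is an iterable of (encounter_count, certainty_label) pairs.
    """
    cases = list(cases)
    # Highest encounter count among cases that are NOT definite/probable.
    uncertain_counts = [n for n, label in cases if label not in CERTAIN_LABELS]
    cutoff = max(uncertain_counts, default=-1)
    # Any observed count strictly above that cutoff is a valid threshold.
    above_cutoff = [n for n, _ in cases if n > cutoff]
    return min(above_cutoff) if above_cutoff else None


# Invented example cohort: (ICD-linked encounters, diagnostic certainty).
toy_cohort = [
    (3, "possible"),
    (8, "probable"),
    (12, "possible"),
    (20, "definite"),
    (31, "probable"),
    (45, "definite"),
]

print(lowest_all_certain_count(toy_cohort))
```

In this toy cohort the highest-count uncertain case has 12 encounters, so the function returns 20, the lowest observed count above it. A threshold found this way describes only the cohort it was derived from and would need validation on separate data before being used to prioritize cases.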
Discussion: LLMs promise to accelerate EMR abstraction for rare diseases such as muscular dystrophy, but F1 scores for LLMs currently lag manual abstractions for unstructured data. Llama 3-8b demonstrated superior performance to the 4 other models tested. Metadata such as ICD code counts may help prioritize high-yield cases for surveillance and research purposes.