Assessing large language model performance related to aging in genetic conditions.

IF 4.1 Q2 GERIATRICS & GERONTOLOGY
Amna A Othman, Kendall A Flaharty, Suzanna E Ledgister Hanchard, Ping Hu, Dat Duong, Rebekah L Waikel, Benjamin D Solomon
npj aging 11(1):33. Journal Article. Published 2025-05-03.
DOI: https://doi.org/10.1038/s41514-025-00226-z
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12049513/pdf/

Abstract

Most genetic conditions are described in pediatric populations, leaving a gap in understanding their clinical progression and management in adulthood. Motivated by other applications of large language models (LLMs), we evaluated whether Llama-2-70b-chat (70b) and GPT-3.5 (GPT) could generate plausible medical vignettes, patient-geneticist dialogues, and management plans for hypothetical child and adult patients across 282 genetic conditions (selected by prevalence and categorized by age-related characteristics). Based on Correctness and Completeness scores graded by clinicians, the LLMs provided age-appropriate responses in both child and adult outputs. A sub-analysis of metabolic conditions, including those that typically present neonatally with crisis, also showed age-appropriate LLM responses. However, 70b and GPT obtained low Correctness and Completeness scores in producing plausible management plans (55-66% for 70b and a wider range, 50-90%, for GPT). This suggests that LLMs still have some limitations in clinical applications.
