Epidemiologic Method Review at Scale: Assessing Charlson Comorbidity Versioning Using a LLM.

Joshua T Fuchs, Cara Johnson, Nathan Foster, Peter J Leese
medRxiv : the preprint server for health sciences. Published 2025-09-25.
DOI: 10.1101/2025.09.23.25336010 (https://doi.org/10.1101/2025.09.23.25336010)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12485990/pdf/

Abstract

Background: The Charlson Comorbidity Index (CCI) is widely used in epidemiologic studies. However, many versions of the CCI have been developed since the original method was published in 1987, and it is unclear which version is used most frequently and how version utilization in research has changed over time.

Objective: We present an approach that uses a Large Language Model (LLM) to extract data from articles, detecting which specific CCI version each article employs.

Design: We designed a series of prompts that guided the LLM through identifying and extracting the references each article used to calculate the CCI. The model was Llama 3.3 70B Instruct.
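
The prompt series described in the Design paragraph could be sketched roughly as follows. This is a minimal illustration, not the authors' prompts or code: the two-step structure (screen, then extract), the prompt wording, the function name `extract_cci_references`, and the `llm` callable are all assumptions made for the example. The model call is passed in as a plain function so the sketch stays independent of any serving library.

```python
import re
from typing import Callable, List


def extract_cci_references(
    article_text: str,
    references: List[str],
    llm: Callable[[str], str],
) -> List[str]:
    """Hypothetical two-step prompt chain: screen the article, then extract references."""
    # Step 1 (assumed): ask a constrained YES/NO question before doing any extraction.
    screen_prompt = (
        "Does the following article calculate a Charlson Comorbidity Index? "
        "Answer YES or NO.\n\n" + article_text
    )
    if not llm(screen_prompt).strip().upper().startswith("YES"):
        return []

    # Step 2 (assumed): ask for reference numbers from a numbered reference list.
    numbered = "\n".join(f"[{i}] {ref}" for i, ref in enumerate(references, start=1))
    extract_prompt = (
        "List the reference numbers cited for the calculation of the Charlson "
        "Comorbidity Index as comma-separated integers, or NONE.\n\n"
        f"Article:\n{article_text}\n\nReferences:\n{numbered}"
    )
    answer = llm(extract_prompt)

    # Keep only valid, non-duplicate reference numbers, preserving order.
    seen = set()
    selected = []
    for token in re.findall(r"\d+", answer):
        n = int(token)
        if 1 <= n <= len(references) and n not in seen:
            seen.add(n)
            selected.append(references[n - 1])
    return selected


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call, used only to exercise the control flow."""
    return "YES" if "Answer YES or NO" in prompt else "2"


refs = ["Reference A (unrelated)", "Reference B (CCI method paper)"]
print(extract_cci_references("Example article text.", refs, stub_llm))
```

Constraining each answer to a narrow format (YES/NO, then integers) keeps the parsing step simple and makes malformed model output easy to discard; a real pipeline would replace `stub_llm` with a call to the deployed model.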

Setting: We analyzed 31,767 articles published since 2012 to evaluate the landscape of CCI implementation. The articles were sourced from the PubMed Central Open Access subset.

Measurements: For each article, we measured which version of the CCI was used, if any.

Results: Of the articles that cite a single method version, 63% cite only the original CCI publication, which cannot be applied in the modern real-world data era, leaving it ambiguous how the CCI is being calculated.
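
A proportion like the one reported in the Results paragraph is computed over articles that cite exactly one version, not over all articles. The sketch below shows that calculation on made-up toy data; the version labels, the article identifiers, and the function name `share_original_only` are hypothetical and do not come from the paper.

```python
from typing import Dict, Set

# Label for the original 1987 publication; all other labels here are hypothetical.
ORIGINAL = "original_1987"


def share_original_only(cited_versions: Dict[str, Set[str]]) -> float:
    """Among articles citing exactly one CCI version, return the share citing the original."""
    single = [versions for versions in cited_versions.values() if len(versions) == 1]
    if not single:
        raise ValueError("no articles cite exactly one version")
    return sum(1 for versions in single if ORIGINAL in versions) / len(single)


# Toy input only; these are not results from the study.
toy = {
    "article_1": {"original_1987"},
    "article_2": {"adaptation_x"},
    "article_3": {"original_1987", "adaptation_x"},  # excluded: cites two versions
    "article_4": set(),                               # excluded: cites none
    "article_5": {"original_1987"},
}
print(share_original_only(toy))  # 2 of the 3 single-version articles
```

Articles citing several versions or none are left out of the denominator, which is why the figure describes single-version articles rather than the full corpus.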

Limitations: For articles that did not reference one of the CCI versions we searched for, we were unable to determine whether the paper used a different version, created a specific implementation for that paper, or was ambiguous about how the CCI was calculated.

Conclusion: This paper introduces a generalizable approach for scaling methods literature review beyond what is typically possible with human review; we validate the approach and demonstrate its value by applying it to the CCI.
