A large-scale prospective nested case-control study: developing a comprehensive risk prediction model for early detection of pancreatic cancer in the community-based ESPRIT-AI cohort

IF 7.6 | Zone 1, Medicine | JCR Q1: HEALTH CARE SCIENCES & SERVICES
Chaoliang Zhong, Penghao Li, Jia Zhao, Xue Han, Beilei Wang, Gang Jin
The Lancet Regional Health: Western Pacific, Volume 55, Article 101310. Published 2025-02-01. DOI: 10.1016/j.lanwpc.2024.101310
https://www.sciencedirect.com/science/article/pii/S2666606524003043

Abstract

Background

Pancreatic cancer (PC) remains a significant public health concern due to its late diagnosis and limited effective screening methods. This study aimed to develop a robust risk prediction model for early detection, utilizing a large prospective cohort to ensure generalizability.

Method

We established a large-scale, continuous, real-world cohort, the Artificial Intelligence-based Early Screening of Pancreatic Cancer and High-Risk Tracing (ESPRIT-AI) cohort, covering 12 community health centers in Yangpu District, Shanghai, China. Within this cohort we conducted a prospective nested case-control study. Nine centers served as the training cohort and three as the test cohort. A total of 51,490 participants aged 50-75 years underwent annual health examinations between January 2021 and December 2023. Risk-related information and informed consent were collected from all participants. PC diagnoses were obtained from the Center for Disease Control and Prevention's cancer registry. Model training used a 1:20 case-control ratio, with features selected by LASSO regression and expert opinion. Multiple machine learning algorithms were compared, and the best-performing algorithm was selected for the final predictive model, which was then validated in a real-world external test cohort. The study was registered with ClinicalTrials.gov (NCT04743479).
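The 1:20 case-control sampling described above can be illustrated with a minimal sketch. It assumes simple random sampling of controls without matching, because the abstract does not state the actual sampling rules; the function name `sample_nested_controls`, the record fields `id` and `is_case`, and the synthetic cohort are all hypothetical.

```python
import random


def sample_nested_controls(participants, ratio=20, seed=0):
    """Return all cases plus `ratio` randomly drawn controls per case.

    Assumption: controls are drawn by simple random sampling without
    matching; the abstract does not describe the real sampling rules.
    """
    cases = [p for p in participants if p["is_case"]]
    non_cases = [p for p in participants if not p["is_case"]]
    n_controls = ratio * len(cases)
    if n_controls > len(non_cases):
        raise ValueError("not enough non-cases for the requested ratio")
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    controls = rng.sample(non_cases, n_controls)  # sampled without replacement
    return cases, controls


# Synthetic stand-in for the training cohort, using the sizes reported
# in the Findings: 39,929 participants, 45 of whom are cases.
cohort = [{"id": i, "is_case": i < 45} for i in range(39_929)]
cases, controls = sample_nested_controls(cohort)
print(len(cases), len(controls))
```

With 45 cases and a ratio of 20, the sketch draws 900 controls, which matches the training-set composition reported in the Findings.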

Findings

The cohort was divided into training (n=39,929, including 45 cases and 900 nested controls) and test (n=11,561, including 15 cases and 11,546 controls) sets. Variable selection identified four optimal variables: Body Mass Index (BMI), Fasting Blood Glucose (FBG), Symptom, and Age. Of the machine learning algorithms evaluated, Random Forest performed best and was selected as the final model. In the large-scale, independent real-world test cohort, the model achieved a specificity of 97.21% and a sensitivity of 33.33%. The model stratified the population, identifying 316 high-risk individuals (2.73% of the test set), 5 of whom were diagnosed with PC. This corresponds to a PC prevalence of 1.58% within the high-risk group, a 1.93-fold increase over the 0.82% prevalence in individuals with newly diagnosed diabetes.
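The stratification figures above follow from the reported counts, as the short arithmetic sketch below shows. It assumes that the 316 high-risk individuals are exactly those the model classified as positive, so that sensitivity equals the cases among them divided by all test-set cases; the helper `pct` and the variable names are illustrative only.

```python
def pct(numerator, denominator):
    """Percentage rounded to two decimals, as quoted in the abstract."""
    return round(100 * numerator / denominator, 2)


# Counts taken from the Findings paragraph.
test_size = 11_561           # participants in the test set
test_cases = 15              # PC cases in the test set
flagged = 316                # individuals classified as high risk
flagged_cases = 5            # PC cases among the high-risk individuals
reference_prevalence = 0.82  # % prevalence in newly diagnosed diabetes

share_flagged = pct(flagged, test_size)             # share of test set flagged
sensitivity = pct(flagged_cases, test_cases)        # cases caught by the model
high_risk_prevalence = pct(flagged_cases, flagged)  # PC prevalence if flagged
fold_increase = round(high_risk_prevalence / reference_prevalence, 2)

print(share_flagged, sensitivity, high_risk_prevalence, fold_increase)
```

The fold increase is computed from the rounded prevalence (1.58%), which is how the quoted 1.93 is obtained.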

Interpretation

These findings demonstrate that the model can effectively identify a subpopulation with significantly elevated PC risk, potentially facilitating targeted imaging-based early detection strategies that balance screening benefits and burdens.

Funding

This work was funded by the Shanghai Science and Technology Committee Program (grant number 20511101200).
Source journal
The Lancet Regional Health: Western Pacific (Medicine: Pediatrics, Perinatology and Child Health)
CiteScore: 8.80
Self-citation rate: 2.80%
Articles published: 305
Review time: 11 weeks
Journal description: The Lancet Regional Health – Western Pacific, a gold open access journal, is an integral part of The Lancet's global initiative advocating for healthcare quality and access worldwide. It aims to advance clinical practice and health policy in the Western Pacific region, contributing to enhanced health outcomes. The journal publishes high-quality original research shedding light on clinical practice and health policy in the region. It also includes reviews, commentaries, and opinion pieces covering diverse regional health topics, such as infectious diseases, non-communicable diseases, child and adolescent health, maternal and reproductive health, aging health, mental health, the health workforce and systems, and health policy.