A large-scale prospective nested case-control study: developing a comprehensive risk prediction model for early detection of pancreatic cancer in the community-based ESPRIT-AI cohort

IF 7.6; Zone 1 (Medicine); Q1 HEALTH CARE SCIENCES & SERVICES

The Lancet Regional Health: Western Pacific; Pub Date: 2025-02-01; DOI: 10.1016/j.lanwpc.2024.101310

Chaoliang Zhong, Penghao Li, Jia Zhao, Xue Han, Beilei Wang, Gang Jin

Volume 55, Article 101310; https://www.sciencedirect.com/science/article/pii/S2666606524003043

Citations: 0

Abstract

Background

Pancreatic cancer (PC) remains a significant public health concern because it is usually diagnosed late and effective screening methods are limited. This study aimed to develop a robust risk prediction model for the early detection of PC, using a large prospective cohort to support generalizability.

Method

We established a large-scale, continuous, real-world cohort, the Artificial Intelligence-based Early Screening of Pancreatic Cancer and High-Risk Tracing (ESPRIT-AI) cohort, which encompasses 12 community health centers in Yangpu District, Shanghai, China. Using this dataset, we conducted a prospective nested case-control study, with nine centers serving as the training cohort and three as the test cohort. A total of 51,490 participants aged 50-75 years underwent annual health examinations from January 2021 to December 2023. Risk-related information and informed consent were collected from all participants, and PC diagnoses were obtained from the cancer registry of the Center for Disease Control and Prevention. Model training used a 1:20 case-control ratio, with features selected by LASSO regression combined with expert opinion. Multiple machine learning algorithms were compared, and the best-performing algorithm was selected as the final predictive model, which was then validated in a real-world external test cohort. The study was registered with ClinicalTrials.gov (NCT04743479).
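A 1:20 case-control ratio means 20 controls per case, which is consistent with the training split of 45 cases and 900 nested controls. A minimal sketch of unmatched control sampling, assuming controls are drawn at random without replacement from cohort members with no PC diagnosis, could look like the following. The abstract does not say whether controls were matched (for example on age, sex, center, or follow-up time) or drawn from risk sets, so this sketch ignores matching; `sample_nested_controls` and the integer participant IDs are hypothetical.

```python
import random


def sample_nested_controls(case_ids, noncase_ids, ratio=20, seed=0):
    """Draw `ratio` controls per case, without replacement, from non-cases.

    Hypothetical helper: unmatched random sampling only. The source does not
    state any matching criteria, so none are applied here.
    """
    n_controls = ratio * len(case_ids)
    if n_controls > len(noncase_ids):
        raise ValueError("not enough non-cases for the requested ratio")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    # random.sample returns distinct elements, so no control is reused
    return rng.sample(list(noncase_ids), n_controls)


# Training-cohort sizes from the abstract: 39,929 participants, 45 cases.
# Participant IDs are hypothetical integers: 0-44 are cases, the rest non-cases.
cases = list(range(45))
noncases = list(range(45, 39_929))
controls = sample_nested_controls(cases, noncases)
print(len(cases), len(controls))  # 45 900
```

Fixing the seed makes the control set reproducible, which matters when the same nested sample is reused to compare several algorithms.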

Findings

The cohort was divided into training (n=39,929, including 45 cases and 900 nested controls) and test (n=11,561, including 15 cases and 11,546 controls) sets. Variable selection identified four optimal variables: body mass index (BMI), fasting blood glucose (FBG), symptom, and age. Among the machine learning algorithms evaluated, random forest performed best and was selected as the final model. In the large-scale, independent real-world test cohort, the model achieved a specificity of 97.21% and a sensitivity of 33.33%. The model stratified the population by flagging 316 individuals (2.73% of the test set) as high risk, 5 of whom were diagnosed with PC. The PC prevalence in this high-risk group was therefore 1.58%, 1.93 times the 0.82% prevalence among individuals with newly diagnosed diabetes.
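Several of the stratification figures above follow from simple ratios over the test-set counts. The sketch below recomputes the sensitivity, the share of the test set flagged as high risk, the PC prevalence within that group (its positive predictive value), and the ratio to the 0.82% reference prevalence for newly diagnosed diabetes, using only numbers stated in the abstract. The function and key names are hypothetical.

```python
def screening_summary(flagged, flagged_cases, total, total_cases, reference_prevalence):
    """Hypothetical helper: summary ratios for a binary high-risk flag."""
    sensitivity = flagged_cases / total_cases  # share of all cases that were flagged
    flagged_fraction = flagged / total         # share of the cohort flagged as high risk
    ppv = flagged_cases / flagged              # PC prevalence within the flagged group
    return {
        "sensitivity": sensitivity,
        "flagged_fraction": flagged_fraction,
        "ppv": ppv,
        "fold_vs_reference": ppv / reference_prevalence,
    }


# Test-set counts and reference prevalence as reported in the abstract
s = screening_summary(
    flagged=316,
    flagged_cases=5,
    total=11_561,
    total_cases=15,
    reference_prevalence=0.0082,
)
print(f"sensitivity      {s['sensitivity']:.2%}")       # 33.33%
print(f"flagged fraction {s['flagged_fraction']:.2%}")  # 2.73%
print(f"PPV              {s['ppv']:.2%}")               # 1.58%
print(f"fold vs 0.82%    {s['fold_vs_reference']:.2f}") # 1.93
```

Because the flagged group is small, the prevalence within it rests on only 5 cases, so these ratios are best read as point estimates.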

Interpretation

These findings demonstrate that our model can identify a subpopulation with significantly elevated PC risk, which could support targeted imaging-based early detection strategies that balance the benefits and burdens of screening.

Funding

This work was funded by the Shanghai Science and Technology Committee Program (grant number 20511101200).




Source journal

The Lancet Regional Health: Western Pacific (Medicine-Pediatrics, Perinatology and Child Health)

CiteScore: 8.80
Self-citation rate: 2.80%
Articles published: 305
Review time: 11 weeks

Journal description: The Lancet Regional Health – Western Pacific, a gold open access journal, is an integral part of The Lancet's global initiative advocating for healthcare quality and access worldwide. It aims to advance clinical practice and health policy in the Western Pacific region, contributing to improved health outcomes. The journal publishes high-quality original research shedding light on clinical practice and health policy in the region. It also includes reviews, commentaries, and opinion pieces covering diverse regional health topics, such as infectious diseases, non-communicable diseases, child and adolescent health, maternal and reproductive health, healthy ageing, mental health, the health workforce and systems, and health policy.