Prediction models for high-grade cervical lesions or worse using machine learning.

IF 10 · Zone 1, Medicine · JCR Q1, MEDICINE, GENERAL & INTERNAL
EClinicalMedicine Pub Date : 2026-03-05 eCollection Date: 2026-03-01 DOI:10.1016/j.eclinm.2026.103819
Yunyang Deng, Joakim Dillner, Nicholas Baltzer, Laila Sara Arroyo Mühr, Roxana Merino Martinez, Alexander Ploner, Jiayao Lei, Mark Clements
EClinicalMedicine, volume 93, article 103819. Publication type: Journal Article. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12972733/pdf/
Citations: 0

Abstract

Background: This study aimed to improve cervical screening efficiency by developing and validating machine-learning models for predicting high-grade cervical lesions or worse (HCL) risk.

Methods: From Swedish nationwide registers, we included 474,072 women invited to cervical screening in 2016 (split into 80% training and 20% test sets) and 370,105 women invited in 2017 for validation. All women underwent index cytology and/or human papillomavirus (HPV) testing within the recommended interval after age 29. Predictors included screening results (cytology and/or HPV testing), other HPV-related factors, and demographic factors (including age). Four random forest models were trained via 5-fold cross-validation with different predictors: Model 1 (M1) (all predictors), M2 (cytology, HPV testing, and age), M3 (HPV testing, other HPV-related factors, and demographic factors), and M4 (HPV testing and age). We computed areas under the curve (AUCs) and plotted positive predictive value (PPV) against the number of women receiving an intervention.
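
The two evaluation ingredients named in the Methods, a 5-fold split and the AUC, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `k_fold_indices` and `auc`, the random seed, and the toy inputs are assumptions made for illustration, and no random forest is fitted here. The AUC is computed in its rank-based (Mann-Whitney) form, that is, the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case.

```python
import random
from typing import List, Sequence


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size.

    In k-fold cross-validation each fold is held out once while the
    model is trained on the remaining k-1 folds.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seed is an illustrative assumption
    return [idx[f::k] for f in range(k)]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC; tied scores count as half a win."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one non-case")

    # Sort by score, then give every tied group its average 1-based rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg_rank
        i = j + 1

    # Mann-Whitney U for the cases, scaled to the [0, 1] range.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

In a cross-validated setting, `auc` would be applied to the held-out fold's outcomes and predicted risks in each of the five rounds; an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from non-cases.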

Findings: In training and test sets, 1-, 3-, and 5-year HCL incidence proportions were 0.25%, 0.68%, and 1.05%, respectively. Cross-validated AUCs were 0.83-0.96 (M1), 0.83-0.96 (M2), 0.91-0.94 (M3), and 0.91-0.93 (M4), depending on the prediction intervals. Similar AUCs were found in the test set. Additionally, the AUCs in the validation set were 0.85-0.95 (M1), 0.85-0.95 (M2), 0.91-0.94 (M3), and 0.92-0.93 (M4). Across all intervals, M1 consistently demonstrated the highest PPV, followed by M2, M3, and M4. For each model, PPVs were lowest for 1-year predictions but comparable at 3 and 5 years.

Interpretation: The models demonstrated strong predictive performance. Evaluating PPVs over the number of invited women provides the potential for risk-stratified screening and clinical utility.
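
One way to read "PPV over the number of women" is as a curve: rank women by predicted risk, intervene on the k highest-risk women, and record the share of them who turn out to be true cases. The sketch below is an illustration under that assumption, not the authors' implementation; the function name `ppv_by_number_intervened` and the toy data are hypothetical.

```python
from typing import List, Sequence


def ppv_by_number_intervened(labels: Sequence[int],
                             scores: Sequence[float]) -> List[float]:
    """Return a list where entry k-1 is the PPV among the k highest-risk women.

    labels: 1 if the woman developed the outcome, else 0.
    scores: predicted risk, higher meaning more likely to be a case.
    """
    # Highest predicted risk first; ties keep their original order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ppv = []
    cases = 0
    for k, i in enumerate(order, start=1):
        cases += labels[i]        # true cases among the top k so far
        ppv.append(cases / k)     # PPV if exactly k women are intervened on
    return ppv


# Toy data only: five women, two of whom are true cases.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1]
toy_curve = ppv_by_number_intervened(toy_labels, toy_scores)
```

The final entry of the curve always equals the overall proportion of cases, because intervening on everyone yields a PPV equal to the incidence. A model is useful for risk stratification to the extent that the early part of the curve sits well above that baseline.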

Funding: Vetenskapsrådet, FORTE, Karolinska Institutet, Horizon 2020, and Cancerfonden.

Source journal: EClinicalMedicine
Subject category: Medicine - Medicine (all)
CiteScore: 18.90
Self-citation rate: 1.30%
Articles published: 506
Review time: 22 days
About the journal: eClinicalMedicine is a gold open-access clinical journal designed to support frontline health professionals in addressing the complex and rapid health transitions affecting societies globally. The journal aims to assist practitioners in overcoming healthcare challenges across diverse communities, spanning diagnosis, treatment, prevention, and health promotion. Integrating disciplines from various specialties and life stages, it seeks to enhance health systems as fundamental institutions within societies. With a forward-thinking approach, eClinicalMedicine aims to redefine the future of healthcare.