Prediction models for high-grade cervical lesions or worse using machine learning.

IF 10; Region 1, Medicine; JCR Q1, MEDICINE, GENERAL & INTERNAL

EClinicalMedicine. Pub Date: 2026-03-05. eCollection Date: 2026-03-01. DOI: 10.1016/j.eclinm.2026.103819

Yunyang Deng, Joakim Dillner, Nicholas Baltzer, Laila Sara Arroyo Mühr, Roxana Merino Martinez, Alexander Ploner, Jiayao Lei, Mark Clements

EClinicalMedicine, volume 93, article 103819. Publication type: Journal Article. PMC full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12972733/pdf/

Citations: 0

Abstract

Background: This study aimed to improve cervical screening efficiency by developing and validating machine-learning models for predicting high-grade cervical lesions or worse (HCL) risk.

Methods: From Swedish nationwide registers, we included 474,072 women invited to cervical screening in 2016 (split into 80% training and 20% test sets) and 370,105 women invited in 2017 for validation. All women underwent index cytology and/or human papillomavirus (HPV) testing within the recommended interval after age 29. Predictors included screening results (cytology and/or HPV testing), other HPV-related factors, and demographic factors (including age). Four random forest models were trained via 5-fold cross-validation with different predictors: Model 1 (M1) (all predictors), M2 (cytology, HPV testing, and age), M3 (HPV testing, other HPV-related factors, and demographic factors), and M4 (HPV testing and age). We computed areas under the curve (AUCs) and plotted positive predictive value (PPV) against the number of women receiving an intervention.
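The two evaluation quantities named in the Methods can be sketched in a few lines. This is a minimal illustration only, not the authors' code: the function names `auc` and `ppv_by_number_intervened`, and the toy risk scores and outcomes, are hypothetical. It assumes that "PPV by the number of women intervened" means ranking women by predicted risk and computing the PPV among the k highest-risk women, and that the AUC is the usual rank-based (Mann-Whitney) estimate.

```python
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC (Mann-Whitney U); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one non-case")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def ppv_by_number_intervened(
    scores: Sequence[float], labels: Sequence[int]
) -> List[Tuple[int, float]]:
    """Return (k, PPV among the k highest-risk women) for k = 1..n.

    Ties in predicted risk are broken by input order (an arbitrary choice).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    curve = []
    cases = 0
    for k, (_, y) in enumerate(ranked, start=1):
        cases += y
        curve.append((k, cases / k))
    return curve


# Hypothetical toy data: predicted risks and observed outcomes (1 = HCL).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [1, 0, 1, 0, 0, 0]

print(auc(toy_scores, toy_labels))  # 0.875
for k, ppv in ppv_by_number_intervened(toy_scores, toy_labels):
    print(k, round(ppv, 3))
```

In the toy data, the two cases outrank seven of the eight case/non-case pairs, which gives the AUC of 0.875. The PPV curve starts at 1.0 for the single highest-risk woman and falls toward the overall case proportion (2/6) as more women are included.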

Findings: In training and test sets, 1-, 3-, and 5-year HCL incidence proportions were 0.25%, 0.68%, and 1.05%, respectively. Cross-validated AUCs were 0.83-0.96 (M1), 0.83-0.96 (M2), 0.91-0.94 (M3), and 0.91-0.93 (M4), depending on the prediction intervals. Similar AUCs were found in the test set. Additionally, the AUCs in the validation set were 0.85-0.95 (M1), 0.85-0.95 (M2), 0.91-0.94 (M3), and 0.92-0.93 (M4). Across all intervals, M1 consistently demonstrated the highest PPV, followed by M2, M3, and M4. For each model, PPVs were lowest for 1-year predictions but comparable at 3 and 5 years.
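To give a sense of scale for these incidence proportions, the arithmetic below converts them into approximate case counts and into an upper bound on PPV. It is a back-of-the-envelope sketch, not a result from the paper: the percentages are rounded as reported, so the counts are approximate, and the helper names `approx_cases` and `ppv_ceiling` are hypothetical. The bound assumes a perfect ranking, in which every woman selected is a case until the cases run out.

```python
N_WOMEN = 474_072  # training + test cohort size, as reported in the abstract

# Reported (rounded) HCL incidence proportions by prediction interval in years.
INCIDENCE = {1: 0.0025, 3: 0.0068, 5: 0.0105}


def approx_cases(n_women: int, proportion: float) -> int:
    """Approximate number of cases implied by a rounded incidence proportion."""
    return round(n_women * proportion)


def ppv_ceiling(n_cases: int, k: int) -> float:
    """Best achievable PPV when intervening on k women, given n_cases in total."""
    return min(1.0, n_cases / k)


for years, proportion in INCIDENCE.items():
    cases = approx_cases(N_WOMEN, proportion)
    # Intervening on everyone gives a PPV equal to the incidence proportion;
    # even a perfect model cannot exceed cases / k once k is above the case count.
    print(years, cases, ppv_ceiling(cases, 10_000))
```

Under these assumptions, the 1-year proportion of 0.25% corresponds to roughly 1,185 cases. A perfect model intervening on 10,000 women could therefore reach a PPV of at most about 12%, which is one possible reading of why 1-year PPVs were the lowest.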

Interpretation: The models demonstrated strong predictive performance. Evaluating PPV as a function of the number of invited women shows potential for risk-stratified screening and for clinical utility.

Funding: Vetenskapsrådet, FORTE, Karolinska Institutet, Horizon 2020, and Cancerfonden.


Source journal

EClinicalMedicine Medicine-Medicine (all)

CiteScore

18.90

Self-citation rate

1.30%

Articles published

506

Review time

22 days

Journal description: eClinicalMedicine is a gold open-access clinical journal designed to support frontline health professionals in addressing the complex and rapid health transitions affecting societies globally. The journal aims to assist practitioners in overcoming healthcare challenges across diverse communities, spanning diagnosis, treatment, prevention, and health promotion. Integrating disciplines from various specialties and life stages, it seeks to enhance health systems as fundamental institutions within societies. With a forward-thinking approach, eClinicalMedicine aims to redefine the future of healthcare.