Concordance-based Predictive Uncertainty (CPU)-Index: Proof-of-concept with application towards improved specificity of lung cancers on low dose screening CT

IF 6.1 | Zone 2 (Medicine) | Q1 | COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence in Medicine Pub Date : 2025-02-01 DOI:10.1016/j.artmed.2024.103055

Yuqi Wang , Aarzu Gupta , Fakrul Islam Tushar , Breylon Riley , Avivah Wang , Tina D. Tailor , Stacy Tantum , Jian-Guo Liu , Mustafa R. Bashir , Joseph Y. Lo , Kyle J. Lafata

Artificial Intelligence in Medicine, volume 160, Article 103055 | Journal Article | https://www.sciencedirect.com/science/article/pii/S0933365724002975

Citations: 0

Abstract

In this paper, we introduce a novel concordance-based predictive uncertainty (CPU)-Index, which integrates insights from subgroup analysis and personalized AI time-to-event models. Through its application in refining lung cancer screening (LCS) predictions generated by an individualized AI time-to-event model trained with fused data of low dose CT (LDCT) radiomics with patient demographics, we demonstrate its effectiveness, resulting in improved risk assessment compared to the Lung CT Screening Reporting & Data System (Lung-RADS). Subgroup-based Lung-RADS faces challenges in representing individual variations and relies on a limited set of predefined characteristics, resulting in variable predictions. Conversely, personalized AI time-to-event models are hindered by transparency issues and biases from censored data. By measuring the prediction consistency between subgroup analysis and AI time-to-event models, the CPU-Index framework offers a nuanced evaluation of the bias–variance trade-off and improves the transparency and reliability of predictions. Consistency was estimated by the concordance index of subgroup analysis-based similarity rank and model prediction similarity rank. Subgroup analysis-based similarity loss was defined as the sum-of-the-difference between Lung-RADS and feature-level 0-1 loss. Model prediction similarity loss was defined as squared loss. To test our approach, we identified 3,326 patients who underwent LDCT for LCS from 1/1/2015 to 6/30/2020 with confirmation of lung cancer on pathology within one year. For each LDCT image, the lesion associated with a Lung-RADS score was detected using a pretrained deep learning model from Medical Open Network for AI (MONAI), from which radiomic features were extracted. Radiomics were optimally fused with patient demographics via a positional encoding scheme and used to train a neural multi-task logistic regression time-to-event model that predicts malignancy. 
Performance was maximized when radiomics features were fused with positionally encoded demographic features. In this configuration, our algorithm raised the AUC from 0.81 ± 0.04 to 0.89 ± 0.02. Compared to standard Lung-RADS, our approach reduced the False-Positive-Rate from 0.41 ± 0.02 to 0.30 ± 0.12 while maintaining the same False-Negative-Rate. Our methodology enhances lung cancer risk assessment by estimating prediction uncertainty and adjusting accordingly. Furthermore, the optimal integration of radiomics and patient demographics improved overall diagnostic performance, indicating their complementary nature.
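
The consistency estimate described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Lung-RADS categories are assumed to be encoded as integers, the subgroup loss is read as the absolute Lung-RADS difference plus the number of mismatched categorical features, pairs tied in the subgroup loss are treated as not comparable, and all function and field names (`subgroup_loss`, `prediction_loss`, `concordance`, `cpu_index`, `lung_rads`, `features`) are hypothetical.

```python
from itertools import combinations


def subgroup_loss(a, b):
    """Assumed reading of the subgroup loss: absolute Lung-RADS gap
    plus a 0-1 loss summed over categorical features."""
    rads_gap = abs(a["lung_rads"] - b["lung_rads"])
    mismatches = sum(x != y for x, y in zip(a["features"], b["features"]))
    return rads_gap + mismatches


def prediction_loss(risk_a, risk_b):
    """Squared loss between two model-predicted risks."""
    return (risk_a - risk_b) ** 2


def concordance(xs, ys):
    """Fraction of comparable pairs that xs and ys order the same way.
    Pairs tied in xs are skipped; pairs tied only in ys count as 0.5."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # tied in the reference ranking: not comparable
        comparable += 1
        dx = xs[i] - xs[j]
        dy = ys[i] - ys[j]
        if dy == 0:
            concordant += 0.5
        elif (dx > 0) == (dy > 0):
            concordant += 1.0
    return concordant / comparable if comparable else float("nan")


def cpu_index(target, target_risk, cohort, cohort_risks):
    """Agreement between the subgroup-based and the model-based
    similarity rankings of a cohort relative to one target patient."""
    s = [subgroup_loss(target, p) for p in cohort]
    m = [prediction_loss(target_risk, r) for r in cohort_risks]
    return concordance(s, m)


# Toy data, invented purely for illustration.
target = {"lung_rads": 4, "features": ("F", "smoker")}
cohort = [
    {"lung_rads": 4, "features": ("F", "smoker")},
    {"lung_rads": 3, "features": ("F", "never")},
    {"lung_rads": 1, "features": ("M", "never")},
]
agree = cpu_index(target, 0.8, cohort, [0.8, 0.6, 0.1])     # rankings agree
disagree = cpu_index(target, 0.8, cohort, [0.1, 0.6, 0.8])  # rankings reversed
```

In this toy cohort, `agree` is 1.0 because patients who look similar to the target by Lung-RADS and features also receive similar predicted risks, while `disagree` is 0.0 because the model ranks them in the opposite order. Under this reading, a low value would flag a prediction whose neighborhood structure the subgroup analysis does not support.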

Source journal

Artificial Intelligence in Medicine (Engineering & Technology: Engineering, Biomedical)

CiteScore

15.00

Self-citation rate

2.70%

Articles published

143

Review time

6.3 months

Journal description: Artificial Intelligence in Medicine publishes original articles from a wide variety of interdisciplinary perspectives concerning the theory and practice of artificial intelligence (AI) in medicine, medically-oriented human biology, and health care. Artificial intelligence in medicine may be characterized as the scientific discipline pertaining to research studies, projects, and applications that aim at supporting decision-based medical tasks through knowledge- and/or data-intensive computer-based solutions that ultimately support and improve the performance of a human care provider.