Self-Explainable Graph Neural Network for Alzheimer Disease and Related Dementias Risk Prediction: Algorithm Development and Validation Study.

Impact factor: 5.0 (JCR Q1, Geriatrics & Gerontology)

JMIR Aging. Publication date: 2024-07-08. DOI: 10.2196/54748

Xinyue Hu, Zenan Sun, Yi Nian, Yichen Wang, Yifang Dang, Fang Li, Jingna Feng, Evan Yu, Cui Tao

Volume 7, article e54748. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11263893/pdf/

Citations: 0

Abstract

Background: Alzheimer disease and related dementias (ADRD) rank as the sixth leading cause of death in the United States, underlining the importance of accurate ADRD risk prediction. While recent advancements in ADRD risk prediction have primarily relied on imaging analysis, not all patients undergo medical imaging before an ADRD diagnosis. Merging machine learning with claims data can reveal additional risk factors and uncover interconnections among diverse medical codes.

Objective: The study aims to use graph neural networks (GNNs) with claims data for ADRD risk prediction. To address the lack of human-interpretable reasons behind these predictions, we introduce an innovative, self-explainable method to evaluate relationship importance and its influence on ADRD risk prediction.

Methods: We used a variationally regularized encoder-decoder GNN (variational GNN [VGNN]) integrated with our proposed relation importance method to estimate ADRD likelihood. This self-explainable method provides feature-importance explanations for ADRD risk prediction by leveraging relational information within a graph. Three scenarios with 1-year, 2-year, and 3-year prediction windows were created to assess the model's performance. Random forest (RF) and light gradient boosting machine (LGBM) were used as baselines. Using the relation importance method, we further identified the key relationships for ADRD risk prediction.
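The abstract does not state the training objective. As a rough illustration only, a variationally regularized encoder-decoder is commonly trained with a prediction loss plus a Kullback-Leibler penalty on the latent code. The symbols below ($G$ for the patient graph, $z$ for the latent representation, $\beta$ for the penalty weight, $d$ for the latent dimension) are assumptions for this sketch and are not taken from the paper.

```latex
\mathcal{L}
  = \underbrace{-\bigl[\, y \log \hat{y} + (1 - y)\log(1 - \hat{y}) \,\bigr]}_{\text{binary cross-entropy on ADRD label } y}
  \; + \; \beta \, D_{\mathrm{KL}}\!\bigl( q_\phi(z \mid G) \,\big\|\, \mathcal{N}(0, I) \bigr),
\qquad
D_{\mathrm{KL}}
  = \frac{1}{2} \sum_{j=1}^{d} \bigl( \mu_j^{2} + \sigma_j^{2} - \log \sigma_j^{2} - 1 \bigr)
```

The closed form on the right holds when $q_\phi(z \mid G)$ is a diagonal Gaussian with mean $\mu$ and variance $\sigma^2$. Whether the study uses exactly this form, or a weighting $\beta \neq 1$, is not stated in the abstract.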

Results: In scenario 1, the VGNN model achieved area under the receiver operating characteristic curve (AUROC) scores of 0.7272 and 0.7480 for the small subset and the matched cohort data set, respectively. It outperformed RF and LGBM by 10.6% and 9.1%, respectively, on average. In scenario 2, it achieved AUROC scores of 0.7125 and 0.7281, surpassing the other models by 10.5% and 8.9%, respectively. Similarly, in scenario 3, it achieved AUROC scores of 0.7001 and 0.7187, exceeding the baseline models by 10.1% and 8.5%, respectively. These results indicate that the graph-based approach outperformed the tree-based models (RF and LGBM) in predicting ADRD across all three prediction windows. Furthermore, the integration of the VGNN model and our relation importance interpretation could provide valuable insight into paired factors that may contribute to or delay ADRD progression.
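For readers unfamiliar with the metric, the following is a minimal sketch of how an AUROC score and a percentage gain over a baseline can be computed. The function names `auroc` and `relative_gain` and the toy data are hypothetical and invented for illustration; they are not from the study. The abstract also does not say whether the reported gains are relative or absolute, so the relative form shown here is an assumption.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def relative_gain(model_auroc: float, baseline_auroc: float) -> float:
    """Relative improvement of a model over a baseline, in percent (assumed definition)."""
    return 100.0 * (model_auroc - baseline_auroc) / baseline_auroc


# Toy labels and risk scores, invented for illustration only (not study data).
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.6, 0.7, 0.8, 0.2]
toy_auroc = auroc(toy_labels, toy_scores)
print(f"toy AUROC = {toy_auroc:.4f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for a toy example; on a real claims cohort a rank-based implementation such as `sklearn.metrics.roc_auc_score` would be the practical choice.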

Conclusions: Using our innovative self-explainable method with claims data enhances ADRD risk prediction and provides insights into the impact of interconnected medical code relationships. This methodology not only enables ADRD risk modeling but also shows potential for other image analysis predictions using claims data.


